Essays on ethnicity and marriage in Sub-Saharan Africa
École des Hautes Études en Sciences Sociales
Doctoral school: ED 465 – Économie Panthéon Sorbonne
Paris School of Economics
Doctorate
Discipline: Economics
Juliette CRESPIN-BOUCAUD
Essays on ethnicity and marriage in Africa
Thesis supervised by Denis COGNEAU
Defense date: 13 September 2021
Referees 1 Catherine GUIRKINGER, Professor at the University of Namur
2 Michael GRIMM, Professor at the University of Passau
Jury 1 Denis COGNEAU, Director of Studies at EHESS, Research Director at IRD, Professor at PSE
2 Valérie GOLAZ, Research Director, INED
3 Catherine GUIRKINGER, Professor at the University of Namur
4 Michael GRIMM, Professor at the University of Passau
5 Oliver VANDEN EYNDE, Research Fellow, CNRS, Professor at PSE
To my grandfathers, Joel and Albert, whose love of faraway destinations and love of study sowed the seeds of this thesis.
ACKNOWLEDGEMENTS
All the stages of this PhD, from applying for a doctoral scholarship to completing this manuscript, were made possible by the help and support of countless people. What follows is some acknowledgment of what has been given to me over these years.
My gratitude goes to Denis Cogneau, my advisor during this PhD. Your rigor, your keen sense of ethics, and your interest in the social sciences are what convinced me to pursue a thesis with you as supervisor, and I have never regretted that choice. The chapters that make up this thesis owe you a great deal: you were always available to go over results and ideas with me and to reread a new (nth) version of these chapters. Beyond the “research” side of the thesis, your questions (sometimes difficult) and your advice (often very apt) shaped the way I grew as a researcher. Finally, through the ups and downs of the thesis, you found the words to improve both my papers and my morale. Thank you.
I thank Oliver Vanden Eynde, who has followed this PhD from the start. I am extremely grateful to have benefited from your sharp insights and your kind words. Your feedback and suggestions always nudged me in the direction of the bigger picture when it was most needed. I thank all the jury members for agreeing to read and review this PhD. I am grateful to Michael Grimm and Catherine Guirkinger for their comments, suggestions, and encouragement. I thank Valérie Golaz for agreeing to examine this PhD.
This PhD would, of course, not have been the same without my co-authors. I was extremely lucky to work with great researchers who also happened to be extremely different kinds of researchers. Thanks first to Rozenn Hotte, with whom the second chapter of this thesis was written. It was a pleasure to exchange ideas, fieldwork stories, and bits of code to build this project. Thank you for taming my leanings toward exhaustive exactness (also known as the compulsive collecting of minuscule details) in favor of an overall vision of the project; I know it was not an easy fight. I still have some progress to make on that front and a lot to learn from you, so see you very soon for the next project!
The third chapter of this PhD is joint work with Alexander Moradi and Catherine Boone. Alex, thank you for having invited me for the visiting stay at the University of Sussex that marked the beginning of our collaboration. Thank you for having always treated me as a researcher whose inputs were of equal value to yours despite age and status differences. I have enjoyed exploring Kenyan history in your company. Catherine, your knowledge of Kenya, African politics, and ethnicity is astonishing. This last chapter benefited greatly from your insights. I also thank you for having invited me to join the Spatial Inequalities in African Political Economy Project as an Associate Research Fellow. This position exposed me to new and exciting research questions and responsibilities. It provided both new ideas that improved this PhD and a welcome change from working on my PhD. Finally, thanks for not having married me off in Kenya despite the appealing marriage proposals; this would probably have marked the end of my PhD.
Public research funding made this thesis possible: it was funded by a doctoral scholarship from ENS Paris-Saclay, EHESS, Université Paris 1, and LSE. My research and the research stays I undertook were made possible by this funding, along with additional funding from the Development group and from PGSE. I thank the administrative teams at PSE: their work made such a welcoming research environment possible. A more specific thank-you to Veronique Guilletin, who resolves the small questions and big problems of PSE's doctoral students with kindness and efficiency.
To keep some structure in this acknowledgment section, the following paragraphs are organized according to the distance between my home and the location of people and places. This variable has the property of predicting almost perfectly the likelihood that events mentioned below took place before March 2020. First, my PhD was enriched by a research stay at the University of Sussex (UK) and by fieldwork trips to Côte d'Ivoire and Kenya. At the University of Sussex, I thank members of the economics department, with special thanks to Panka Bencsik, Annemie Maertens, and Andy McKey. In Côte d'Ivoire, Hughes Kouadio, Albertine Kouadio, and Claire: to the three of you, thank you very much!
In Kenya, I benefited from the help and support of the team at IFRA: many thanks to Marie-Aude Fouere, Chloe Josse-Durand, and Marion Asego. Thanks also to the BIEA team: their garden and library provided great places to work. For inviting me to join them during their fieldwork, thanks to Lea Lacan and Miriam Waltz. For their help, support, and great company, thanks to Alex Dyzenhaus, Clarissa Lehne, Riley Linebaugh, and Abla Safir. Thanks to Leigh Gardner, Michael Wahman, Andrew Linke, and Fibian Lukalo: it was a great work trip, not only because we saw elephants. Special thanks to Francesca di Matteo: thank you for lending me your anthropologist's glasses during this stay in Kenya. Qualitative interviews were only possible because I worked with four fantastic interviewers: Francis Onyango, Jehu Nyawara, Florence Mukami, and Esther Mbua. I am extremely grateful for their help and the care they put into finding the best respondents. I also want to thank all the people who took the time to answer my questions and were willing to trust me and to open up about their lives. Asanteni sana! I also want to thank all the people with whom I had the opportunity to discuss my work (at conferences, in airplanes, on Zoom). The interest you showed in my research and your comments and suggestions were extremely helpful.
Second, I benefited from an amazing research environment at PSE. Thanks to all members of the development group at PSE for their insightful comments on my work, all their great presentations at CFDS, and for discussions on Wednesday evenings. Thanks to Sylvie Lambert and Karen Macours for making sure that everything always runs smoothly within the Development group. ¡Gracias, comrade Oscar! Organizing the CFDS with you taught me a lot, and we did have much more fun than worries. I especially thank Sylvie Lambert for her rereadings and comments on everything relating to marriages and divorces. I thank Luc Behaghel, Karen Macours, Akiko Suwa-Eisenmann, and Liam Wren Lewis for their insightful comments on my work. Thanks to the members of the Economic History group for the group's stimulating yet supportive atmosphere. My thesis was also enriched by my teaching at Paris 1: thank you to my students. I probably learned more from your questions than you did from my answers.
I was extremely lucky to have awesome officemates, not once, but twice. Long live R6-01! The Dream Team, first of its name (Paola, Rozenn, Lisa, Yasmine, and Sarah): thank you for creating that very special atmosphere of kindness and mutual support. I realize how lucky I was to be able to ask you absolutely every question that crossed my mind, especially at the start of the thesis. Lisa, thanks for rere(...)rereading my paper on intermarriages; you taught me the basics of “writing econ papers”. Rozenn, I have already mentioned you as a co-author, but here I would like to thank you for your ever-kind friendship, your countless rereadings, whether for this thesis or for applications, and also for your frankness about the fact that no, research is not always smooth sailing. Your support was very precious to me during this thesis.
Many thanks to the G.C. Team (Sarah, Andrea, Charlotte, Duncan, and Zhexun). Despite the pandemic restrictions we were able to enjoy some glitter & paillettes times together. Your support, ideas, and love of laughter have been a great joy during the past two years. Thanks to the GIS/python/GNU-Linux/let's push the efficiency frontier further crew: Etienne, Andrea, Aaron, Sarah, and Ximena. Sarah, I have already mentioned you three times, and never three without four: this thesis would have been far less fun and pleasant without you. I have followed you (and you have followed me) since the M1, from one building and office to the next, from one Dream Team to the other, all the way to the end of our respective theses. Thank you for sharing my enthusiasm for reproducible-research MOOCs, text editors, Guardian articles, book swaps, and Signal's Zozo stickers. Thanks to all my fellow PhD students who have made these years at PSE so lively and happy. Among those that I have not mentioned yet: Cristina, Emanuela, Georgia, Giulio, Helene, Ismael, Jonathan, Julieta, Juni, Justine, Kelsey, Malka, Maiting, Marion L, Marion R, Melanie, Paul D-P, Victor C, Victor P, Shaden, Yajna.
Third, a large chunk of this PhD was written at home. While working on a PhD in a global pandemic has at times been challenging, I felt lucky to be able to focus (or attempt to focus) on my PhD and to work from home. Let me thank here the people with whom I shared daily life during these thesis years: Benjamin, Bridget, Louise, Laureline, and Tote. Thanks to Sam for his logistical and moral support and for being an excellent living-room (office) mate. The successive lockdowns were gentler thanks to you.
Finally (else), let me thank here all the people who surrounded, accompanied, nourished, and supported me throughout these thesis years. Thanks to my family for their unfailing support, their sense of humor, and their ability to talk about something other than my thesis. A huge thank-you to my parents, who always welcomed and encouraged my curiosity and who passed on to me perseverance and tenacity. All of this, and much more besides, their love, carried me through this thesis. Thanks to all my friends who have not been mentioned above. A special mention to the S.C. club for the fits of laughter, the holidays (!), and the support. Thanks also to all those with whom I had the joy of dancing, repairing bikes, hiking, or bivouacking, but also of sharing weekends and evenings. These moments are what made the work of this thesis both sustainable (oxygen before going back to it) and enjoyable (the joy of returning to projects and questions after a break).
SUMMARY
This PhD brings together three empirical chapters that are related to either ethnic identity or marital decisions in sub-Saharan Africa.
The first chapter documents the evolution of interethnic and interfaith marriages in 15 countries of sub-Saharan Africa using the Demographic and Health Surveys (DHS). I find that 20.4% of marriages are interethnic and 9.7% are interfaith, indicating that ethnic and religious differences are not always barriers. The share of interethnic marriages increased and the share of interfaith marriages decreased. The increase in the share of interethnic marriages can only partly be explained by increases in urbanization and education levels. This finding suggests that changes in preferences and social norms may also be at work. The results show that some ethnic boundaries became more porous whereas religious boundaries did not.
The second chapter provides new evidence on the consequences of parental divorce for children in Africa. Using survey data that collected the detailed life histories of Senegalese women and their children, we investigate how children's educational outcomes are affected by their parents' divorce. Using a sibling fixed-effect strategy, we find that younger siblings were more likely than their older siblings to have attended primary school. This higher level of investment does not persist in the long run. We find that custody and fostering decisions do not seem to mediate the positive effects on school attendance. Our findings are consistent with either an improvement of the financial situation (due to remarriage) or an increase in the decision-making power of mothers after the divorce.
The third chapter assesses the effects of ethnic homogenization on public good provision by studying a large-scale land reform program that took place in Kenya and led to a significant reduction in ethnic diversity. We implement a spatial regression discontinuity design. We find a strong discontinuity in ethnic diversity but no differences in school provision between program areas and counterfactual areas in the short run as well as in the long run. As individuals were resettled to the program areas, they likely lacked the dense social networks that favor collective action. Our results are not driven by spillovers from treatment to counterfactual areas. A mediation analysis indicates that income effects are unlikely to drive this null result.
Keywords: Ethnicity, Marriage, Education, Religion, Children, Public Goods, Land Reform, Sub-Saharan Africa.
This doctoral thesis brings together three empirical chapters that relate either to ethnic identity or to marital decisions in sub-Saharan Africa.
The first chapter documents the evolution of interethnic and interfaith marriages in 15 countries of sub-Saharan Africa using the Demographic and Health Surveys (DHS). 20.4% of marriages are interethnic and 9.7% are interfaith, which indicates that ethnic and religious differences are not always barriers between groups. I show that the share of interethnic marriages increased and that the share of interfaith marriages decreased. The increase in the share of interethnic marriages is only partly explained by rising urbanization and average education levels. This suggests that changes in preferences and social norms probably play a role in the increase in the share of interethnic marriages. These results indicate that some ethnic boundaries became more porous, whereas religious boundaries did not.
The second chapter provides new knowledge on the consequences of parental divorce for children in Africa. Using the detailed life histories of Senegalese women and their children, we study how children's educational outcomes are affected by their parents' divorce. We use a sibling fixed-effect strategy and find that younger siblings are more likely than their elders to have attended primary school. This higher level of investment does not persist in the long run. We find that custody and fostering decisions do not seem to explain our results. These could be explained either by an improvement in the financial situation of remarried mothers or by greater decision-making power for mothers after the divorce.
The third chapter evaluates the effects of ethnic homogenization on investments in public goods. To do so, we study a large-scale land reform program that took place in Kenya and led to a significant reduction in ethnic diversity. This natural experiment allows us to use a spatial regression discontinuity design. We find a strong discontinuity in ethnic diversity between program areas and counterfactual areas but no difference in the number of schools. A mediation analysis indicates that income effects are probably not behind this null result. The results are probably due to the loss of dense social networks caused by migration, and not to ethnic diversity per se.
Keywords: Ethnicity, Marriage, Education, Religion, Children, Public Goods, Land Reform, Sub-Saharan Africa.
JEL Classification: H41, I25, J11, J12, J13, J15, Z12, N37, N97, Q15
List of Tables
2.1 Characteristics of divorced women and of their children
2.2 Balance test of characteristics according to children's age at the time of divorce
2.3 Effect of parental divorce on primary school attendance and completion
2.4 Sensitivity to the definition of the sample (columns) and to the definition of being affected by divorce (rows)
2.5 Heterogeneity of effects on attendance: custody and fostering decisions
2.6 Heterogeneity of effects on attendance: Remarriage, age, and education of mothers
A-2.1 Correlations between individual characteristics and school attendance
A-2.2 Characteristics of families according to children's age at divorce
A-2.3 Custody and fostering decisions after a divorce
B-2.1 Robustness checks: Primary school attendance
B-2.2 Schooling of children according to mother's characteristics
B-2.3 Characteristics of families of divorced women according to the age composition of children at the time of the survey
3.1 Characteristics of settlement schemes
3.2 Is the program associated with ethnic homogenization?
3.3 Program areas vs. counterfactual – Altitude & pre-treatment characteristics
3.4 Program areas vs. counterfactual – Short run outcomes
3.5 Program areas vs. counterfactual – Long run outcome measures
3.6 Program areas vs. counterfactual – Sponsor of schools
3.7 Program areas vs. counterfactual – Border segment analysis (income potential)
3.8 Program areas vs. counterfactual – Mediation analysis: Field size and long run outcome measures
A-3.1 Schools in Kenya
A-3.2 Field size in low and high income potential program areas
A-3.3 Program areas vs. counterfactual – Long run outcome measures (Ethnicity boundary FE controls)
A-3.4 Program areas vs. counterfactual – Border segment analysis (ethnic majority)
A-3.5 Descriptive statistics: Population, schools, and students by ethnic majority
A-3.6 Program areas vs. African Land Units – Altitude & pre-treatment characteristics
A-3.7 Program areas vs. African Land Units – Short run outcomes
A-3.8 Program areas vs. African Land Units – Long run outcome measures
A-3.9 Program areas vs. African Land Units – Border segment analysis (income
INTRODUCTION
This PhD brings together three empirical chapters that are related to either ethnic identity or marital decisions in sub-Saharan Africa. I start by presenting the two issues discussed in this dissertation: ethnic identity and marital decisions in sub-Saharan Africa. I then present the chapters of my dissertation as well as “behind the scenes” insights from the qualitative interviews and new research developments.
General motivation
The starting point of this PhD was writing my M2 thesis on interethnic marriage patterns in Kenya and Ghana. I found the topic fascinating and soon I was wondering about two big questions – What is ethnic identity? How do people choose whom to get married to? – that became the foundation of my PhD.
What exactly ethnic identity is, and how to measure it, has long been discussed (Kanbur et al. (2011) provide a summary of the issues at stake). In this PhD, ethnic identity is defined as what is measured in surveys and censuses as an individual's ethnic group. In this dissertation, I do not discuss the formation of ethnic categories (Lynch (2011), for an example on Kenya) or how the list of ethnic categories to be included in a given survey is determined (Fearon, 2003). It must be noted that ethnic identity is likely to be mismeasured. This PhD aims to provide elements to back up the need for a more complex vision of ethnic identity and ethnic diversity than is usually present in most of the literature, and to do so without considering the fact that ethnic categories are social constructs (a topic that has been studied more).
In the words of Chandra (2006), ethnic membership is “determined by attributes associated with, or believed to be associated with, descent”. As such, concerns about ethnic membership being passed on result, in most societies, in restrictive marriage rules (Anderson and Bidner, 2021). Here we are, back to the partner choice question! While marriage decisions may impact whether children identify as members of a certain ethnic group, ethnic groups also have different marriage rules, which in turn affect the pool of potential partners. Rules can be as different as prescribing that individuals marry outside of their clan, for instance among the Luo in Kenya (Luke and Munshi, 2006), and allowing cousins to get married, for instance in Senegal (Hotte and Marazyan, 2020).
In this PhD, “married individuals” (or “husbands”, “wives”, “partners”, “spouses”) might have married according to civil law, religious law, customary law, or... not have married yet. Cohabiting couples are included in the same category as married couples. Most unmarried cohabiting couples have started the formalities needed for customary marriage but have not completed them, often because they are waiting for the husband-to-be to be able to afford the bride price (a transfer from the groom or his parents to the parents of the bride). However, these couples are still in a committed partnership, which is why they are included in the same category as married couples. I use “divorce” and “separation” interchangeably throughout this dissertation for the same reason.
Having defined the key concepts used in this PhD, I provide a more detailed introduction to ethnicity and to marriages and divorces.
Ethnicity
Ethnic identity has mostly been taken as given in economics, and not as a historical construct or as the product of individual decisions (be they rational or not). While the impact of ethnic fractionalization has been investigated,¹ how ethnic identities are formed and maintained has received little attention in the ethnic politics literature, although ethnic identity formation and ethnic diversity levels both need to be endogenized (Alesina and La Ferrara, 2005).² Surveys conducted in African countries rarely allow respondents to select multiple ethnic affiliations, or none at all, and the indicators based on these measures, such as the ethnolinguistic fractionalization index (Alesina et al., 2003) and polarization measures (Montalvo and Reynal-Querol, 2005), similarly treat all individuals as belonging to a single ethnic group. This gap is even more puzzling given the literature that addresses the complexity and multi-layering of ethnic identities (Chandra 2006; Posner 2005). There are (at least) two understudied research questions regarding ethnicity in sub-Saharan Africa: the first is how ethnic identities are “transmitted” across generations, and the second is how ethnic diversity is made up (at the national and sub-national levels).
¹ Since the seminal paper of Easterly and Levine (1997a) on ethnic diversity and growth in Africa, ethnic fractionalization and its consequences have been widely studied.
² Michalopoulos (2012) studies the origins of linguistic diversity. Ahlerup and Olsson (2012) propose a theoretical model that explains the emergence of groups within populations.
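To make the single-affiliation assumption behind these indicators concrete, here is a minimal sketch of how a fractionalization index and a Reynal-Querol-type polarization index are usually computed from group shares. The function names and the toy data are hypothetical and are not taken from the thesis; the only input is one ethnic label per respondent, which is exactly the restriction discussed above.

```python
from collections import Counter


def group_shares(labels):
    """Return each group's population share, given one label per respondent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def fractionalization(shares):
    """Probability that two randomly drawn individuals belong to different groups."""
    return 1.0 - sum(s ** 2 for s in shares.values())


def polarization(shares):
    """Reynal-Querol-type index; it peaks at 1 with two groups of equal size."""
    return 4.0 * sum(s ** 2 * (1.0 - s) for s in shares.values())


# Hypothetical survey answers: exactly one ethnic label per respondent.
labels = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
shares = group_shares(labels)
print(round(fractionalization(shares), 3))
print(round(polarization(shares), 3))
```

Because every respondent contributes to exactly one share, a person with mixed ancestry is counted entirely in whichever box was ticked; neither index can register multiple or layered affiliations.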
Regarding the first question, Bisin and Verdier (2000) develop a model of transmission of identity across generations. They find that identity is transmitted mostly through marriage: intermarriages result in minority-group parents having access to a weaker socialization technology and thus in their children being less strongly attached to the minority identity. Intermarriages have thus long been used to measure the strength of cleavages within societies (Kalmijn, 1998), as they combine a measure of segregation (who meets whom and where) and a measure of who is thought to be an acceptable spouse. Chapter 1 aims to provide quantitative evidence on intermarriage in sub-Saharan Africa.
Regarding the second question, states have been shown to play a key role in shaping levels of ethnic diversity. Government decisions or approval are needed when collecting data on ethnicity and may affect the categories listed in a given country (e.g., are minority groups explicitly included in the ethnic classification, or are they lumped together in the catch-all “other groups” category?). Ethnic classifications themselves affect the level of ethnic diversity that is measured (Fearon, 2003; Posner, 2004). Another type of intervention is resettlement policies, either “nation-building policies” (Bazzi et al., 2019; Miguel, 2004) or forced migration policies (Charnysh and Peisakhin, 2021; Miho et al., 2019). A more extreme type of intervention aims to change the level of ethnic diversity at the national level through ethnic cleansing (Conversi, 2010). Chapter 3 studies the impact of an ethnic-based redistributive land reform program (ethnic homogenization policy) on school provision in Kenya.
Marriages and divorces
Partner choice is one of the most important decisions that affect individuals' welfare (Becker, 1973). In most of sub-Saharan Africa, parental involvement in spouse choice remains strong (Fafchamps and Quisumbing, 2007). Parents might want their children (or at least some of their children) to marry someone who is from the same group as them (Anderson and Bidner, 2021), whether this group is defined as the ethnic group (to ensure the transmission of their ethnic identity (Bisin and Verdier, 2000)) or the extended family (notably for insurance purposes (Hotte and Marazyan, 2020)), though these concerns might not always be fully compatible with assortative matching on other dimensions, such as education (Furtado and Theodoropoulos (2011), on US marriage markets).
However, marital rules and traditions are not static: parental involvement is decreasing in some contexts (Bertrand-Dansereau and Clark, 2016) and not all individuals marry according to the social norms. Chapter 1 aims to better understand partner choice in sub-Saharan Africa and to capture some of the changes in marriage-related norms.
If partner choice is so important, it is also because of the risks associated with having chosen the wrong partner: another key decision is whether to divorce or not. Most countries in sub-Saharan Africa are characterized by high marital instability: Clark and Brauner-Otto (2015) estimate that approximately 25% of first unions end in divorce. The consequences of divorce for women's welfare have been studied in more depth (Djuikom and van de Walle, 2018; Lambert et al., 2019) than the consequences of divorce for children (Chae, 2016). Chapter 2 aims to assess the impact of divorce on children's educational outcomes in Senegal.
This dissertation
This dissertation is made up of three empirical research papers that all relate to either ethnicity or marriages in sub-Saharan Africa. The first chapter documents patterns of interethnic and interfaith marriages in 15 countries of sub-Saharan Africa. The second chapter discusses the consequences of parental divorce for children's educational attainment in Senegal. The third chapter investigates the effects on public good provision of a Kenyan land redistribution program that resulted in the ethnic homogenization of program areas. I present the outline of each chapter, sometimes linking it to insights from fieldwork and archival work, as well as to new research developments. A more detailed introduction to the data and methods used can be found at the end of this section.
Chapter 1 – Interethnic and interfaith marriages in sub-Saharan Africa
Summary of the paper. This paper documents interethnic and interfaith marriage patterns to better understand which identity-related cleavages matter in sub-Saharan Africa. Using Demographic and Health Surveys (DHS) spanning 15 countries, I build a representative sample of women born between 1955 and 1989. Extrapolating to the population of these countries, I find that 20.4% of marriages are interethnic and 9.7% are interfaith, indicating that ethnic and religious differences are not always barriers. Accounting for diversity levels, both shares are similar. In the pooled sample of these 15 countries, the share of interethnic marriages increased, and there is no country where interethnic marriages became less frequent. The share of interfaith marriages decreased in the pooled sample. Only in Cameroon did interfaith marriages become more frequent. The share of Muslim-Christian marriages remained stable in the pooled sample. The increase in the share of interethnic marriages can only partly be explained by increases in urbanization and education levels, suggesting that changes in preferences and social norms may also be at work. The decrease in the share of interfaith marriages is due to decreasing levels of religious diversity: traditional religions were replaced by Islam and Christianity. These results show that some ethnic boundaries became more porous whereas religious boundaries did not. However, religious boundaries shifted as a result of changes in the religious landscape.
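Comparing intermarriage shares while accounting for diversity requires some benchmark for how many mixed couples one would see if group membership played no role. The exact adjustment used in the chapter is not described in this section; the sketch below assumes one common benchmark, random matching given the observed group shares of husbands and wives, and reports the ratio of the observed to the expected share. All names and the toy data are hypothetical.

```python
from collections import Counter


def shares(labels):
    """Population share of each group among the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def observed_mixed_share(couples):
    """Share of couples whose two partners report different groups."""
    return sum(1 for husband, wife in couples if husband != wife) / len(couples)


def expected_mixed_share(couples):
    """Share of mixed couples expected under random matching (an assumption)."""
    husband_shares = shares([h for h, _ in couples])
    wife_shares = shares([w for _, w in couples])
    same_group = sum(p * wife_shares.get(g, 0.0) for g, p in husband_shares.items())
    return 1.0 - same_group


# Hypothetical couples recorded as (husband's group, wife's group).
couples = [("A", "A")] * 6 + [("B", "B")] * 2 + [("A", "B"), ("B", "A")]
observed = observed_mixed_share(couples)
expected = expected_mixed_share(couples)
print(round(observed, 3), round(expected, 3), round(observed / expected, 3))
```

A ratio well below one means that mixed couples are much rarer than the group composition alone would produce. Under this kind of normalization, a lower raw share of interfaith marriages can be comparable to a higher raw share of interethnic marriages if religious diversity is lower than ethnic diversity.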
Insights from qualitative research & links to new research Although the resulting material is not
presented in the chapters of this dissertation, I spent time in Côte d’Ivoire and in Kenya conducting
qualitative interviews on marriage, divorce, and intergenerational transmission. One question
that I had been curious about was the process by which information on the ethnic identity of
individuals was produced. How was this question asked and how did people know how to
answer it? While I did not directly ask this question, what became clear during interviews is
that there is the short story (the one box that will be ticked by the interviewer) and the longer
story. I have started to think about the question “What is your ethnic group?” as similar to the
question “Where were you born?”, to which an interviewee could answer “In Paris” and then
tell a story about moving to a small town in Brittany at the age of 2 and having rarely visited the
capital city since. The standard questionnaire answer hence does not measure the “true value”
of individuals’ ethnic identity. A respondent, having presented himself as a Luo man, without
adding any qualifiers, suddenly remarked that of course he was able to deal with a situation that
involved thugs – because he heard them speak Luhya and was able to reason with them. When
I asked him how he had learned this language, he simply remarked: “Oh, but my mother is
Luhya.” Another story was that of a Christian woman in Kenya. When she told me the story of
her marriage, I learned that she had grown up in a Muslim family and had converted to marry
her Christian husband. So far, nothing too remarkable. But when I asked her whether her parents
had approved of the marriage, she quickly answered that her mother had grown up in a
Christian family and had converted to Islam before marrying. This piece of information that
was to me so important (a family with two generations of interfaith marriages!) and explained
so much had not been mentioned earlier in the interview. Both these stories are examples of
situations in which individuals’ mixed ancestry matters for understanding their choices and decisions,
but they also stress how strongly social norms dictate how individuals should introduce
themselves. It is not a coincidence that these stories both have a strongly gendered dimension:
social conventions in Kenya dictate that children belong to their father’s ethnic group and it
seems that women are much more often the ones who convert before getting married than
men.
Another question that became salient while conducting interviews was the question of intergenerational
transmission. A striking pattern emerged during interviews: in interethnic
families, parents were very likely to address their children in a language that was not their
mother tongue, usually Kiswahili in Kenya and French in Côte d’Ivoire. Some respondents
said that “tribes” were a thing of the past and that they did not intend for their children
to learn another language; others wished that their children would learn their mother
tongue – a wish often associated with the idea of sending the children “back home” to speak
with their grandparents. One father in Côte d’Ivoire jokingly told me that, frustrated by the
fact that his primary-school-aged children could understand his mother tongue but not speak
it, he had simply pretended not to understand any other language anymore, especially when
his children were bickering. His children, he reported, quickly switched to his mother tongue
when addressing him. While such parental strategies might succeed, it seems likely that children
born to parents of different ethnic groups will have weaker ethnic attachment than their
counterparts, not least because they cannot speak fluently the language associated with their
parents’ home villages. This intuition has been backed up by Dulani et al. (2021). In a recent paper,
the authors explore one of the implications of high rates of intermarriages by examining
the electoral preferences of multiethnic voters in Kenya and Malawi. They find that mixed in-
dividuals are less likely to support the party associated with their stated ethnic group, relative
to mono-ethnics. Their findings stress the importance of updating the theoretical and empirical
approaches used when addressing ethnicity in African countries.
Chapter 2 – Parental divorce and children’s educational outcomes in Senegal
The second chapter of this PhD, written jointly with Rozenn Hotte (CY Cergy Paris Univer-
sity), deals with the issue of divorces in Senegal and their impact on children’s educational
outcomes. This paper provides new evidence on the consequences of parental divorce for chil-
dren in Africa. Using survey data that collected the detailed life histories of Senegalese women
and their children, we investigate how children’s educational outcomes are affected by their
parents’ divorce. We use a sibling fixed-effects strategy that allows us to control for all the
factors that are common to all children in a family, such as parental preferences regarding edu-
cation or the level of education of the parents, alleviating concerns of omitted variable bias. We
compare children who were old enough to have been enrolled in primary school at the time of
the divorce to their younger siblings, for whom enrollment decisions had not yet been made
at the time of the divorce. We find that younger siblings were more likely than their older sib-
lings to have attended primary school. This higher level of investment does not persist in the
long run: there are no differences between siblings when considering primary school comple-
tion. We find that custody and fostering decisions do not seem to mediate the positive effects
on school attendance. Our findings are consistent with either an improvement of the financial
situation (due to remarriage) or an increase in the decision-making power of mothers after the
divorce.
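The logic of the sibling fixed-effects comparison can be illustrated with a minimal sketch. It is not the specification estimated in Chapter 2: it omits all child-level controls, and the field names (`family`, `young_at_divorce`, `attended`) as well as the toy records are hypothetical. Both variables are demeaned within each family, so that only differences between siblings identify the slope.

```python
from collections import defaultdict


def within_family_slope(children):
    """OLS slope of the outcome on exposure after demeaning both within families.

    Families without variation in exposure contribute nothing to the estimate,
    which is the defining property of a sibling fixed-effects comparison.
    """
    by_family = defaultdict(list)
    for child in children:
        by_family[child["family"]].append(child)

    numerator = 0.0
    denominator = 0.0
    for siblings in by_family.values():
        mean_x = sum(c["young_at_divorce"] for c in siblings) / len(siblings)
        mean_y = sum(c["attended"] for c in siblings) / len(siblings)
        for c in siblings:
            dx = c["young_at_divorce"] - mean_x
            numerator += dx * (c["attended"] - mean_y)
            denominator += dx * dx
    if denominator == 0:
        raise ValueError("no within-family variation in exposure")
    return numerator / denominator


# Made-up records for illustration only (hypothetical field names and values).
toy_children = [
    {"family": 1, "young_at_divorce": 0, "attended": 0},
    {"family": 1, "young_at_divorce": 1, "attended": 1},
    {"family": 2, "young_at_divorce": 0, "attended": 1},
    {"family": 2, "young_at_divorce": 1, "attended": 1},
    {"family": 3, "young_at_divorce": 1, "attended": 1},  # no within-family variation
    {"family": 3, "young_at_divorce": 1, "attended": 1},
]
slope = within_family_slope(toy_children)
```

In this toy example, family 3 has no variation in exposure and therefore does not affect the estimate, which is why characteristics shared by all siblings cannot bias the comparison.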
Chapter 3 – Ethnic homogenization and public goods: Evidence from Kenya’s land reform
program
Summary of the paper The third chapter of this PhD, written jointly with Catherine Boone (Lon-
don School of Economics and Political Science) and Alexander Moradi (Free University of
Bozen-Bolzano), is concerned with the consequences of land reform and ethnic homogenization
on public good provision. In this paper, we examine the effects of ethnic homogenization on
public good provision using a natural experiment that took place in Kenya. We study a large-
scale land reform program that led to a significant reduction in ethnic diversity, the settlement
schemes program. Using a novel dataset about the precise location of program area boundaries
(Lukalo et al., 2019) that we combine with archival, survey, census, and satellite data, we im-
plement a spatial regression discontinuity design. We argue that the border between program
areas (treatment) and neighboring areas (counterfactual) is plausibly random at the local level
and confirm that there are no observable differences in pre-treatment characteristics. We find a
strong discontinuity in ethnic diversity but no differences in school provision between program
areas and counterfactual areas in the short run as well as in the long run. As individuals were
resettled to the program areas, they likely lacked the dense social networks that favor collective
action, whether to hold politicians accountable or to provide public goods through cooperation
at the community level. Our results are not driven by spillovers from treatment to counterfac-
tual areas. A mediation analysis indicates that income effects are unlikely to drive this null
result.
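The idea behind a regression discontinuity at a boundary can be sketched as follows. This is a simplified illustration and not the estimator used in Chapter 3: it fits a separate straight line on each side of the border within a fixed bandwidth and compares the two fitted values at the border. The sign convention (positive distances inside program areas), the bandwidth, and the data points are all assumptions made for the example.

```python
def fit_line(points):
    """Return (intercept, slope) of an OLS line through (distance, outcome) pairs."""
    n = len(points)
    mean_d = sum(d for d, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct distances on each side")
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_d, slope


def boundary_jump(observations, bandwidth):
    """Difference between the two fitted outcomes at the border (distance 0).

    Assumed convention: distance > 0 inside program areas, distance < 0 outside.
    """
    inside = [(d, y) for d, y in observations if 0 < d <= bandwidth]
    outside = [(d, y) for d, y in observations if -bandwidth <= d < 0]
    intercept_inside, _ = fit_line(inside)
    intercept_outside, _ = fit_line(outside)
    return intercept_inside - intercept_outside


# Made-up points: the outcome follows 1 + 0.5*d outside and 3 + 0.5*d inside.
toy_observations = [(-3, -0.5), (-2, 0.0), (-1, 0.5), (1, 3.5), (2, 4.0), (3, 4.5)]
jump = boundary_jump(toy_observations, bandwidth=3)
```

A jump close to zero for an outcome such as school provision, together with a clear jump in ethnic diversity, is the pattern described in the summary above.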
Talking about land in Kenya The third chapter of this PhD is my first economic history chap-
ter and my first interdisciplinary research collaboration. I was lucky to join the team of the
Spatial Inequalities in African Political Economy Project for a trip to Kenya in May and June
2019 during which I got the opportunity to listen to talks on land in Kenya that were given by
Kenyan researchers and students. These offered fascinating insights on which research ques-
tions were deemed important across academic and cultural settings. Among the (selected)
sample of Kenyan researchers and students whom I saw present their research, many worked on
the link between subsistence income and plot size. What is the minimal plot size needed to support
the needs of a Kenyan family? they were asking. What I could hear was there is not enough
land. The issue of inequality and land redistribution often came up when audiences reacted
to the presentation of the preliminary findings of Boone et al. (2021), and almost always what
people were discussing was the issue of class inequality, not ethnic inequality. Having read so
many papers about ethnicity, I sometimes forget that indeed, this is not necessarily the relevant
cleavage (not that this was new to me).
Data & methods
Data In this dissertation, I have used detailed household survey data to study marriages
and divorces, together with geolocated data and archival documents. This PhD also offered me the
opportunity to collect qualitative data in Côte d’Ivoire and Kenya as well as to conduct archival
research in the British National Archives.
Household survey data The Demographic and Health surveys (DHS) were used in Chapter 1 and
Chapter 3. DHS questionnaires provide information on who is married to whom (within the
household), and in specific countries and waves, these questionnaires also provide information
on respondents’ ethnic and religious identity. Respondents (women and men) are asked to report
their own ethnic identities, which makes the measure much less sensitive to measurement
error and declaration bias compared to other surveys (some of the Living Standard and Mea-
surement Surveys only ask either the household head or the most knowledgeable person about
other household members to declare the ethnic identity of all household members). Another
advantage of using the DHS data is that the surveys cover a large range of countries and a long
period – the earliest surveys that collected information on ethnic identity in African countries
were implemented in 1992 and the program is still ongoing.
The second wave of the Pauvreté et Structure Familiale (PSF) survey that was conducted in Senegal
in 2011 was used in Chapter 2. The survey is described in detail in De Vreyer et al. (2008).
This database provides an extremely rich and detailed account of the lives of Senegalese house-
holds and includes detailed information on marital histories, consumption, and migration. Sur-
veys conducted in sub-Saharan Africa rarely record marital histories and often only record the
marital status of respondents at the time of the survey. The PSF database includes information
on all the unions that a respondent experienced (start and end dates of each union and the reason
why the most recently ended union came to an end), which allowed us to identify divorced women and
their children directly instead of having to guess whether a woman who had remarried had been
divorced or widowed. The PSF database also includes information on all children of household members.
As we retrieve information on divorces through each child’s mother, we need to have informa-
tion on children who are not living with their mother (i.e. who are not living in the surveyed
household) for our estimates not to be biased by endogenous decisions.
Shapefiles of settlement schemes and archival documents on the land reform program Chapter 3 was
made possible by the access to the exact boundaries of the settlement schemes from Lukalo
et al. (2019), who constructed a map layer from over 1,500 digitized Registry Index Maps (RIM)
kept by Survey of Kenya in 2018. Polygons were joined with attribute data from the Ministry
of Lands and Physical Planning (MoLPP) dataset on Kenyan settlement schemes, presented in
Lukalo and Odari (2016). The final database includes shapefiles with boundary information
as well as information on schemes. We also relied on documents from the British National
Archives to obtain more information on the program. Presenting the complete list of the data
used in this chapter is beyond the scope of this introduction. A detailed list can be found in
Section 3.2.
Qualitative interviews in Côte d’Ivoire and Kenya In June 2016, I conducted qualitative interviews
on marriages and divorces with 21 respondents living in and close to Abidjan, Côte d’Ivoire.
I conducted the same type of interviews in Kenya in May and
August 2019, with 52 respondents in urban and rural areas. Interview locations
included Nairobi (the capital city) and Ongata Rongai (a periurban area close to Nairobi);
Kisumu (the third largest city of Kenya, located in the west of the country) and in rural areas in
Kisumu county; Nakuru (the fourth largest city of Kenya, located in central Kenya); Mombasa
(the second largest city of Kenya, located on the coast) and villages in Kilifi and Kwale counties.
These areas were selected to ensure a diverse range of settings as well as to be sure to interview
people from different backgrounds.
Methods These three chapters are all empirical ones for which I have relied on a wide range
of methods. Chapter 1 is a descriptive paper in which the main challenges were to think about
the representativeness of the data, and especially how observations needed to be weighted
when several surveys were pooled together (a point on which the DHS documentation is mostly
silent) and then used to assess time trends. Chapter 2 relies on a sibling fixed-effects specification
that is common in family economics and has already been used to study the impact of divorce
on children in developed countries (Björklund and Sundström, 2006; Le Forner, 2020). As many
parental and family characteristics are likely to simultaneously influence the probability that
parents divorce and the schooling of their children, a simple comparison of children according
to the divorce status of their parents would be biased. Sibling fixed effects control for all the
factors that are common to all children in a family, such as parental preferences regarding
education or the level of education of the parents. Short of allocating divorce randomly, this
strategy is one of the few that can get closer to a causal estimate of the impact of divorce.
Chapter 3 combines the use of a natural experiment with a spatial regression discontinuity
design. The natural experiment is a redistributive land reform program, the settlement scheme
program that was implemented in Kenya after independence. Selection of households into the
program was based on area-specific ethnic criteria, which resulted in the ethnic homogenization
of program areas. We use the exact borders of these program
areas to assess the impact of an ethnic homogenization policy on the provision of public goods
(schools so far).
CHAPTER 1
INTERETHNIC AND INTERFAITH MARRIAGES IN SUB-SAHARAN AFRICA
Abstract1 This paper documents interethnic and interfaith marriage patterns to better understand which identity-related cleavages matter in sub-Saharan Africa. Using Demographic and Health Surveys (DHS) spanning 15 countries, I build a representative sample of women born between 1955 and 1989. Extrapolating to the population of these countries, I find that 20.4% of marriages are interethnic and 9.7% are interfaith, indicating that ethnic and religious differences are not always barriers. Accounting for diversity levels, both shares are similar. Regarding the pooled sample of these 15 countries, the share of interethnic marriages increased, and there is no country where interethnic marriages became less frequent. The share of interfaith marriages decreased in the pooled sample. Only in Cameroon did interfaith marriages become more frequent. The share of Muslim-Christian marriages remained stable in the pooled sample. The increase in the share of interethnic marriages can only partly be explained by increases in urbanization and education levels, suggesting that changes in preferences and social norms may also be at work. The decrease in the share of interfaith marriages is due to decreasing levels of religious diversity: traditional religions were replaced by Islam and Christianity. These results show that some ethnic boundaries became more porous whereas religious boundaries did not. However, religious boundaries shifted as a result of changes in the religious landscape.
1 This chapter was published in World Development in 2020 (Crespin-Boucaud, 2020). I extend special thanks to Denis Cogneau for carefully reading my paper several times and for suggesting numerous changes that greatly improved the paper. I am grateful to Yannick Dupraz, Sylvie Lambert, Alexander Moradi, Lisa Oberlander, and Oliver Vanden Eynde for insightful discussions and comments on previous versions of this work. I thank seminar participants at the Paris School of Economics (PSE) and at the University of Sussex as well as two anonymous referees for their constructive comments.
1.1 Introduction
Social identity is an individual characteristic that has long been demonstrated to be complex
and multi-layered (Posner, 2005). However, a large fraction of the literature on sub-Saharan
Africa has relied on a unidimensional view of identity, equating identity with ethnicity. Even
though surveys and censuses in most countries now include categories for “mixed race” or
“mixed ancestry”, very few large-scale surveys conducted in African countries include such an
option, perpetuating the idea that ethnic identity is the allegiance to one group or tribe, and to
one homeland. Moreover, while the impact of ethnic fractionalization has been investigated2,
the manner in which ethnic identities are formed and maintained has received little attention.
As ethnicity is transmitted through descent3, we would expect interethnic marriages to be rare
in societies where ethnic cleavages are rigid: marrying within one’s group is a means for an
individual to ensure that her/his identity is passed down to her/his children4. Intermarriages
have long been used to measure the strength of cleavages within societies (Kalmijn, 1998), as
they combine a measure of segregation (who meets whom and where) and a measure of who is
thought to be an acceptable spouse. However, there is no quantitative evidence on interethnic
marriages in the case of sub-Saharan Africa. This paper aims to fill this gap.
I study interethnic and interfaith marriages in 15 countries in sub-Saharan Africa using data
from the Demographic and Health Surveys (DHS). This paper has four aims: providing de-
scriptive statistics on interethnic and interfaith marriages, discussing results at the extensive
margin (marrying outside one’s ethnic group) and the intensive margin (how far outside one’s
ethnic group?), assessing time trends, and analyzing which factors have contributed to the
time trends. Contrasting interethnic with interfaith marriages provides a broader picture of
intermarriage patterns in African countries, but it must be emphasized that religious identity
is more fluid than ethnic identity, as conversion allows individuals to change their religious
affiliation.
2 Since the seminal paper of Easterly and Levine (1997a), several works have pointed out the detrimental effects of ethnic diversity on growth and public good provision (Alesina and La Ferrara, 2005; Churchill and Smyth, 2017; de la Cuesta and Wantchekon, 2016; Gershman and Rivera, 2018; Goren, 2014), with only a few surveys reaching different conclusions (Gisselquist et al., 2016).
3 Chandra (2006) defines ethnic membership as “determined by attributes associated with, or believed to be associated with, descent”. Kanbur et al. (2011) summarize debates over definitions of ethnic identity and of related measures. In this paper, I define ethnic groups based on DHS classification, and I do not discuss how intermarriages could have influenced or determined these classifications.
4 Bisin and Verdier (2000) develop a model of transmission of identity across generations and find that intermarriages result in minority-group parents having access to a weaker socialization technology. In sub-Saharan Africa, ethnic fractionalization does not necessarily result in majority-minority settings. I nonetheless expect the theoretical result put forward to hold: high shares of intermarriage should be associated with weaker ethnic/religious affiliations for parents and their children.
First, I find that 20.4% of married women are in an interethnic union, contrasting with 9.7%
being in an interfaith union. Interethnic unions are hence far from rare events in sub-Saharan
countries, and their share ranges from 10.4% in Burkina Faso to 46% in Zambia. Interfaith
marriage shares range from 1.8% in Niger to 19.3% in Côte d’Ivoire. Second, using a sample
of women born between 1955 and 1989, I find that interethnic marriages became more com-
mon for later-born cohorts relative to earlier-born ones, while interfaith marriages became less
common. There is no country where the share of interethnic marriages decreased, and inter-
faith marriages increased only in Cameroon. Third, building on recent research on how to
measure ethno-linguistic diversity (Desmet et al., 2016; Gershman and Rivera, 2018), I compute
new linguistic distance measures that allow me to take into account diversity within and across
countries. In the case of interfaith marriages, I do not use a distance measure but instead study
Muslim-Christian marriages separately, as this type of union is arguably the most distant kind
of interfaith marriage in sub-Saharan Africa. I find that changes at the extensive margin do
not translate into changes at the intensive margin. Interethnic marriage shares increased, but
there is no clear pattern regarding variation in linguistic distance. Interfaith marriage shares
decreased, but Muslim-Christian marriage shares remained stable. Fourth, I examine whether
time trends on intermarriage shares can be explained by increased education and urbanization
levels. To do so, I compare time trends across specifications with controls and without controls.
The results for interethnic marriages suggest that, while education and urbanization
play a role in the increase of interethnic marriage shares, part of the increase could come from
changes in norms and preferences about interethnic marriages. Likewise, I find that urbaniza-
tion and education are not the key drivers of the decrease in interfaith marriages: this decrease
is mostly due to the decreased levels of religious diversity over time. This study of intermar-
riages finds that some – though not all – ethnic boundaries became more porous. Religious
boundaries did not become more porous, but the religious landscape changed as traditional
religions were replaced by Islam and Christianity. Finally, I confirm that my results are robust
to varying definitions of intermarriages. I also test the hypothesis that spouses become more
similar as the length of their marriage increases. Ethnic “assimilation” does not drive the re-
sults. However, there is evidence of conversion during marriage: my estimate is a lower bound
on the decline of interfaith marriages.
This paper contributes to three strands of the literature. First, this paper extends the empirical
literature on intermarriages (Fryer Jr, 2007; Furtado and Theodoropoulos, 2011; Kalmijn and
Van Tubergen, 2006; Monden and Smits, 2005; Qian and Lichter, 2007, 2011). Second, it con-
tributes to a growing literature that nuances or contests the idea that ethnicity is always the
key cleavage in sub-Saharan Africa. Contributions have suggested that which identity cate-
gory is salient depends on the context (Eifert et al., 2010; Miles and Rochefort, 1991). Looking
at the micro-level literature, Berge et al. (2018) show that there is little evidence of co-ethnic
bias in behavior games, contradicting results based on Implicit Association Tests (IAT) (such as
Habyarimana et al. (2007) in Uganda; Lowes et al. (2015) in DRC). The complexity of relation-
ships between ethnic groups (at the political level) was emphasized by Francois et al. (2015) and
Mozaffar et al. (2003) regarding electoral coalitions and power sharing. Simson (2018) shows
that once education is controlled for, public sector jobs in Kenya and in Uganda are rather eq-
uitably distributed along ethnic lines. Third, this paper adds to the literature comparing the
evolution and salience of ethnic and religious cleavages (McCauley, 2014).
The rest of the paper is organized as follows: Section 1.2 presents the data. Section 1.3 lists
factors that could explain the prevalence of intermarriages. Section 1.4 presents the empirical
strategy used. Section 1.5 reports results on the pooled sample and Section 1.6 reports results at
the country level. Section 1.7 tests alternative stories and provides robustness checks on the
findings. Section 1.8 concludes.
1.2 Data
In this section, I present the data sources used and explain how the sample is built.
Data sources: DHS and Ethnologue
I use Demographic and Health Surveys (DHS) that were implemented in sub-Saharan Africa
(the surveys used are listed in Table A-1.1, Appendix A-1.1). DHS questionnaires provide information
on who is married5 to whom (within the household), and in specific countries and waves, these
questionnaires also provide information on respondents’ ethnic and religious identity. The
descriptive sample includes the 25 countries with information on ethnic identity6. The main
sample is made up of 15 countries for which there are at least two survey waves that gather
information on the ethnic and religious identity of respondents, including Benin, Burkina Faso,
Cameroon, and Zambia. Within these countries, it covers women born between 1955 and 1989 and their
husbands.
5 Throughout the paper I use the terms “marriage,” “spouse,” “husband,” and “wife” to refer to married couples as well as to cohabiting couples.
6 These 25 countries are the 15 countries from the main sample, plus Central African Republic, Chad, the Republic of the Congo (Congo-Brazzaville), the Democratic Republic of the Congo (DRC), Ethiopia, Liberia, Mozambique, Namibia, Nigeria, and Sierra Leone. The countries included in this study are not a random sample of African countries. Including a question on the respondent’s ethnic identity is not a decision made independently from whether ethnicity matters in a country: the sample above cannot be considered to be representative of countries that did not include such a question in DHS.
Additionally, I exploit the Ethnologue dictionary (Simons and Fennig, 2017) to get information
on the classification of each ethnic group’s traditional language. I use these classifications to
compute the linguistic distance of all of the pairs of ethnic groups. For each pair, I identify
the lowest common linguistic node that they share and compute the number of nodes between
each group and the common node. The mean of these two distances is the linguistic distance
of this pair (detailed methodology in Appendix A-1.2.).
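The distance computation described above can be sketched as follows, under stated assumptions. Each language is represented by its classification path from the family root down to the language itself; the node labels below are hypothetical placeholders and not actual Ethnologue entries. I assume that "number of nodes" means the number of steps from the lowest common node down to each language, and that pairs without any common node are left undefined; the exact conventions are those of Appendix A-1.2.

```python
def linguistic_distance(path_a, path_b):
    """Mean number of steps from the lowest common node down to each language.

    Each path lists classification nodes from the family root to the language.
    Returns None when the two paths share no node (assumed convention).
    """
    shared = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        shared += 1
    if shared == 0:
        return None
    steps_a = len(path_a) - shared
    steps_b = len(path_b) - shared
    return (steps_a + steps_b) / 2


# Hypothetical classification paths (placeholder labels).
group_a = ["Family F", "Branch F1", "Sub-branch F1a", "Language a"]
group_b = ["Family F", "Branch F1", "Sub-branch F1b", "Cluster F1b-i", "Language b"]
group_c = ["Family G", "Branch G1", "Language c"]

distance_ab = linguistic_distance(group_a, group_b)  # lowest common node: Branch F1
distance_aa = linguistic_distance(group_a, group_a)
distance_ac = linguistic_distance(group_a, group_c)
```

Here the lowest common node of the first two groups is the second level of the tree, so the two languages sit two and three steps below it respectively.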
Comparability over time: Reweighting and recoding
The main sample includes at least two data waves for each country, thus raising issues about
comparability over time. I explain briefly the steps taken to ensure that I can identify time
trends using this sample (the online Appendix details the processes used in this study).
The main sample is made up of women born between 1955 and 1989: for each year within
this period, the sample includes women from all of the 15 countries of the main sample. I
reweight the sample to make it representative of the population of married women in each
country. Reweighting and selecting the time period 1955-1989 ensure that the share of each
country remains (roughly) constant over time. Changes over time are hence not due to changes
in the respective weights of countries in the sample over time.
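One simple way to keep country shares fixed when pooling surveys is to rescale the survey weights so that, within each country, they sum to a chosen target such as the population of married women. The sketch below illustrates that idea only; the procedure actually used is the one detailed in the online Appendix, and the field names, target totals, and records here are hypothetical.

```python
from collections import defaultdict


def rescale_weights(rows, target_totals):
    """Rescale weights so that each country's weights sum to its target total.

    Relative weights of respondents within a country are left unchanged.
    """
    sums = defaultdict(float)
    for row in rows:
        sums[row["country"]] += row["weight"]
    rescaled = []
    for row in rows:
        factor = target_totals[row["country"]] / sums[row["country"]]
        rescaled.append({**row, "weight": row["weight"] * factor})
    return rescaled


# Made-up records pooled from several hypothetical survey waves.
toy_rows = [
    {"country": "A", "wave": 1, "weight": 1.0},
    {"country": "A", "wave": 2, "weight": 3.0},
    {"country": "B", "wave": 1, "weight": 2.0},
]
toy_targets = {"A": 100.0, "B": 50.0}  # hypothetical population totals
reweighted = rescale_weights(toy_rows, toy_targets)
new_weights = [row["weight"] for row in reweighted]
```

Applying the same rescaling within each birth cohort, rather than within the whole country sample, would be one way to keep each country's share constant across cohorts; whether this matches the appendix procedure is not established here.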
I recode both ethnic and religious categories to build a classification that fulfills two criteria.
First, the classification does not vary within a country. Second, for all of the cohorts and survey
waves, all of the groups listed in this classification have at least one member of each gender.
Grouping in fewer categories mechanically decreases the number of unions appearing as in-
terethnic/interfaith, so a time-invariant classification is needed to measure changes over time.
After recoding, ethnic classifications are specific to each country and include fewer than 10 categories
for most countries7. The category “other (ethnicity)” groups together members of ethnic
groups that were not listed in all waves, people who did not identify with a specific group, and
foreign nationals. Recoding religious classifications makes apparent a key change during the
survey period: the surge of Pentecostalism in Africa (Mayrargue, 2004; Meyer, 2004). Changes
in classification are likely to reflect the agenda of church leaders, as “new Churches” have an
interest in being formally recognized in order to proselytize, which is not the behavior of faiths
less invested in proselytizing, such as traditional religions. Because new faith groups continue
to be listed, harmonizing nomenclatures across waves requires a high level of aggregation:
Christian, Muslim, other. The category “other (faith)” includes followers of traditional religions,
atheists, and members of new religious movements that cannot be linked to Christianity
or Islam. Among the women who belong to the “other (faith)” group, at least 41.8% identify
with a traditional religion. It is a lower bound on their share, as many survey waves do not
distinguish traditional religions from other faiths that do not belong to Christianity or Islam.
7 Depending on the countries, ethnic classifications became more or less detailed. For instance, Akan subgroups are listed separately in Ghana in the older survey waves but are only listed as “Akan” in the recent survey waves. However, the reverse phenomenon happened in Kenya, where groups listed together (Meru/Embu) are listed separately in more recent survey waves.
Variables: Intermarriages and individual characteristics
To study intermarriages, I build variables that measure intermarriages as well as variables that
are likely to influence the likelihood of intermarriage.
Ethnic and religious identity are self-declared in the DHS: I hence consider that the respon-
dent’s answer is a measure of their “true identity”. A marriage is interethnic (interfaith) if the
spouses’ answers correspond to different ethnic (faith) categories in the recoded classification.
Ethnic and religious identity categories may be fluid and change, especially in the case of con-
version for marriage. I discuss how religious conversion and ethnic “assimilation” might affect
my results in section 1.7.
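The definition above, and the reason why a time-invariant classification matters, can be illustrated with a short sketch. The group labels, recoding tables, and weights are hypothetical; groups missing from a recoding table are sent to an "other" category, which is an assumption of the example.

```python
def is_intermarriage(wife_group, husband_group, recode):
    """True if the spouses fall into different categories of the recoded classification."""
    wife_category = recode.get(wife_group, "other")
    husband_category = recode.get(husband_group, "other")
    return wife_category != husband_category


def weighted_share(couples, recode):
    """Weighted share of couples classified as intermarried under a given recoding."""
    total = sum(c["weight"] for c in couples)
    mixed = sum(
        c["weight"]
        for c in couples
        if is_intermarriage(c["wife"], c["husband"], recode)
    )
    return mixed / total


# Made-up couples; A1 and A2 stand for two subgroups of a larger group A.
toy_couples = [
    {"wife": "A1", "husband": "A2", "weight": 1.0},
    {"wife": "A1", "husband": "B", "weight": 1.0},
    {"wife": "B", "husband": "B", "weight": 2.0},
]
detailed = {"A1": "A1", "A2": "A2", "B": "B"}  # subgroups listed separately
coarse = {"A1": "A", "A2": "A", "B": "B"}      # subgroups merged

share_detailed = weighted_share(toy_couples, detailed)
share_coarse = weighted_share(toy_couples, coarse)
```

The same couples yield a lower share under the coarser classification, which is why a change in the level of detail between survey waves would otherwise be mistaken for a time trend. Note that two spouses who both fall into "other" are counted as belonging to the same category under this rule.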
I consider two main variables that can lead to intermarrying: education and urban residence.
The DHS include little retrospective information, so I cannot reconstruct the individual char-
acteristics at the time that the marriage started. Marriage decisions are taken based on the
characteristics of individuals but also on expectations, such as joining a spouse in the city or
being able to graduate high school. I use characteristics at survey date to proxy for past char-
acteristics and expectations: current characteristics do not perfectly correspond to past char-
acteristics but allow me to take into account (realized) expectations. I use information on the
highest completed level of education at survey date, which should be a good proxy of the level
of education at the time that the union was formed8. I use urban residence at the survey date.
Migration mostly takes place from rural to urban areas9: the current place of residence captures
some unobserved characteristics of individuals that might be correlated to their propensity to
intermarry, such as one’s occupation. Using this variable thus results in overestimating the
relationship between urban residence and intermarriage.
8 It is unlikely that women can stay in school after getting married. Considering only women in union who have attended primary school, I find that, under the assumption that girls start school at age 8, 69% of women started their first cohabiting union at least two years after completing their schooling, 21% around the same time as they completed their schooling, and 10% before that.
9 Among women for whom I have information on childhood place of residence, 12.6% of women who live in a rural area at the time of the survey grew up in an urban area, whereas 41.3% of women living in an urban area grew up in a rural area. As this sample consists of women belonging to earlier-born cohorts, these figures may be even higher for later-born cohorts.
1.3 Intermarriages and marriage markets: Preferences, norms, and
diversity
I provide in this section a framework for interpretation of the models and of the results pre-
sented in the paper.
Types of factors influencing intermarriage shares
In his seminal paper, Kalmijn (1998) distinguishes three factors that could explain the preva-
lence of intermarriages: individual preferences, diversity levels within (local) marriage mar-
kets, and the influence of norms and of third parties.
The individual preferences factor gathers all of the preferences that individuals have concerning
their matches on the marriage market. Two main characteristics of matches on the marriage
market are socio-economic resources and cultural resources: people are likely to want to marry
someone whose economic prospects are good and with whom they share values and prefer-
ences.
The diversity level factor encompasses all of the channels related to how diverse marriage mar-
kets are. Some societies are highly heterogeneous, and others are more homogeneous, for in-
stance if there is a majority group. Moreover, spatial segregation affects how diverse local
marriage markets are. Low levels of diversity are associated with low levels of intermarriages
(by sheer limitation due to the numbers of potential spouses from other groups).
The third parties/norms factor includes the channel of group identification, the one of group
sanctions, and in the case of the setting studied, the fact that members of one’s kin may be
directly responsible for choosing one’s spouse. Field studies have shown transitions from kin-
selected to self-selected marriages, for instance, Bertrand-Dansereau and Clark (2016) (Malawi)
and Clark et al. (2010) (Kenya): elders and parents are less involved in the matching process.
Third-party influence is likely to work against intermarriages (Spörlein et al., 2014).
DHS variables and the Kalmijn framework
As indicated in section 1.2, I use two variables that are likely to influence the likelihood of
intermarriage: education and urban residence. These variables capture aspects of the types of
factors listed above.
Education could affect individual preferences through several channels. Education, especially
secondary and higher, is in many countries conducted in a vehicular language, thus helping
to remove language barriers in marriage markets. Additionally, by transmitting a common
culture, education could switch preferences away from group identification and towards a na-
tional identification. Moreover, higher education takes place in (mixed) urban settings (diver-
sity level factor). Educated women might have more of a say in the choice of their spouse: third
parties may be less involved in the matching process.
Urban areas are on average more mixed than rural ones: diversity levels are likely to be higher
in cities than in the countryside, and marriage markets are likely to be less segregated. Social
norms may be different in cities, and more accepting of intermarriages.
How did marriage markets change?
Figure 1.1: Changes over time: Education, Urban residence, Diversity levels
[Figure: two panels plotted by birth cohort (1955-59 to 1985-89), vertical axis from 0 to 50. Left panel series: Primary only; Secondary/Higher; Living in urban area. Right panel series: Religious fractionalization; Share other religion.]
Sample & data: Women in union, pooled sample. 95% confidence intervals included (except on the measure of religious fractionalization).
Left panel: Share of women living in urban area, share of women whose highest completed education level is primary/secondary school. The share of women living in urban areas reflects current place of residence: the magnitude of the change would be higher using information on type of place of birth.
Right panel: Weighted average of religious fractionalization at country-level and share of women belonging to the group “other (faith)”.
My main sample is made up of women born between 1955 and 1989. During this period, several
changes linked to the variables listed above took place. Figure 1.1 shows a visual representation
of the changes that I can observe in the data. Its left panel shows that education levels as well
as urbanization increased: these changes could lead to higher rates of intermarriages for later-
born cohorts than for earlier born cohorts.
Moreover, another key change took place: the decrease in the religious diversity of the pop-
ulation. The right panel of Figure 1.1 shows the decrease in the share of people identifying
with faiths other than Islam and Christianity and the associated decrease in religious diversity.
Under the assumption that people meet on a national marriage market and that no other
factors affected interfaith marriages, the decrease in religious diversity will mechanically re-
sult in lower interfaith marriage shares. Under the same assumptions, the share of interethnic
marriages should remain stable, as there is no sizable change in ethnic diversity levels10.
1.4 Empirical strategy
In this section, I present measures of ethnic and religious diversity and introduce the specifica-
tions used to measure time trends.
1.4.1 Descriptive statistics: Comparisons across countries and across identity cate-
gories
Cross-country comparisons must be done carefully in order to be meaningful: if intermarriage
shares are low in a given country, it could be that, according to the classification used, there is
not much diversity in this country to begin with. As discussed in section 1.2 in the case of Pente-
costal churches, the classification chosen in each survey cannot be considered as encompassing
the same reality across countries and identity categories. To neutralize this classification effect,
I compute the share of intermarriages that we would observe if individuals were matched ran-
domly in a national marriage market. This share measures how diverse each country is. It is
equivalent to computing an ethnic fractionalization index (EF) and a religious fractionalization
index (RF) for each (national) marriage market. More formally, this fractionalization measure
(Fc) corresponds to:
\[ F_c = 1 - \sum_{i=1}^{n} p_{wi} \, p_{mi} \]
In this equation, c denotes countries. n is the number of ethnic (respectively religious) groups in
the survey. Subscripts w denote women, subscripts m denote men. pwi is the share of married
women who belong to the group i, pmi the share of married men who belong to the group
i. I assume that a person’s decision to marry does not depend on the ethnic and religious
identity of her/his matches. There are no additional entries on or exits from the marriage
market compared to what we observe in the data: polygamous men are thus counted as many
times as the number of their cohabiting wives.
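To make the index concrete, the following Python sketch computes the random intermarriage share from two lists of group labels, one entry per union. It is an illustration only: the function name and the toy labels are hypothetical and are not taken from the DHS data or from the code used for this chapter.

```python
from collections import Counter


def random_intermarriage_share(wives_groups, husbands_groups):
    """Share of mixed unions expected under random matching: 1 - sum_i p_wi * p_mi.

    Each list holds one group label per union, so a polygamous man is listed
    once per cohabiting wife, as assumed in the text.
    """
    n_w, n_m = len(wives_groups), len(husbands_groups)
    p_w = {g: c / n_w for g, c in Counter(wives_groups).items()}
    p_m = {g: c / n_m for g, c in Counter(husbands_groups).items()}
    # Probability that a randomly drawn wife and husband share the same group.
    same_group = sum(p_w[g] * p_m.get(g, 0.0) for g in p_w)
    return 1.0 - same_group


# Hypothetical toy marriage market with two groups, "A" and "B".
wives = ["A", "A", "A", "B"]
husbands = ["A", "A", "B", "B"]
f_c = random_intermarriage_share(wives, husbands)
```

Dividing an observed intermarriage share by this quantity gives the ratio of observed to random shares used in the descriptive statistics.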
Fractionalization indices do not include any information on how (dis)similar groups are11: they
10A more thorough discussion of ethnic diversity at the country-level and interethnic marriage patterns can befound in the online Appendix.
11Since the seminal paper by Greenberg (1956), adjusting measures of fractionalization by a measure of similarity
are useful to identify changes at the extensive margin but do not allow for identification of
changes at the intensive margin.
In the case of ethnicity, I account for these differences by using linguistic distance measures
(detailed methodology in Appendix A-1.2.). Computing the country-specific random linguistic
distance (dc) is done according to the formula below:
\[ d_c = \sum_{i=1}^{n} \sum_{j=1}^{n} p_{wi} \, p_{mj} \, d_{ij} \]
where dij is the linguistic distance between group i and group j, pwi the share of married
women who belong to the group i, and pmj the share of married men who belong to the group
j. The linguistic distance dii is set to 0. The linguistic distances dzi and diz, where z is the group
“other (ethnicity)”, are set to the median value of the linguistic distance between pairs of ethnic
groups listed in country c.
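The computation of the random linguistic distance can be sketched in Python as follows. This is a minimal illustration under stated assumptions: group shares are passed as dictionaries, pairwise distances between named groups are stored in a dictionary keyed by unordered pairs, and all labels and values are hypothetical.

```python
from statistics import median


def random_linguistic_distance(p_w, p_m, dist, other="other"):
    """Expected linguistic distance between spouses under random matching.

    p_w, p_m: dicts mapping a group label to the share of married women / men.
    dist: dict mapping frozenset({i, j}) to the distance between named groups.
    """
    # Median distance over all listed pairs, used when one spouse is "other".
    med = median(dist.values())

    def d(i, j):
        if i == j:
            return 0.0  # d_ii is set to 0
        if other in (i, j):
            return med  # d_zi = d_iz = country-specific median
        return dist[frozenset((i, j))]

    return sum(p_w[i] * p_m[j] * d(i, j) for i in p_w for j in p_m)


# Hypothetical country with three named groups and an "other" category.
p_w = {"A": 0.4, "B": 0.3, "C": 0.2, "other": 0.1}
p_m = {"A": 0.5, "B": 0.2, "C": 0.2, "other": 0.1}
dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 4.0,
}
d_c = random_linguistic_distance(p_w, p_m, dist)
```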
In the case of religious differences, there is no standard way to quantify the differences be-
tween groups. I study separately interfaith marriages – mixed marriages between Muslims,
Christians, and “other (faith)” – and Christian-Muslim marriages, the latter arguably being the
most distant kind of intermarriage along the religious dimension in sub-Saharan Africa.
1.4.2 Assessing time trends
I design a simple additive model to estimate trends in intermarriages. My baseline specification
is a linear probability model12 that I run on the pooled sample (including country fixed
effects) and at the country level. I present below the specifications used at the country level. I
specify the model at the woman level13: individuals (w) are women. This decision is due to the
fact that 17% of women appear with at least one co-wife in the dataset: a woman appears only
one time in the dataset, but a man appears matched with all of his cohabiting wives.
between groups has been suggested to take into account the depth of cleavages: for instance, some ethnic groups speak the same language (e.g. in Zambia, Posner (2005)) while others are very distant according to the linguistic tree (e.g. the Somali and the Kikuyu, in Kenya). These differences are not taken into account by fractionalization measures: the distance between all groups is set to be the same.
12 The results do not change when using a logit model. Results available from the author.
13 The results do not change if I study intermarriage by collapsing the dataset to keep one observation per man. In this case, I consider a man to be in an interethnic (interfaith) union if at least one of his wives is not a member of his ethnic (religious) group. Results available from the author.
Intermarriage_w is an indicator variable that equals 100 if the union is interethnic (interfaith),
and 0 otherwise14. I consider unions to be interethnic (interfaith) if spouses do not belong to
the same group. When both spouses belong to the group “other,” I consider them to be in an
intraethnic (intrafaith) union. In Section 1.7, I test whether the results are robust to considering
these unions as interethnic (interfaith).
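The coding rule for the dependent variable can be illustrated with a short Python sketch. The function name, the flag used for the robustness variant, and the group labels are hypothetical; only the 0/100 coding and the treatment of couples in which both spouses are "other" follow the text.

```python
OTHER = "other"


def intermarriage_indicator(wife_group, husband_group, other_pairs_mixed=False):
    """Return 100 if the union is mixed and 0 otherwise.

    Baseline: a couple in which both spouses report "other" counts as an
    intra-group union. Setting other_pairs_mixed=True gives the alternative
    coding in which such couples count as mixed.
    """
    if wife_group == OTHER and husband_group == OTHER:
        return 100 if other_pairs_mixed else 0
    return 100 if wife_group != husband_group else 0


# Hypothetical unions: (wife's group, husband's group).
unions = [("A", "A"), ("A", "B"), (OTHER, OTHER)]
baseline = [intermarriage_indicator(w, h) for w, h in unions]
robustness = [intermarriage_indicator(w, h, other_pairs_mixed=True) for w, h in unions]

# With a 0/100 indicator, the sample mean is directly a share in percent.
baseline_share = sum(baseline) / len(baseline)
```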
BirthYear_w is a continuous variable defined as the year of birth of each woman. It is the main
variable of interest: if the coefficient associated with it is positive, it means that the share of
intermarriages has increased over time. I use birth year rather than year of first cohabitation to
capture time trends for two reasons. First, birth year is available for all women, while using
cohabitation year would restrict the sample to women in their first union, as I only have in-
formation on the year of first cohabitation. Second, age at marriage or year of cohabitation is
endogenous to educational achievements and to the type of place of residence, while year of
birth is more exogenous to these individual characteristics. I compare birth year to cohabitation
year as a robustness check in Section 1.7. Age_w is the age at survey date.
Age and birth year effects
Figure 1.2: Survey effect: Marital status and age at survey date
[Figure: two panels plotted by birth year (1955 to 1985). Left panel (vertical axis from 15 to 50) series: Age; Maximum age; Minimum age. Right panel (vertical axis from 0 to 100) series: Share married women; Share remarried women.]
Left panel: The sample includes only women in union at the time of the survey. 95% confidence intervals. The “maximum age” is the age of the oldest woman surveyed for each birth year. The “minimum age” is the age of the youngest woman surveyed for each birth year. These ages depend on the timing of surveys within each country. Women aged 15 to 49 are surveyed in DHS, hence the flat lines at these two points.
Right panel: These shares are computed using all women surveyed. 95% confidence intervals. “Married women” are the women in union at the time of the survey, not the ever-married women.
14The indicator variable is set to 100 so that coefficients can be read as changes in percentage points.
I add quadratic controls for age in the model to control for age effects. This ensures that the
patterns that I identify in the data are due to change across cohorts and not to age effects15. I
use surveys implemented from 1992 to 2018: women born in 1955 were older than 35 in the
first DHS survey of each country; women born in 1985 were younger than 35. As shown in
Figure 1.2, the timing of survey waves can be seen in the age composition of cohorts: earlier-
born cohorts are older at survey date than later-born cohorts. Whether a woman is married or
not is a function of age: differences in age composition of cohorts (left panel) are mirrored by
differences in the share of married and remarried women by cohort (right panel). As women in
earlier-born cohorts are older at the survey date, they are more likely to have married and more
likely to have remarried, either after a divorce or being widowed. The same characteristics
are likely to drive both the type of marital status that I observe (married/remarried/never
married) and the type of marital outcome that I observe (intermarried or not). For instance,
if women who marry young are more likely to marry within their group, then, without age
controls, I would estimate time trends that are due to the fact that cohorts differ with respect to
I introduce additional variables in the model to test whether they explain changes in inter-
marriage shares. I assume these variables have a constant effect over time. I add dummies
for the highest education level: Primary_w and Secondary_w, the reference category being “no
education”16. Urban_w is a dummy that takes the value 1 if the respondent lives in an urban
area. Moreover, to control further for cohort composition effects, I add a dummy variable,
Remarried_w, which takes the value 1 if the respondent has remarried. I discuss alternative
ways to measure the impact of remarriage in section 1.7.
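Based on the variable definitions given in this section, the two specifications referred to as 1.1 and 1.2 can plausibly be written as follows. This is a reconstruction under stated assumptions: the coefficient symbols, the constant, and the error term are notational choices made here for illustration, and the inclusion of the quadratic in age in both specifications is inferred from the list of controls reported in the country-level regression table.

```latex
\begin{align}
\text{Intermarriage}_w &= \alpha + \beta\,\text{BirthYear}_w
  + \gamma_1\,\text{Age}_w + \gamma_2\,\text{Age}_w^2 + \varepsilon_w \tag{1.1} \\
\text{Intermarriage}_w &= \alpha + \beta\,\text{BirthYear}_w
  + \gamma_1\,\text{Age}_w + \gamma_2\,\text{Age}_w^2
  + \delta_1\,\text{Primary}_w + \delta_2\,\text{Secondary}_w \notag \\
  &\quad + \delta_3\,\text{Urban}_w + \delta_4\,\text{Remarried}_w + \varepsilon_w \tag{1.2}
\end{align}
```

In the pooled sample, the constant would be replaced by country fixed effects. Since the dependent variable is coded 0/100, the coefficient on birth year reads as a change in percentage points per birth year.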
Throughout the paper, I compare the coefficient associated with birth year between specification
1.1 and specification 1.2. The birth year coefficient in specification 1.1 measures
time trends. The birth year coefficient in specification 1.2 measures time trends that cannot be
15 This is possible as my main sample is made up of countries for which I have at least two survey waves. I hence observe birth cohorts at different ages. Thus, 82.7% of women belong to birth cohorts that were sampled at least twice in their country. Given the quadratic function that I use to estimate age effects, I do not need all of the cohorts to have been sampled twice to estimate age effects and birth year effects separately.
16 DHS classification distinguishes between secondary and higher education. Only 2.3% of married women in my sample completed university, so I aggregate secondary education and higher education into a single category.
explained by changes in education levels, in urbanization, and in cohort composition due to
remarriage. As such, it could capture changes in preferences and in social norms. However, it
should be noted that several other variables may contribute to individuals’ likelihood to
intermarry, such as parental education, whether one’s parents intermarried, or whether both
parents are still alive at the time of the marriage decision: the coefficient in specification 1.2
may also capture some of these omitted factors.
1.5 Results on pooled sample
This section presents the results on the pooled sample. The country-specific results are detailed
in section 1.6.
1.5.1 Descriptive statistics
Table 1.1: Average intermarriage shares and linguistic distance

                                  Observed   Random   Ratio   N
Interethnic marriages (%)         20.4       80.0     25.5    97111
Linguistic distance (nodes)       3.29       3.25     1.01    21704
Interfaith marriages (%)          9.7        33.8     28.7    96549 (a)
Muslim/Christian marriages (%)    2.4        21.7     11.2    83291

Data & sample: Women in union, pooled sample. Weighted data.
Interethnic, interfaith, and Muslim-Christian marriage shares: The observed share corresponds to the share observed in the population. The random share corresponds to the share that we would observe if people currently in union had matched at random, under two assumptions. First, there is no exit or entry into the marriage market compared to what we observe. Second, polygamy decisions are independent from women’s ethnicity and religion: polygamous men appear on the random market the same number of times as in the observed market. Random shares are computed for each country, considering a national marriage market. The random share for the pooled sample is the weighted average of those national random shares. The ratio is computed as the ratio of observed share to random share.
Linguistic distance: The random and observed linguistic distances are computed considering a national marriage market, using information on interethnically married couples. Linguistic distance between two spouses when only one of them belongs to the group “other (ethnicity)” is set to the country-specific median linguistic distance, computed on distances for all pairs of ethnolinguistic groups.
Muslim-Christian marriage shares (observed and random) are computed using only marriages in which neither spouse is a member of the group “other (faith)”.
(a) One survey wave in Senegal does not include a question on religion.
Table 1.1 displays the estimations of observed intermarriage shares and contrasts them with
the intermarriage shares that we would have observed under random matching. Interethnic
unions are on average more frequent than interfaith unions: 20.4% of women are married to a
man who is not from the same ethnic group as them, and 9.7% of women are married to a man
who is not a member of the same religious group as them. However, the number of categories
and the level of diversity differ depending on whether we consider ethnicity or faith: under
random matching, we would observe around 80% of interethnic marriages and around 33.8%
of interfaith marriages. When we look at the ratio of the observed share of intermarriages to
the random share of intermarriages, interfaith marriages and interethnic marriages are roughly
as common: between 25% and 30% of the random share of intermarriages is realized.
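The ratios quoted above follow directly from the observed and random shares reported in Table 1.1, as the short Python check below shows. The function name is a hypothetical helper introduced for illustration; the input values are the rounded figures from the table.

```python
def realized_share(observed, random_share):
    """Observed share expressed as a percentage of the random-matching share."""
    return 100.0 * observed / random_share


# Rounded pooled-sample values taken from Table 1.1.
interethnic_ratio = realized_share(20.4, 80.0)
interfaith_ratio = realized_share(9.7, 33.8)
```

Both ratios fall between 25% and 30%, which is the sense in which the two kinds of intermarriage are roughly as common once diversity levels are accounted for.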
I find that interethnic unions take place at a linguistic distance that is similar to what we would
observe under random matching. Whereas 28.7% of the random share of interfaith unions is
realized, Muslim-Christian marriages are rare17: they make up 2.4% of marriages when considering
only Muslim and Christian respondents, and 2.1% otherwise. The former figure is 11.2% of what we
would observe under random matching. Most interfaith unions hence involve a spouse who
identifies as Muslim or Christian and a spouse who belongs to the group “other (faith)”. Indi-
viduals who are neither Muslim nor Christian are more likely to be in an interfaith union than
Muslims and Christians. Couples that include at least one follower of “other (faith)” make up
14% of the sample, but 79% of interfaith couples, most of which pair a Christian
spouse with an “other (faith)” spouse. The high propensity of “other (faith)” members to intermarry
is consistent with the idea that traditional religions are more tolerant of intermarriages.
It is also likely that the conversion process from a traditional religion to Islam or to a Christian
denomination might not concern both spouses at the same time.
1.5.2 Time trends
Assessing time trends
Figure 1.3 shows the shares of each type of intermarriage over birth cohort of women, thus
providing visual evidence on the magnitude of changes.
17 66.5% of Muslim-Christian unions are unions between a Muslim man and a Christian woman. Most Islamic scholars consider that it is forbidden for Muslim women to marry non-Muslim men, but Muslim men can marry women who belong to other monotheist religions. Hence, this imbalance in the types of Muslim-Christian unions indicates that such unions, while rare, are not only counted due to measurement errors.
Observations (a)          97111    97111    96549    96549    96549    96549
R-squared                 0.247    0.272    0.134    0.139    0.029    0.033
Mean dependent variable   20.4              9.7               2.1
Data & sample: Women in union, pooled sample. Weighted data.
Specification: OLS regression. Standard errors are clustered at the DHS-cluster level. As the specification includes country-fixed effects, there is no constant in the model.
Columns (1) and (2): the dependent variable is a variable that equals 0 if the union is an intraethnic one, 100 if it is an interethnic one.
Columns (3) and (4): the dependent variable is a variable that equals 0 if the union is an intrafaith one, 100 if it is an interfaith one. Three faith groups are defined: “Muslim”, “Christian”, and “other (faith)”.
Columns (5) and (6): the dependent variable is a variable that equals 100 if one spouse is Christian and the other one Muslim, and 0 otherwise.
Results for all columns should be read as changes in percentage points.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
(a) One survey wave in Senegal does not include a question on religion.
umn (4)), the magnitude of the increase in interfaith marriage shares is 3.75 percentage points:
these variables explain little of the trend observed, corroborating the idea that the decrease in
interfaith marriages is mainly due to declining levels of religious diversity.
Assessing individual characteristics
Looking at interethnic marriages and at Muslim-Christian marriages, the coefficients on educa-
tion and on urban residence are consistent with what we would expect from variables capturing
parts of the individual preferences factor and of the diversity level factor. Completion of primary
school rather than having no education is associated with a higher likelihood of intermarrying.
Completion of secondary school is associated with an even higher likelihood. Urban residence
is also associated with an increase in the likelihood of intermarriage18. In the case of interfaith
marriages, secondary education and urban residence are negatively correlated to the likelihood
of being in an interfaith marriage, but these variables capture the likelihood of belonging to the
18 Using a sample of earlier-born cohorts for which I have information on childhood place of residence, I run the specifications from Table 1.2. The coefficient estimated for the current place of residence is slightly higher than the coefficient for the childhood place of residence.
group “other (faith)”. Members of this group are often followers of traditional religions, and
attendance in school and urban residence are negatively correlated with the likelihood of being
a member of this group19.
Remarried women are more likely to be married outside of their group than women who are
still in their first union, whatever the kind of intermarriage considered. Social norms may
be different for women who marry for the first time and for women who remarry, as women
have more freedom in choosing a spouse when they have already been married (Locoh and
Thiriat, 1995). Similarly, earlier-born women may remarry under the same set of (more ac-
cepting) norms as later-born women who enter their first union. Last, women who remarried
may have different (unobserved) characteristics that also lead them to marry outside of their
group, whether in their first union or in the subsequent ones. I discuss these hypotheses about
remarriage in section 1.7.
1.6 Results at country-level
This section presents the descriptive statistics and results on time trends at the country level.
For brevity, tables include only the coefficient associated with the variable BirthYear and show
results for the two main specifications (with and without controls). The full results at the coun-
try level and the results with control variables introduced one by one are available in the online
Appendix.
19 76.6% of women who belong to the group “other (faith)” did not complete primary school while 44.4% of Muslim and Christian women did not complete primary school. 12.8% of women who belong to the group “other (faith)” live in an urban area while 28.5% of Muslim and Christian women live in an urban area.
Data: Survey wave (DHS) conducted the closest to 2005. Sample: Women currently in union.
Left panel: Share of interethnic marriages. Right panel: Ratio of observed share to random share of interethnic marriages. Higher ratios mean that the share of interethnic marriages is closer to what would be observed under random matching.
The corresponding data can be found in the online Appendix.
The maps in figure 1.4 show the observed share of interethnic marriages and the ratio of the
observed to random share of interethnic marriages. Striking differences between countries ap-
pear. In Congo-Brazzaville and in Zambia, more than 40% of married women are in an intereth-
nic marriage, whereas this share is lower than 10% in DRC, Kenya, Namibia, and Nigeria. The
observed share and the ratio of observed to random shares are similar. This is because countries
have high random shares of interethnic marriages: there are only two countries where this
share is lower than 70% (Burkina Faso and Niger).
Data: Survey wave (DHS) conducted the closest to 2005. Sample: Women currently in union.
Left panel: Share of interfaith marriages. Right panel: Ratio of observed share to random share of interfaith marriages. Higher ratios mean that the share of interfaith marriages is closer to what would be observed under random matching.
The corresponding data can be found in the online Appendix.
The maps in figure 1.5 show the observed share of interfaith marriages and the ratio of the
observed to random share of interfaith marriages. In stark contrast to interethnic marriage pat-
terns, the share of interfaith marriages is low. The highest share of interfaith marriages is 29.6%
(Congo-Brazzaville), while the highest share of interethnic marriages is over 40%. However,
the level of religious fractionalization is much lower than the level of ethnic fractionalization,
hence the much darker shades of the map on the right panel. Countries are also more heterogeneous
with respect to religious fractionalization, which ranges from 3.6% (Niger) to 64.1%
(Benin).
On the pooled sample, ratios of observed to random shares are similar for interethnic and
interfaith marriage shares, but this is not the case when looking at countries separately. Notably,
the distribution of this ratio is wider when looking at interfaith marriages rather than at in-
terethnic marriages: there is no country for which this ratio is higher than 60% when looking at
interethnic marriages, but it is higher than 60% for interfaith marriages in Congo-Brazzaville,
Gabon, Namibia, Niger, and Zambia.
1.6.2 Time trends on interethnic marriages
Extensive margin
Time trends
Figure 1.6: Observed interethnic marriage shares over birth cohorts
[Figure: two panels plotting the share of interethnic marriages (vertical axis from 0 to 50) by birth cohort (1955-59 to 1985-89). Panel A legend: Mali, Uganda, Senegal, Ghana, CI, Guinea, Togo, Benin, Kenya. Panel B legend: Zambia, Gabon, Malawi, Cameroon, Niger, BF.]
Sample & data: Women currently in union, weighted DHS data at country level.
Panel A (left): Countries for which the trend on interethnic marriages is significantly different from 0.
Panel B (right): Countries for which the trend on interethnic marriages is not significantly different from 0.
Countries are sorted into these two panels according to regression results from Table 1.3. Countries appear in the legend in descending order with respect to the share of interethnic marriages in the 1985-1989 cohort.
BF: Burkina Faso; CI: Côte d’Ivoire.
Figure 1.6 presents a visual representation of the changes in the share of interethnic marriages
over time. Panel A shows that trajectories of countries where interethnic marriages became
more frequent look similar. When looking at panel B, we notice that, of the six countries where
interethnic marriage shares did not increase, three – Zambia, Gabon, Malawi – already had
shares of interethnic marriages higher than 25%. Two others, Burkina Faso and Niger, are
the only countries in the sample where the ethnic fractionalization index is lower than 70%.
They both have large majority groups – the Mossi in Burkina Faso, the Hausa in Niger – which
may mean that the context in which unions take place in these two countries is different from
what happens in countries where there is no majority group in the demographic sense. The
exception is Cameroon: it has a positive but not significant increase in interethnic marriage
shares20, while having the same share of interethnic marriages as the average on the pooled
sample, and having no majority group.
Turning to regression analysis, Table 1.3 lists the coefficient associated with birth year for two
sets of regressions, without and with the following controls: education, urban place of resi-
dence, and remarriage. The share of interethnic marriages significantly increased over time in
Benin, Côte d’Ivoire, Ghana, Guinea, Kenya, Mali, Senegal, Togo, and Uganda (Panel A). In
20 While there seems to be a trend for Cameroon on Figure 1.6, the trend is insignificant when age controls are added.
Table 1.3: Trend - Observed interethnic marriage shares

                        (1)          (2)          (3)     (4)
Dependent variable      Interethnic marriage      Mean    N
Birth year coefficient (each cell: coefficient from a separate regression)

Panel A: Increase in interethnic marriage shares
Benin                   0.242***     0.176**      15.1    10977
[...]
                        (0.135)      (0.123)

Panel B: No change in interethnic marriage shares
Burkina Faso            0.00631      -0.0680      10.4    9170
                        (0.0856)     (0.0849)
Cameroon                0.545        0.215        20.5    3066
                        (0.332)      (0.317)
Gabon                   0.364        0.410        38.0    2274
                        (0.289)      (0.278)
Malawi                  0.00730      -0.156       31.8    9241
                        (0.121)      (0.120)
Niger                   -0.154       -0.114       12.7    5603
                        (0.120)      (0.123)
Zambia                  0.0675       0.0821       46.0    10711
                        (0.132)      (0.125)

Controls
Age & Age2              X            X
Education                            X
Urban                                X
Remarried                            X

Sample & data: Women currently in union, weighted DHS data at country level. Specification: OLS regressions run separately for the 15 countries of the sample. Standard errors are clustered at the DHS-cluster level. The dependent variable is a variable that equals 0 if the union is intraethnic, 100 if the union is interethnic.
Columns (1) and (2) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression. Column (3) reports the mean of the dependent variable (the share of interethnic marriages, in %) in the regression sample. Column (4) reports the number of observations for each country.
Results in columns (1) and (2) can be interpreted as changes in percentage points.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
terms of magnitude, an estimated increase of 0.3 percentage points per birth year translates
into an increase of 7.5 percentage points when extrapolated over 25 years. Once I control for
the individual characteristics correlated with interethnic marriages, the trends remain positive
and significant in six countries out of nine: Benin, Cote d’Ivoire, Guinea, Mali, Senegal, and
Uganda. In Ghana, the coefficient turns insignificant when introducing either education or ur-
ban residence to the model. In Kenya, the coefficient drops when introducing the type of place
of residence to the model but only loses significance when all of the variables are introduced
jointly. In the case of Togo, adding education levels to the model explains away the trend.
Looking at Panel B, the introduction of control variables does not change the results.
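As a hedged illustration, the country-level regressions summarized in the notes to Table 1.3 can be written as follows. The symbols ($y_i$, $\alpha$, $\beta$, $\gamma$, $\delta$, $X_i$) are notation introduced here for readability and are not taken from the original text; the second line simply reproduces the extrapolation over 25 birth years mentioned above.

```latex
% y_i = 100 if woman i is in an interethnic union, 0 otherwise.
% X_i is empty in column (1); in column (2) it contains education,
% urban residence, and remarriage.
\begin{align*}
y_i &= \alpha + \beta\,\mathrm{BirthYear}_i + \gamma_1\,\mathrm{Age}_i
       + \gamma_2\,\mathrm{Age}_i^2 + X_i'\delta + \varepsilon_i, \\
25 \times \hat{\beta} &= 25 \times 0.3 = 7.5 \ \text{percentage points}.
\end{align*}
```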
Individual characteristics
Countries are strikingly similar with respect to correlates of interethnic marriages, and there
are no differences between countries where interethnic marriage shares increased and countries
where they did not. The results are consistent with what is found in the pooled sample: primary
education, secondary education, urban residence, and remarriage are all positively correlated
with the likelihood of being in an interethnic union. The two exceptions are Uganda, where,
once urban residence is controlled for, women who attended primary school are less likely than
their uneducated counterparts to marry outside of their ethnic group, and Gabon, where urban
residence is uncorrelated with interethnic marriage. The share of Gabonese women living in
urban areas is above 80%, while it is lower than 45% in all of the other countries: urbanization
might stop being a mixing factor once urbanization levels are high.
Intensive margin: Linguistic distance
Descriptive statistics and time trends
Table 1.4 shows results of the regression of the linguistic distance (conditional on being in an
interethnic union) on birth year, and on both sets of controls. Changes at the extensive margin
do not necessarily correspond to changes at the intensive margin. The linguistic distance of
interethnic marriages increased in three countries, two countries where interethnic marriages
became more frequent – Benin and Togo – and one – Cameroon – where interethnic marriages
did not increase. The linguistic distance decreased in Cote d’Ivoire, Kenya, and Senegal, all
countries where the share of interethnic marriage increased. The linguistic distance did not
change in the nine other countries of the sample. Introducing individual characteristics in
the model changes the results only in Uganda, where the trend turns negative. The ratio of the
Table 1.4: Trend - Linguistic distance between spouses

                          (1)         (2)         (3)     (4)     (5)
Dependent variable        Linguistic distance     Mean    Ratio   N
Birth year coefficient (each cell: coefficient from a separate regression)

Panel A: Increase in interethnic marriage shares
Increase in linguistic distance
Benin                     0.0151**    0.0145**    3.4     0.8     1650

Controls
Age & Age2                X           X
Education                             X
Urban                                 X
Remarried                             X
Sample & data: Women currently in an interethnic union, weighted DHS data at country level. Specification: OLS regression run separately for the 15 countries of the sample. Standard errors are clustered at the DHS-cluster level. The dependent variable is the linguistic distance (measure defined in Appendix A-1.2) associated with each interethnic union.
Columns (1) and (2) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression. Column (3) reports the mean linguistic distance for intermarried couples. Column (4) reports the ratio of the mean observed linguistic distance to the random linguistic distance, computed by randomly matching individuals who married outside of their ethnic group. Column (5) reports the number of observations for each country.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
observed to random linguistic distance (column (4)) is close to one for all countries: conditional
on being in an interethnic union, most of the linguistic distance is realized. Moreover, in seven
countries of the sample, this ratio is larger than one, indicating that interethnic marriages are
more distant than they would be if they were formed at random (considering only intermarried
people). There is only one country, Senegal, where this ratio is lower than one and where the
linguistic distance of interethnic marriages decreased.
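The ratio in column (4) of Table 1.4 can be illustrated with a minimal sketch. It assumes that the random benchmark is obtained by reshuffling husbands among intermarried wives and that random pairs falling within the same group are dropped; the exact matching procedure used in the thesis is not spelled out here, so both choices are assumptions. The function name, the toy groups, and the toy distances are hypothetical.

```python
import random
from statistics import mean

# Hypothetical distances (in nodes) between three illustrative groups.
TOY_NODES = {
    frozenset(("A", "B")): 1,
    frozenset(("A", "C")): 4,
    frozenset(("B", "C")): 4,
}


def toy_distance(group_1, group_2):
    """Look up the toy linguistic distance between two different groups."""
    return TOY_NODES[frozenset((group_1, group_2))]


def observed_to_random_ratio(couples, distance, n_draws=200, seed=0):
    """Ratio of the mean observed distance to the mean randomly matched distance.

    couples: list of (wife_group, husband_group) for interethnic unions only.
    """
    rng = random.Random(seed)
    observed = mean(distance(w, h) for w, h in couples)
    wives = [w for w, _ in couples]
    husbands = [h for _, h in couples]
    draws = []
    for _ in range(n_draws):
        shuffled = husbands[:]
        rng.shuffle(shuffled)
        # Assumption: random pairs within the same group are not counted.
        dists = [distance(w, h) for w, h in zip(wives, shuffled) if w != h]
        if dists:
            draws.append(mean(dists))
    if not draws:
        raise ValueError("no interethnic random pair could be formed")
    return observed / mean(draws)


toy_couples = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
ratio = observed_to_random_ratio(toy_couples, toy_distance)
```

A ratio above one would indicate that observed interethnic unions are more distant than randomly matched ones, and a ratio below one the opposite.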
The evolution of linguistic distances depends on the type of interethnic marriages that are
observed in the earlier-born cohorts. For instance, comparing the cases of Benin and Kenya
(linguistic trees for these two countries are depicted in Appendix A-1.2.), the average linguis-
tic distance of interethnic marriages decreased in Kenya and increased in Benin. In Benin,
as interethnic marriages became more frequent, all groups started intermarrying more, thus
resulting in a decrease of the share of Adja-Fon unions, whose distance is one node, and an in-
crease in the share of unions with a distance larger than four nodes (e.g. Yoruba-Peulh unions).
Increasing linguistic distances indicate that women in later-born cohorts marry further away
from their group, and hence that some ethnic cleavages may have lost salience. In Kenya, the
decrease in linguistic distance stems mostly from the fact that in earlier-born cohorts there are
more interethnic couples in which at least one spouse is Kalenjin or Luo – the only two groups
that belong to the Nilo-Saharan branch – than in later-born cohorts21. As interethnic marriages
became more common, the share of such unions among intermarried people decreased, result-
ing in the decrease of the average linguistic distance of interethnic unions. This result is still
consistent with the fact that some ethnic barriers – not captured by linguistic distances – are
becoming less salient.
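The composition effect at work in Kenya can be made explicit with a short worked equation. It uses the two conditional averages reported in footnote 21 (7.2 and 3.3 nodes). The shares $s$ below are hypothetical values chosen for illustration, not estimates from the data, and the sketch assumes that the two conditional averages stay constant across cohorts.

```latex
% s = share of interethnic unions in which at least one spouse is Luo or Kalenjin
\begin{align*}
\bar{d}(s)   &= 7.2\,s + 3.3\,(1 - s), \\
\bar{d}(0.5) &= 3.60 + 1.65 = 5.25, \\
\bar{d}(0.3) &= 2.16 + 2.31 = 4.47.
\end{align*}
```

Under these assumptions, a falling $s$ lowers the average distance even if no pair of groups becomes more or less likely to intermarry relative to its size.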
Individual characteristics
Correlates of the linguistic distance of interethnic marriages are not the same as correlates of
interethnic marriages: primary and secondary education are negatively correlated with the
linguistic distance of marriage in most countries. Such a reversal between the extensive and the
intensive margin can be explained by ‘over-selection’ of individuals. Higher education levels
are correlated with a higher likelihood of marrying outside of one’s ethnic group, so individu-
als who marry outside of their ethnic group and who have also not attended school are likely
to have unobserved characteristics, such as being strong-willed, that also make them marry
further away from their group or marry without any consideration of group differences.
Figure 1.7: Observed interfaith marriage shares over birth cohorts
[Figure: two panels. x-axis: birth cohort (1955-59 to 1985-89); y-axis: 0 to 50. Left panel legend: Gabon, Togo, Ghana, Benin, BF, Kenya, Zambia. Right panel legend: CI, Cameroon, Malawi, Mali, Uganda, Guinea, Niger, Senegal.]
Sample & data: Women currently in union, weighted DHS data at country level.
Panels A1 and B1 (left): Countries for which the trend on interfaith marriages is negative and significantly different from 0.
Panels A2, A3 and B2 (right): Countries for which the trend on interfaith marriages is not significantly different from 0; Cameroon (A3) is the only country for which the share of interfaith marriages increased.
Countries are sorted into two panels according to the regression results from Table 1.5. Countries appear in the legend in descending order with respect to the share of interfaith marriages in the 1985-1989 cohort.
BF: Burkina Faso; CI: Cote d’Ivoire.
1.6.3 Time trends on interfaith and Muslim-Christian marriages
Time trends
Figure 1.7 presents a visual representation of the change in interfaith marriage shares over
time. Comparing the two panels, countries appear to converge towards low levels
of interfaith marriages. Apart from Cameroon and Cote d’Ivoire, all of the countries where
interfaith marriages did not become less frequent are countries where the share of interfaith
marriages is lower than 10% for all of the cohorts.
Table 1.5 shows the coefficient associated with birth year for country-specific regressions of the
likelihood of being in an interfaith union on birth year and on the two sets of additional vari-
ables. Without individual controls other than age, the share of interfaith marriages increased
only in Cameroon. The share of interfaith marriages decreased in Benin, Burkina Faso, Gabon,
Ghana, Kenya, Togo, and Zambia. Controlling for education levels, urban place of residence,
and remarriage explains the trend in Benin, Gabon, and Togo. In Benin, the trend turns insignif-
icant when introducing an indicator variable for remarried status. In Togo, the introduction of
education variables as well as of urban residence explains the trend. In Gabon, it is the joint
effect of the three variables.
21 Linguistic distance does not perfectly capture the cleavages between groups and, contrary to the share of intermarriages, is sensitive to extreme values. For instance, in Kenya, the distance between, on the one hand, the Luo and Kalenjin groups (Nilo-Saharan branch) and, on the other hand, all other ethnic groups but the Somali is high (over 7.5 nodes). Interethnic unions in Kenya have an average linguistic distance of 7.2 nodes if at least one of the spouses is Luo or Kalenjin, and a distance of 3.3 nodes otherwise. Luo-Kalenjin unions themselves make up 1.6% of interethnic unions in Kenya, despite the fact that the linguistic distance of the pair is lower than with other groups, indicating that factors other than linguistic distance are also at play.
Table 1.5: Trend - Observed interfaith marriage shares

                      (1)        (2)        (3)     (4)        (5)        (6)     (7)
Dependent variable    Interfaith marriage:          Interfaith marriage:
                      Muslim/Christian/Others Mean  Muslim-Christian      Mean    N
Birth year coefficient (each cell: coefficient from a separate regression)

Panel A: Increase in interethnic marriage shares
Panel A1: Decrease in interfaith marriage shares
Benin                 -0.150**   -0.103     16.7    0.0430     0.0396     2.9     10977

Controls
Age & Age2            X          X                  X          X
Education                        X                             X
Urban                            X                             X
Remarried                        X                             X
Sample & data: Women currently in union, weighted DHS data at country level. Specification: OLS regression run separately for the 15 countries of the sample. Standard errors are clustered at the DHS-cluster level.
Columns (1) to (2): The dependent variable equals 0 if the union is intrafaith and 100 if the union is interfaith, considering three religious groups (Christians, Muslims, Other (faiths)). Columns (1) and (2) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression. Column (3) reports the mean of the dependent variable, that is, the share of interfaith marriages in percent.
Columns (4) to (5): The dependent variable equals 100 if the union is a Muslim-Christian union and 0 if it is not. Columns (4) and (5) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression. Column (6) reports the mean of the dependent variable, that is, the share of Muslim-Christian marriages in percent. Column (7) reports the number of observations for each country.
Results in columns (1), (2), (4) and (5) can be interpreted as changes in percentage points.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Muslim-Christian union shares only changed in three countries, decreasing in Ghana and in-
creasing (although to a small extent) in Togo and Uganda. The coefficients are not significant
in all of the other countries, a finding that is consistent with the fact that there is no trend when
looking at the pooled sample (Table 1.2).
Crossing these two sets of results with results on the variation of the share of “other (faith)” over
time (the full results can be found in the online Appendix), I find that the share of “other (faith)”
decreased in all of the countries where the share of interfaith marriages decreased, with the
exception of Zambia. This share increased only in Senegal and Niger, where there was no
decrease in the share of interfaith marriages. In keeping with results on the pooled sample,
decreasing interfaith marriage shares are likely to be driven by the decline in the share of the
group “other (faith)” and the resulting decrease in the level of religious diversity. However, at
the country-level, education and urbanization suffice to explain the trend in Benin, Gabon, and
Togo. In Ghana, where both interfaith marriages and Muslim-Christian marriages became less
common, it is likely that social norms or preferences are moving away from tolerating interfaith
marriages.
Individual characteristics
Countries are heterogeneous with respect to correlates of interfaith marriages, as education
levels and urban residence also capture the likelihood of being a member of “other (faith)” in
most countries. Consistent with results on the pooled sample, urban residence and secondary
education are negatively correlated with the likelihood of being in an interfaith union in most
countries, and signs on primary schooling differ across countries. Contrasting with these re-
sults, education levels and urban residence are either insignificant or positively correlated with
the likelihood of being in a Muslim-Christian marriage, thus mirroring results on individual
characteristics associated with marrying outside of one’s ethnic group.
1.7 Robustness analysis
I implement four robustness checks on my findings. First, I relax the assumption that a mar-
riage is intraethnic or intrafaith when both spouses belong to the group “other”. Second, I test
whether the results are robust to alternative assumptions on remarried women’s first unions
and whether the trends are also found when considering separately women in their first union
and remarried women. Third, using only women in their first union, I test whether “assim-
ilation” and conversion take place over the length of a marriage. Fourth, using only women
in their first union, I compare time trends measured using birth year and using cohabitation
year. Table 1.6 displays results from the main specification and from the regressions when vary-
ing the assumptions as mentioned above. For brevity, the results are presented for the pooled
sample. A full discussion of the country-level results can be found in the online Appendix.
1.7.1 Testing for heterogeneity in the “other” group
The categories “other ethnicity” and “other faith” are more heterogeneous than the other
categories. In the main specification, I assume that when both spouses belong to
the group “other”, their union is in-group. Assuming that these unions are in fact out-group
unions (“other-other” assumption), more unions appear as intermarriages. Figure 1.8 shows the
comparison of intermarriage shares using the main assumption and using the “other-other”
assumption: more unions are now counted as interethnic and as interfaith, thus providing an
upper bound on the share of such unions.
In Table 1.6, columns (1) and (2) are similar to columns (3) and (4): the results regarding the
pooled sample are robust to counting the “other”-“other” unions as inter-group unions. The
absolute magnitude of the coefficient is higher under the “other-other” assumption, especially for
interfaith marriages, which is consistent with the overall decrease in the share of “other (faith)”.
At the country level, the main results carry through. There is no country for which the share
of interethnic marriages decreased. The trend on interfaith marriages in Niger turns positive
and significant, which is consistent with the fact that the share of “other (faith)” increased over
time in this country.
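The difference between the main coding and the “other-other” assumption can be sketched in a few lines. This is an illustration only: the function name, the `other_other_is_mixed` flag, and the toy couples are hypothetical, and the 0/100 coding follows the definition of the dependent variable given in the table notes.

```python
def is_intermarriage(wife_group, husband_group, other_other_is_mixed=False):
    """Return 100 for an out-group union and 0 for an in-group union."""
    if wife_group == "other" and husband_group == "other":
        # Main assumption: in-group. "Other-other" assumption: out-group.
        return 100 if other_other_is_mixed else 0
    return 0 if wife_group == husband_group else 100


# Hypothetical couples, listed as (wife's group, husband's group).
couples = [("fon", "fon"), ("fon", "yoruba"), ("other", "other"), ("other", "fon")]

main_share = sum(is_intermarriage(w, h) for w, h in couples) / len(couples)
bound_share = sum(is_intermarriage(w, h, True) for w, h in couples) / len(couples)
```

By construction, the share under the “other-other” assumption can never be lower than the share under the main assumption, which is why it acts as an upper bound.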
Figure 1.8: Shares of interethnic/interfaith marriages over birth cohort
[Figure: two panels. x-axis: birth cohort (1955-59 to 1985-89); y-axis: 0 to 40. Series in each panel: Main, Bound other, Bound high, Bound low.]
Sample & data: Women in union, pooled sample. 95% confidence intervals included.
Left panel: Interethnic marriages. Right panel: Interfaith marriages.
Bound high: Shares under the assumption that all remarried women were in interethnic/interfaith first unions. Bound low: Shares under the assumption that all remarried women were in intraethnic/intrafaith first unions. Bound other: Shares under the assumption that when both spouses belong to the group “other”, they are in an interethnic/interfaith union.
1.7.2 Testing the remarriage story
Women remarry either after a divorce or after being widowed. As such, remarriage is a function
of age – older women are more likely to be widows – and may also be a function of the
characteristics of a woman’s first union – intermarriages may be more likely to end in divorce22. The
age controls do not capture the fact that some unions may be more likely to end than others:
cohort composition effects might bias the estimated time trends. Remarried women are more
likely to be in an interethnic (interfaith) marriage than women in their first marriage, but it
might be that they have already been in a (first) marriage which was interethnic (interfaith). To
understand better how remarriage patterns affect my results, I study bounds on the estimates
and use a sub-sample analysis.
First, I bound my estimates by making assumptions on first unions of remarried women. Fig-
ure 1.8 depicts how the different assumptions affect intermarriage shares. The “higher bound”
assumption assigns an interethnic union to all of the women who have remarried. The “lower
bound” assumption assigns an intraethnic union to all of the women who have remarried. Three
points must be noted on these assumptions. First, they are extreme assumptions: either 100%
or 0% of remarried women are assumed to have had an interethnic (interfaith) first marriage
while 25.6% of them are in an interethnic marriage at survey date and 13.1% in an interfaith
one. Second, the “higher bound” assumption on interethnic marriages is an extreme assumption
as the ratio of the share of remarried women who have married outside of their group to the
share of not-remarried women who have married outside of their group changes over birth
cohorts. Third, there is a trend on remarriage. When regressing the Remarried variable on the
year of birth, and quadratic age controls, there is a negative and significant trend: the share of
remarried women is higher in earlier-born cohorts than in later-born cohorts, even when age
is controlled for23. It means that the “higher bound” assumption works against finding a positive
trend and that the “lower bound” assumption works in favor of finding a positive trend.
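The two bounding assumptions amount to recoding the outcome of remarried women before computing shares or running the regressions. The sketch below is a minimal illustration with hypothetical field names (`intermarried`, `remarried`) and made-up records; it is not the code used for Table 1.6.

```python
def recode(records, bound="main"):
    """Recode the 0/100 intermarriage outcome under a bounding assumption.

    bound="high": every remarried woman is counted as intermarried (100).
    bound="low":  every remarried woman is counted as not intermarried (0).
    bound="main": outcomes are kept as observed at survey date.
    """
    recoded = []
    for record in records:
        if record["remarried"] and bound == "high":
            recoded.append(100)
        elif record["remarried"] and bound == "low":
            recoded.append(0)
        else:
            recoded.append(record["intermarried"])
    return recoded


def share(values):
    """Mean of a 0/100 variable, read as a share in percent."""
    return sum(values) / len(values)


# Hypothetical sample: three women in a first union, two remarried women.
sample = [
    {"intermarried": 100, "remarried": False},
    {"intermarried": 0, "remarried": False},
    {"intermarried": 0, "remarried": False},
    {"intermarried": 100, "remarried": True},
    {"intermarried": 0, "remarried": True},
]

shares = {bound: share(recode(sample, bound)) for bound in ("low", "main", "high")}
```

Because remarried women are more numerous in earlier-born cohorts, the "high" recoding raises the shares of earlier cohorts the most, which is the sense in which it works against finding a positive trend.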
Table 1.6 (columns (5) to (8)) shows the coefficient on birth year under these two assumptions.
Under the “lower bound” assumption, the trend on interethnic marriages remains positive and
significant. However, under the “higher bound” assumption, there is no trend on interethnic (first)
marriages, and a significant negative trend (of small magnitude) once education and urban res-
idence are controlled for. At the country-level, the time trends turn negative and significant in
Benin, Senegal, and Togo, resulting in the insignificant trend on the pooled sample. As the
trend on interethnic marriages is barely negative and significant under the “higher bound” as-
sumption that works in favor of finding a negative trend, it is extremely unlikely that interethnic
22 Whether spouses belong to the same group might affect the likelihood of divorce. However, not all remarried women have divorced; some were widowed. Interethnic unions are associated with lower age gaps between spouses, with a higher likelihood of living in an urban area, and with higher education levels: it is likely that intraethnic marriages are more often ended by the husband’s death than interethnic marriages are.
23 Two factors are likely to explain this trend in the share of remarried women. First, widowhood has become a less common experience due to changes in life expectancy. Second, not remarrying may be an option that is accessible to a higher share of later-born women than to their earlier-born counterparts.
first marriages became less frequent over time, and unlikely that their share remained constant.
The trend on interfaith marriages remains negative and significant under both assumptions.
Nonetheless, the magnitude of the coefficient drops under the “lower bound” assumption, as
expected. The results on interfaith marriages at the country-level are robust to these bounds.
Second, I test whether the trends that I observe come from remarried women or from women
in their first union (columns (9) to (11), and columns (13) and (14), Table 1.6). There are trends
in all of the sub-samples. Such trends are not found for all of the countries, a result that may
stem from differences between countries and sub-samples, or from the fact that the sample size
is too small in the remarried sub-sample. I find no sub-sample in which the trend on interethnic
marriages is negative and significant. Regarding interfaith marriages, the trend is positive and
significant only for remarried Nigerien women and for Cameroonian women.
Even if we believe that the results from the “higher bound” assumption on interethnic first mar-
riages are correct and hence that the share of interethnic first marriages did not change over
time, the results from the sub-sample analysis show that there is a positive and significant
trend when considering remarriages, even when education and urban residence are controlled
for. These results are consistent with the hypothesis that women who remarry have more of a
say on whom they marry, and that social norms around interethnic marriages have relaxed over
time. These results may also capture changes in the composition of the remarried sub-sample:
among remarried women, the share of widows is likely to be higher in earlier-born cohorts
than in later-born cohorts, the share of divorcees is likely to be lower in earlier-born cohorts
than in later-born cohorts. Divorced women might have different preferences from widows,
and they may also be more likely to choose to whom they remarry.
1.7.3 Testing the “assimilation”/conversion story
Older women have spent more time in a union than younger women: as spouses spend a longer
time in a union, their ethnic or religious identity may change. I cannot test for explanations
about conversion or “assimilation” that take place before cohabitation or marriage, but I can
test these two channels during the time in union. Exploiting the fact that there are at least
two survey waves for each country, I can study whether women who married for the first time
the same year and were born the same year are more (less) likely to report having the same
ethnic (religious) group as their husband when the length of union increases. However, the
identification ultimately rests on differences across survey waves, so this also captures any
effect linked to survey waves.
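The logic of this comparison can be sketched as follows: women are grouped into cells defined by birth year and year of first union, and the share of same-group unions is compared across survey waves within each cell. This is only an illustration of the idea, with hypothetical field names and made-up records; the thesis implements the test through a regression (column (11) of Table 1.6), not through this cell-by-cell comparison.

```python
from collections import defaultdict


def same_group_share_by_wave(records):
    """Share of same-group unions by survey wave within each cohort cell.

    A cell is a (birth year, year of first union) pair. Cells observed in
    fewer than two survey waves are dropped, since they allow no comparison.
    """
    cells = defaultdict(lambda: defaultdict(list))
    for record in records:
        key = (record["birth_year"], record["first_union_year"])
        cells[key][record["survey_year"]].append(1 if record["same_group"] else 0)
    return {
        key: {wave: sum(v) / len(v) for wave, v in sorted(waves.items())}
        for key, waves in cells.items()
        if len(waves) >= 2
    }


# Hypothetical records from two survey waves.
toy_records = [
    {"birth_year": 1970, "first_union_year": 1990, "survey_year": 1996, "same_group": True},
    {"birth_year": 1970, "first_union_year": 1990, "survey_year": 1996, "same_group": False},
    {"birth_year": 1970, "first_union_year": 1990, "survey_year": 2006, "same_group": True},
    {"birth_year": 1970, "first_union_year": 1990, "survey_year": 2006, "same_group": True},
    {"birth_year": 1975, "first_union_year": 1995, "survey_year": 2006, "same_group": False},
]

shares_by_cell = same_group_share_by_wave(toy_records)
```

A same-group share that rises with the survey year within a cell would be consistent with conversion or “assimilation” during the union, but also with selective divorce or with any wave-specific effect.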
Table 1.6: Robustness checks on interethnic marriages and interfaith marriages – Pooled sample

Regression results - Dependent variable: Interethnic marriage - Each cell: birth year coefficient from a separate regression

Columns       (1) (2)   (3) (4)       (5) (6)        (7) (8)       (9) (10) (11) (12)   (13) (14)
Sample        All married women                                    First union          Remarried
Assumptions   Main      Bound Other   Higher bound   Lower bound   Main                 Main

Country fixed-effects   X X X X X X X X X X X X X X
Age & Age2              X X X X X X X X X X X X X
Education & Urban       X X X X X X X
Remarried               X
Data & sample: Women in union, pooled sample. Weighted data.
Specification: OLS regression. Standard errors are clustered at the DHS-cluster level. As the specification includes country fixed effects, there is no constant in the model. The dependent variable equals 0 if the union is intraethnic (intrafaith) and 100 if the union is interethnic (interfaith). The definition of the dependent variable varies in specifications (1) to (8). Results are estimated under the main specification, but on two sub-samples, in columns (9) to (14).
Columns (1) to (10) and (13) and (14) report the coefficient associated with the birth year variable.
Column (11) reports the coefficient associated with the birth year variable, as well as the number of years since first cohabitation and the age at first cohabitation.
Column (12) reports the coefficient associated with the year of first cohabitation variable.
(1), (2): Main specification, All women: Dependent variable: Interethnic (interfaith) marriages as observed in the data.
(3), (4): Other bound, All women: Dependent variable: Interethnic (interfaith) marriages, with “other”-“other” unions counted as interethnic ones.
(5), (6): Higher bound, All women: Dependent variable: Interethnic (interfaith) marriages, with all women who remarried counted as being in an interethnic (interfaith) union.
(7), (8): Lower bound, All women: Dependent variable: Interethnic (interfaith) marriages, with all women who remarried counted as being in an intraethnic (intrafaith) union.
(9), (10), (11), (12): Main specification, First unions: Only women in their first union.
(13), (14): Main specification, Remarried: Only women who have remarried.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Table 1.6 (column (11)) shows results from the regression of the Intermarriage variables on
birth year, length of union, and age at first union. Comparing column (9) to column (11), I find
that the time trends are robust to controlling for length of union and age at first union.
The coefficient of the number of years since cohabitation is not significant for interethnic mar-
riages but is negative and significant for interfaith marriages. Therefore, in the case of intereth-
nic marriages, “assimilation” and selective divorces seem unlikely. In the case of interfaith
marriages, the longer the union, the more likely it is that spouses have the same faith. This re-
sult is consistent with the fact that the share of “other (faith)” decreased over time: conversion
during marriage may be one of the mechanisms behind this decrease. Another hypothesis
would be that interfaith unions are more likely to break than intrafaith unions, but this
hypothesis does not account for the decline in traditional religions.
Concerning the age at first cohabitation, women who were older when they started cohabiting
are more likely to be in an interethnic union, which is consistent with the fact that these women
are more educated and more likely to live in an urban area than their counterparts, and that
these characteristics are positively correlated to the likelihood of being in an interethnic union.
Older women at the time of their first cohabitation are less likely to be in an interfaith union,
which is consistent with the fact that they are less likely to belong to a traditional religion.
1.7.4 Testing Birth year v. Cohabitation year
The start of cohabitation may be a better measure of norms at the time that the union started:
it corresponds to what people may perceive, such as the fact that more (fewer) people are getting
married outside of their group. However, cohabitation year is less exogenous than birth year,
and age at marriage is higher for later-born cohorts, leading these cohorts to start cohabiting
at even later dates. In Table 1.6, comparing column (10) to column (12), the results are robust to using
cohabitation year instead of birth year on the sample of women who are still in their first
union. At the country-level, the results are robust to using cohabitation year instead of birth
year. Cohabitation year appears to indeed be endogenous to education and urban residence.
When using cohabitation year rather than birth year, there is a trend in all of the countries ex-
cept Burkina Faso and Niger, but after controlling for education and urban residence, this effect
remains significant in the countries where there was a trend using birth year and in Gabon. Af-
ter controlling for education and urban residence, the trends estimated on interfaith marriages
are different depending on whether birth year or cohabitation year is used only in Cote d’Ivoire
and in Senegal.
1.8 Concluding remarks
This paper documents patterns of interethnic and interfaith marriages in sub-Saharan African
countries. I use data from Demographic and Health Surveys that gather information on marital
history, education and geographic location to build a sample of women born between 1955 and
1989. I find that the share of interethnic marriages varies between countries but that such
unions are not uncommon: 20.4% of women are married to a man who is not from the same
ethnic group as them, contrasting with 9.7% of women who are married to someone who does
not share their faith, and 2.1% of women in Muslim-Christian marriages.
Studying marital outcomes of women born between 1955 and 1989, I find that interethnic mar-
riages became more common in half of the sample, and that their share remained constant in
the other half. This study concludes that higher educational achievements and widespread
urbanization contributed to that increase, but that these changes cannot explain all of it,
suggesting that ethnic cleavages may lose salience in some parts of sub-Saharan Africa. Lin-
guistic distance for intermarried couples increased in some countries, suggesting that all eth-
nic boundaries were lessened. In other countries, linguistic distance decreased: boundaries
between groups that are close in terms of linguistic distance disappear and boundaries with
other groups may be reinforced. As interethnic marriage shares are far from rare and are even
increasing in half of the countries of the sample, using ethnicity as a proxy for a strong identifi-
cation with one single group may be misguided. In contrast, interfaith marriages are becoming
less common, a fact that can mostly be attributed to the conversion of former followers of tra-
ditional religions to Islam or Christianity, thus resulting in lower levels of religious diversity.
The share of Muslim-Christian marriages remains low and has not changed in most countries.
However, new religious groups, especially new religious branches within Christianity, are be-
ing separately counted in recent survey waves: religious boundaries within faith groups may
become more salient as a result of the expansion of Christianity and Islam.
Future research could aim at studying marriages that are intrafaith but interdenominational.
An additional strand of research could focus on better understanding the channels by which
education and urbanization impact the likelihood of marrying outside one’s own ethnic group:
Are cities different from rural areas because they are more heterogeneous in terms of ethnic-
ity? Or do people living in cities have more agency to choose a partner? Are educated women
more likely to marry outside of their ethnic group because they have more agency in choosing
a spouse or because they accessed more mixed markets by attending higher education institu-
tions? Related work could investigate whether marrying outside one’s own ethnic group is the
result of strategic behavior (Luke and Munshi, 2006). Are urban-dwellers turning away from
membership in their ethnic group to benefit from their membership in a religious group? Faith
groups or networks related to attending the same church or mosque could also be an opportu-
nity to access jobs and support. Deepening our understanding of whether marriage decisions
reinforce or change identity affiliations would bring important contributions to the political
economy literature on conflict, as well as to the literature on networks.
Note: This table lists all survey waves included in the main sample, and presents results useful to understand recoding choices across waves.
Sample: Women in union at the time of the survey.
Ethnicity: Columns (1) and (7): Number of ethnic groups in the common classification (at least one married woman and one married man in each survey wave and each birth cohort). Columns (2) and (8): Number of ethnic groups with at least one woman listed in the DHS classification. Columns (3) and (9): Share of women belonging to the “other (ethnicity)” group (includes foreigners) in the common classification.
Religion: Columns (4) and (10): Number of religious groups with at least one woman listed in the DHS classification. The common classification is made up of three groups for all waves, except Senegal 2005, where Christian and Others are pooled together. Columns (5) and (11): Share of women belonging to the “other (faith)” group in the common classification.
Remarriage: Columns (6) and (12): Share of remarried women.
[a] The only Somali people in this wave are two men: this wave is not representative of the north-east of Kenya.
[b] Religious affiliation not included in questionnaire. I use a different set of weights when not using this wave.
Appendix 45
A-1.2. Linguistic distance measures
Figure A-1.1: Ethnolinguistic tree for groups listed in DHS Benin
[Figure: linguistic tree rooted at “Proto-Human”, branching into the Niger-Congo and Nilo-Saharan families and ending at the groups Dendi, Peulh, Yoruba, Betamaribe, Yoa & Lopka, Bariba, Fon, and Adja.]
The names in bold are the names of the ethnic groups listed in DHS Benin. The names in italic are the names of linguistic groups the ethnic groups were matched to.
Gershman and Rivera (2018) describe the specificity of linguistic trees in the case of ethnolin-
guistic groups in sub-Saharan Africa. While languages have not always been associated with
ethnicity (Canut, 2002), there is currently a strong association between languages (or language
groups) and ethnic groups. First, I match each group listed in the recoded DHS classification to
the corresponding linguistic group according to the information listed in the Ethnologue dic-
tionary (Simons and Fennig, 2017). Second, I compute the linguistic distance of each pair of
linguistic groups within a country.
I define linguistic distance between two groups as the mean number of nodes to their first
common subfamily. The number of nodes is computed from the linguistic group level. For
instance, the linguistic distance between the ethnic groups Adja and Fon, in Benin, is 1 (one
step needed to go to the last common linguistic subfamily: Aja-Gbe (1) and Fon-Gbe (1)). The
linguistic distance between Kamba and Kikuyu is 0 (they share a subfamily: Kikuyu-Kamba).
The linguistic distance between Kalenjin and Luo is 4.5 (the average of the distance Kalenjin-
Nilotic (3) and Luo-Nilotic (6)).
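The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the thesis: the function name `linguistic_distance` and the representation of each linguistic group as a root-to-group path are assumptions, and the paths below are loosely transcribed from Figures A-1.1 and A-1.2 only so that the step counts match those quoted in the text.

```python
from typing import Sequence


def linguistic_distance(path_a: Sequence[str], path_b: Sequence[str]) -> float:
    """Mean number of steps from each linguistic group up to the last shared node.

    Each path lists the nodes from the root of the tree down to the
    linguistic group an ethnic group was matched to.
    """
    shared = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        shared += 1
    steps_a = len(path_a) - shared  # steps from group A up to the shared node
    steps_b = len(path_b) - shared  # steps from group B up to the shared node
    return (steps_a + steps_b) / 2


# Hypothetical paths, chosen to reproduce the step counts given in the text.
KWA = ("Niger-Congo", "Atlantic-Congo", "Volta-Congo", "Kwa", "Left Bank", "Gbe")
FON = KWA + ("Fon",)
AJA = KWA + ("Aja",)
KIKUYU_KAMBA = ("Niger-Congo", "Atlantic-Congo", "Volta-Congo", "Benue-Congo",
                "Bantoid", "Southern", "Narrow Bantu", "Central", "E", "Kikuyu-Kamba")
NILOTIC = ("Nilo-Saharan", "Eastern Sudanic", "Nilotic")
KALENJIN = NILOTIC + ("Southern", "Kalenjin", "Kalenjin")
LUO = NILOTIC + ("Western", "Luo", "Southern", "Luo-Acholi", "Luo", "Luo")

print(linguistic_distance(AJA, FON))                    # Adja and Fon
print(linguistic_distance(KIKUYU_KAMBA, KIKUYU_KAMBA))  # Kamba and Kikuyu
print(linguistic_distance(KALENJIN, LUO))               # Kalenjin and Luo
```

With these illustrative paths, the three calls return the values reported in the text (1, 0 and 4.5).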
When computing the value of the observed linguistic distance, if one spouse does not belong to
any identified ethnic group, but her/his partner does, I define the linguistic distance between
them as the mean linguistic distance of intermarried couples within the country. When computing
the value of the random linguistic distance, I define the linguistic distance of couples in
which only one spouse is “other (ethnicity)” as the median value of the defined distances at the
language pair level.
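The two imputation rules can be sketched as follows. This is only an illustration under stated assumptions: the names `fallback_distances` and `couple_distance`, the dictionary keyed by unordered group pairs, and the distance values in the example are all hypothetical. Treating couples in which both spouses are “other (ethnicity)” as having distance 0 is also an assumption, borrowed from the main specification's treatment of such unions as in-group.

```python
from statistics import mean, median

OTHER = "other (ethnicity)"


def fallback_distances(couples, pair_distance):
    """Return the two imputation values described in the text.

    First value: mean distance over intermarried couples in which both
    spouses belong to an identified group (used for observed distances).
    Second value: median of the distances defined at the language-pair
    level (used for random distances).
    """
    intermarried = [
        pair_distance[frozenset((wife, husband))]
        for wife, husband in couples
        if OTHER not in (wife, husband) and wife != husband
    ]
    return mean(intermarried), median(pair_distance.values())


def couple_distance(wife, husband, pair_distance, fallback):
    """Distance for one couple, imputing `fallback` when exactly one spouse is "other"."""
    if wife == husband:
        return 0.0  # same group (assumption: also covers "other"-"other" couples)
    if OTHER in (wife, husband):
        return fallback
    return pair_distance[frozenset((wife, husband))]


# Hypothetical distances and couples, for illustration only.
pair_distance = {
    frozenset(("Adja", "Fon")): 1.0,
    frozenset(("Adja", "Yoruba")): 5.0,
    frozenset(("Fon", "Yoruba")): 5.0,
}
couples = [("Adja", "Fon"), ("Fon", "Fon"), ("Adja", "Yoruba"), (OTHER, "Fon")]
observed_fallback, random_fallback = fallback_distances(couples, pair_distance)
print(couple_distance(OTHER, "Fon", pair_distance, observed_fallback))
```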
Figure A-1.2: Ethnolinguistic tree for groups listed in DHS Kenya.
[Figure: linguistic tree rooted at “Proto-Human”, branching into the Afro-Asiatic, Nilo-Saharan and Niger-Congo families and ending at groups including Somali, Luo, Kalenjin, Luhya, Mijikenda Swahili, Taita & Taveta, Kisii, Meru & Embu, Kikuyu, and Kamba.]
The names in bold are the names of the ethnic groups listed in DHS Kenya. The names in italic are the names of linguistic groups the ethnic groups were matched to.
Online Appendix
B-1.1. Supplementary Appendix on data
DHS data
The main sample is made up of 15 countries. Below are listed the criteria for inclusion in
the main sample, as well as more detailed methodological information on reweighting and
recoding the data. Table 7 in the paper lists the data waves used in the main sample. Table B-1.1
lists the survey waves that are not included in the main sample, as well as the reason why they
were not included.
Criteria - Main sample
The criteria for inclusion are as follows: First, countries must have implemented at least two
survey waves that include ethnic information [24]. How ethnic classifications are chosen is not
mentioned in the DHS reports. Second, the ethnic classifications must be comparable across
waves. Third, ethnic groups must be ethnolinguistic groups that can be matched to linguistic
groups [25] using Ethnologue (Simons and Fennig, 2017). Fourth, the surveys must include
women born between 1955 and 1989, in order to observe women from all of the countries for
each birth year within the study period.
[24] A question on religious identity is included in all of the surveys except Senegal 1992. I compute specific weights for specifications run without Senegal 1992. Excluding this survey does not change the study period.
[25] For instance, DRC and Chad list groups that correspond to geographic areas (e.g. “cuvette central” and “uele lac albert” in DRC). These places are heterogeneous in terms of ethnic groups, thus leading me to exclude Chad and DRC from the main sample.
Table B-1.1: DHS – Countries and waves not included in the main sample
| Country in main sample | Survey wave [a] | Reason why wave not in main sample |
|---|---|---|
| Cameroon | 2011 (10) | No common classification |
| Cote d’Ivoire | 1998 [b] | – |
| Malawi | 1992 | Ethnicity not available |
| Niger | 2012 | Ethnicity not available |
| Senegal | 1997 [c] | |
| Uganda | 2000 2006 2011 (19 / 5 [d]) | Ethnicity not available |

| Country in maps section 6.1 | Survey wave and nb ethnic groups | Reason why country not in main sample |
|---|---|---|
| Central African Republic | 1994 (10) | Only one wave |
| Chad | 1996 (13) 2004 (13) 2015 (13) | Restricted study period [f] |
| Congo (Brazzaville) | 2005 (85) 2011 (12) | No common classification |
| Congo Democratic Republic | 2007 (10) 2013 (11) | Restricted study period |
| Ethiopia | 2000 (42) 2004 (48) 2011 (47) 2016 (42) | Common classification only 2011 and 2016; Restricted study period |
| Liberia | 2007 2013 (18) | Only one wave |
| Mozambique | 1997 (7) 2003 2011 (21) | No common classification |
| Namibia | 2000 (10) 2006 2013 | Only one wave |
| Nigeria | 2003 2008 (11) 2013 (400) | No common classification |
| Sierra Leone | 2008 (10) 2013 (12) | Restricted study period |

| Country not included | Survey wave | Reason why not included |
|---|---|---|
| Angola | 2015 | Ethnicity not available |
| Burundi | 2010 2016 | Ethnicity not available |
| Comoros | 1996 2012 | Ethnicity not available |
| Eswatini | 2006 | Ethnicity not available |
| Lesotho | 2004 2009 2014 | Ethnicity not available |
| Madagascar | 1992 1997 2003 2008 | Ethnicity not available |
| Rwanda | 1992 (3) 2000 2005 2010 2014 | Ethnicity not available |
| Sao Tome and Principe | 2008 | Ethnicity not available |
| South Africa | 1998 [e] | Ethnicity not available |
| Tanzania | 1991 1996 1999 2004 2010 2015 | Ethnicity not available |
| Zimbabwe | 1994 1999 2005 2010 2015 | Ethnicity not available |

Listed countries and waves: The list includes only countries and waves for which information about couples was included. For each survey wave, if a question about ethnicity was included, I indicate the number of ethnic groups with at least one female member.
Number of ethnic groups: Number of ethnic groups used when pooling survey waves for a country. The number of groups includes the group “others”.
Survey waves in brackets are not included in the sample.
[a] When a survey took place during two calendar years, the year listed is the year when data collection started.
[b] Cote d’Ivoire 1998: Men were not asked their ethnic identity.
[c] Senegal 1997: No information on whether women have remarried or not.
[d] Uganda 2011: Men’s ethnic identities are classified into 5 groups, while women’s are classified into 19 groups. This wave is not included in the sample in order to avoid losing too much information by recoding ethnic groups into 5 categories.
[e] South Africa 1998: Race is included, ethnicity is not.
[f] Restricted study period: Inclusion of this country would lead to studying only a restricted sample, as the overlap of birth cohorts in this country and in countries in the main sample is too small.
Reweighting - Main sample
When reweighting each survey wave, I take into account several issues associated with weights.
First, the weights provided by DHS do not sum up to population size. Using World Bank pop-
ulation statistics, I make sure that the weights of each survey correspond to the population size.
Second, women aged 15-49 are surveyed in all of the households, but men are surveyed in a
fraction of the surveyed households. The lowest sampling rate of men is 25% (Malawi 2000),
and the highest is 100% (DHS Ghana 2014, DHS Zambia 2013/2014). I adjust the weights by
multiplying them by the inverse of the sampling rate of men. Third, the number of survey
waves differs across countries. I correct for these differences.
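The three adjustments can be sketched as follows. This is a minimal illustration under explicit assumptions: the function name `reweight` and its arguments are hypothetical, and the text does not say how the correction for the number of survey waves is implemented, so dividing by the number of waves is an assumption made here purely for the sake of the example.

```python
def reweight(dhs_weights, population, men_sampling_rate, n_waves):
    """Adjust the DHS weights of one survey wave.

    dhs_weights       -- raw DHS weights of the observations in the wave
    population        -- population size the wave should represent
    men_sampling_rate -- fraction of households in which men were surveyed (0-1)
    n_waves           -- number of waves pooled for this country (assumed correction)
    """
    scale = population / sum(dhs_weights)   # step 1: weights sum to population size
    return [
        w * scale                           # rescaled weight
        / men_sampling_rate                 # step 2: inverse of men's sampling rate
        / n_waves                           # step 3: assumed correction for number of waves
        for w in dhs_weights
    ]


# Hypothetical wave: two observations, men surveyed in 25% of households,
# and two waves available for the country.
example = reweight([1.0, 3.0], population=1000, men_sampling_rate=0.25, n_waves=2)
print(example)
```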
Recoding ethnic and religious groups
The common ethnic classification includes only the groups that were listed in all surveys. In
a few cases, such as Cameroon, I do not use the survey wave whose classification differs too
much from other waves. When the number of groups does not vary much across waves, I re-
code the ethnic classifications under the assumptions that individuals have a preferred answer
to the question “what is your ethnic group?” and that this answer is not affected by changes
in the classification. There are two cases: groups that appear only in some waves, and groups
that are alternately listed as several subgroups and as one group. If individuals give an answer
that is not in the list (e.g. Maasai), this answer is recoded as “other (ethnicity)”. I hence
assume that a Maasai individual would have been coded as belonging to the “other” group in
DHS surveys that do not list this group. Based on that assumption, I assign to all of the Maasai
individuals the identity “other (ethnicity)” as the common classification for Kenya does not
include the group Maasai. I assume that subgroups are recoded into the corresponding group
in the classification. For instance, early DHS in Ghana list ethnic groups as “1 Asante, 2 Ak-
wapim, 3 Fante, 4 other Akan 5. Ga/Adangbe (etc.)”, whereas later waves list only “1. Akan
2. Ga/Adangbe (etc)”. I recode all of the Akan answers into a single category. To alleviate
concerns about measurement error due to recoding of ethnic groups, I check that the share of
respondents listed in the group “other ethnic group” (detailed statistics in Table 7, Appendix A
of the paper) remains roughly constant across cohorts and survey waves.
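The recoding rules can be illustrated with a short sketch. The function name `recode_ethnicity` and the lookup table are hypothetical; the mapping only reproduces the Ghana and Kenya examples given in the text and is not the full recoding used for the thesis.

```python
OTHER = "other (ethnicity)"

# Subgroups listed separately in some waves and merged in others (Ghana example).
SUBGROUP_TO_GROUP = {
    "Asante": "Akan",
    "Akwapim": "Akan",
    "Fante": "Akan",
    "other Akan": "Akan",
}


def recode_ethnicity(answer, common_groups, subgroup_to_group=SUBGROUP_TO_GROUP):
    """Map a survey answer to the common classification of a country."""
    group = subgroup_to_group.get(answer, answer)  # merge subgroups into their group
    # Groups that are absent from the common classification become "other".
    return group if group in common_groups else OTHER


ghana_common = {"Akan", "Ga/Adangbe"}
kenya_common = {"Kikuyu", "Luo", "Kamba"}  # illustrative subset of the Kenyan classification

print(recode_ethnicity("Fante", ghana_common))
print(recode_ethnicity("Maasai", kenya_common))
```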
The case of religious groups is more straightforward. I recode the religious groups into three
different groups: Muslims, Christians, and “other (faiths)”. “Other (faiths)” includes members
of traditional religions, agnostics/atheists, members of other religions listed as such, and
a handful of very small religious groups. The share of members of “other (faiths)” does not
remain constant over time, reflecting changes in religious composition of countries rather than
errors in categorization of groups, as identification of religious groups is easier than identifica-
tion of ethnic groups.
B-1.2. Descriptive statistics
Intermarriages: random, observed, and ratio of intermarriages.
Table B-1.2 and Table B-1.3 show the random shares, the observed shares, and their ratio at the
country level. These correspond to what is displayed in Figures 4 and 5 of the paper. Table B-1.4 shows
descriptive statistics on linguistic distances.
Table B-1.2: Observed and random intermarriage shares
Column groups: Interethnic marriages shares | Interfaith marriages shares
Columns: Survey year | Country | Observed | Random | Observed | Observed | Random | Observed
Data: Survey wave (DHS) conducted the closest to 2005. Sample: Women currently in union.
These shares correspond to what is plotted in the maps in figure 4 and figure 5 of the paper.
[a] Central African Republic
[b] Democratic Republic of the Congo
Descriptive statistics on explanatory variables: primary, secondary, and urban residence
Table B-1.3: Muslim/Christian marriages & religious structure
Column groups: Intermarriage share (Interfaith marriage; Christian/Muslim marriage) | Population share (Muslim; Christian; Other/Traditional)
Sample & data: Women currently in union, weighted DHS data at country level.
Columns (1), (3), and (5): Percentage of women whose highest educational outcome is primary school (column (1)), secondary school or higher (column (3)), and who live in an urban area (column (5)).
Columns (2), (4), and (6): OLS regressions run separately for the 15 countries of the sample. Standard errors are clustered at the DHS-cluster level. The dependent variable is listed in the header of each part of the table: it is a dummy equal to either 0 or 100. Results in columns (2), (4), and (6) can be interpreted as the change in percentage points associated with being born a year later, once quadratic controls for age are introduced.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
B-1.3. Ethnic composition over time
Are changes in intermarriage rates due to changes in market composition? I show below how
random shares of intermarriages changed over cohorts.
Figure B-1.1 shows random shares of intermarriages for countries where the share of interethnic
marriages increased over time (results from specification 1.1). For these countries, random
shares remained stable over time: the level of ethnic diversity does not change, so it cannot
explain the increase in interethnic marriage shares.
Figure B-1.2 shows the random interethnic marriage shares for countries where the share of
interethnic marriages did not significantly increase over time. Fluctuations of these random
shares are due to changes in the share of “other (ethnicity)” over birth cohorts in Gabon and in
Burkina Faso.
The only country where the level of ethnic diversity may have decreased is Niger. The random
shares are lower for the later-born cohorts than for the earlier-born ones, due to the fact that
the share of Haussa women increased from 58 to 64% of the married population. This increase
is not due to changes in the population, but to the fact that Haussa girls married even younger
than other girls. Hence, this composition effect should be controlled for by age effects. In Niger,
time trends on interethnic marriages are negative but not significant.
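To see why a larger majority group mechanically lowers the random share, a small sketch may help. It rests on an assumption that is not spelled out in this appendix: the random share is taken to be the probability that a randomly drawn wife and a randomly drawn husband belong to different groups, which may differ from the exact definition used in the paper. The function name `random_intermarriage_share` is hypothetical, and lumping all non-Haussa groups into a single category is a simplification made only for illustration.

```python
def random_intermarriage_share(women_shares, men_shares):
    """Share (in %) of interethnic unions if spouses were matched at random.

    women_shares, men_shares -- dicts mapping each group to its share (0-1)
    among married women and married men respectively.
    """
    same_group = sum(
        share * men_shares.get(group, 0.0)
        for group, share in women_shares.items()
    )
    return 100 * (1 - same_group)


# Illustration with the Niger figures quoted above (Haussa share 58% then 64%),
# all other groups being lumped together, which understates true diversity.
earlier = random_intermarriage_share(
    {"Haussa": 0.58, "rest": 0.42}, {"Haussa": 0.58, "rest": 0.42}
)
later = random_intermarriage_share(
    {"Haussa": 0.64, "rest": 0.36}, {"Haussa": 0.64, "rest": 0.36}
)
print(earlier, later)
```

In this two-category illustration the random share falls when the Haussa share rises, which is the direction described for the later-born cohorts.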
Figure B-1.1: Random interethnic marriage shares - Panel A
Observations 9170 3066 2274 9241 5603 10711 97111
R-squared 0.045 0.037 0.039 0.038 0.029 0.099 0.272
Share intermarriage 10.4 20.5 38.0 31.8 12.7 46.0 20.4
Data: Pooled DHS for each country. Weighted data. The pooled sample regression includes country-fixed effects (hence no constant).
Sample: Women in union at the time of the survey.
Specification: OLS regression. Standard errors are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intraethnic, 100 if the union is interethnic. The regression equation is the same as displayed in column (2) of table 3 of the paper.
Panel A: Countries with a positive and significant trend on interethnic marriages, when only age is controlled for.
Panel B: Countries for which the trend on interethnic marriages is insignificant when only age is controlled for.
Results can be interpreted as changes in percentage points.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Table B-1.7: Women’s characteristics and interfaith marriage
Panels A1&B1 Benin Burkina Faso Gabon Ghana Kenya Togo Zambia Pooled sample
Data: Pooled DHS for each country. Weighted data. The pooled sample regression includes country-fixed effects (hence no constant).
Sample: Women in union at the time of the survey.
Specification: OLS regression. Standard errors are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intrafaith, 100 if the union is interfaith. I consider three religious groups (Christians, Muslims, Others); intermarriage happens between these three groups. The regression equation is the same as displayed in column (2) of table 5.
Panel A1 & B1: Countries with a negative and significant trend on interfaith marriages, when only age is controlled for.
Panel A2 & B2: Countries for which the trend on interfaith marriages is insignificant when only age is controlled for.
Panel A3: Countries with a positive and significant trend on interfaith marriages, when only age is controlled for.
Trend on “others”: Coefficient associated with birth year, from a regression where the dependent variable is an indicator variable equal to 100 if the respondent is neither Muslim nor Christian.
Results can be interpreted as changes in percentage points.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Table B-1.8: Trend on interethnic marriage shares
| | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
|---|---|---|---|---|---|---|---|
| Dependent variable | Interethnic marriage | | | | | Mean | N |
| Birth year coefficient | Each cell: coefficient from a separate regression | | | | | | |
| Panel A: Increase in interethnic marriage shares | | | | | | | |
| Benin | 0.242*** | 0.193*** | 0.188** | 0.253*** | 0.176** | 15.1 | 10977 |

Controls
Age & Age2: X X X X X
Education: X X
Urban: X X
Remarried: X X

Data: Pooled DHS for each country. Weighted data.
Sample: Women in union at the time of the survey.
Specification: OLS regression run separately for the 15 countries of the sample. Standard errors are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intraethnic, 100 if the union is interethnic.
Columns (1) to (5) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression. Column (6) reports the share of interethnic marriages. Column (7) reports the number of observations for each country. When comparing columns (1) and (5), we can see whether there is a trend (column (1)) on the share of interethnic marriage and whether there is a trend (column (5)) once we control for individual characteristics (education, urban residence, whether the woman is not in her first union) which are positively correlated with the likelihood to be in an interethnic union.
Results in columns (1) to (5) can be interpreted as changes in percentage points.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Table B-1.9: Trend on interfaith marriage shares
| | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
|---|---|---|---|---|---|---|---|
| Dependent variable | Interfaith marriage | | | | | Mean | N |
| Birth year coefficient | Each cell: coefficient from a separate regression | | | | | | |
| Panel A: Increase in interethnic marriage shares | | | | | | | |
| Panel A1: Decrease in interfaith marriage shares | | | | | | | |
| Benin | -0.150** | -0.140* | -0.141* | -0.118 | -0.103 | 16.7 | 10977 |

Controls
Age & Age2: X X X X X
Education: X X
Urban: X X
Remarried: X X

Data: Pooled DHS for each country. Weighted data.
Sample: Women in union at the time of the survey.
Specification: OLS regression run separately for the 15 countries of the sample. Standard errors are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intrafaith, 100 if the union is interfaith.
Columns (1) to (5) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression. Column (6) reports the share of interfaith marriages. Column (7) reports the number of observations for each country. When comparing columns (1) and (5), we can see whether there is a trend (column (1)) on the share of interfaith marriage and whether there is a trend (column (5)) once we control for individual characteristics (education, urban residence, whether the woman is not in her first union) which are positively correlated with the likelihood to be in an interfaith union.
Results in columns (1) to (5) can be interpreted as changes in percentage points.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
B-1.5. Additional robustness analyses at the country level
I implement four robustness checks on my findings: I present here the results for each coun-
try. First, I relax the assumption that a marriage is intraethnic or intrafaith when both spouses
belong to the group “other”. Second, I test whether results are robust to alternative assump-
tions on remarried women’s first unions. Third, using only women in their first union, I test
whether “assimilation” and conversion take place over the length of a marriage. Fourth, using
only women in their first union, I compare time trends measured using birth year and using
marriage years. Table B-1.13 (ethnicity) and Table B-1.14 (religion) display results from the
main regression and from the regressions when varying the assumptions, as mentioned above.
Testing for heterogeneity in the “other” group
The group “other ethnicity/faith” is more heterogeneous than other groups. In
the main specification, I assumed that when both spouses belonged to the group “other”, their
union was an in-group one. If I instead assume that these unions are in fact out-group unions, more
unions are counted as intermarriages.
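The two coding assumptions can be contrasted with a short sketch. The function name `intermarriage_indicator` and the flag `other_other_is_ingroup` are hypothetical; the 0/100 coding follows the dependent variable described in the table notes.

```python
def intermarriage_indicator(wife, husband, other_label="other",
                            other_other_is_ingroup=True):
    """Return 0 for an in-group union and 100 for an intermarriage.

    With other_other_is_ingroup=True (main specification), a union in which
    both spouses are coded "other" counts as in-group. With False (robustness
    check), such a union counts as an intermarriage.
    """
    if wife == other_label and husband == other_label:
        return 0 if other_other_is_ingroup else 100
    return 0 if wife == husband else 100


print(intermarriage_indicator("other", "other"))
print(intermarriage_indicator("other", "other", other_other_is_ingroup=False))
```

Only unions between two “other” spouses are affected by the flag, which is why the check matters most where the share of “others” is large.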
In the case of interethnic marriages, the results change in a few countries (main results in
columns (1) and (2), results under this hypothesis in columns (7) and (8), Table B-1.13 (eth-
nicity) and Table B-1.14 (religion)). Among countries where interethnic marriages increased,
trends turn insignificant for Cote d’Ivoire and Mali. Among countries where they did not in-
crease using the main specification, trends turn significant for Burkina Faso and Gabon. In
Burkina Faso, Gabon and Mali, these changes are due to the fact that the share of “other-other”
unions has varied over time. The share of “others” is around 45% in Cote d’Ivoire, the highest
share among all countries in the sample, and only 16% of “others” are married outside of their
group: the “other” hypothesis shifts a large fraction of unions from intraethnic to interethnic.
Even under this hypothesis, there is no country for which interethnic marriage shares decrease.
Results on interfaith unions do not change for countries where the share of such unions de-
creased. Trends on interfaith unions turn negative in Cote d’Ivoire, Malawi and Mali. This
finding is consistent with the fact that the share of members of traditional religions decreased
in these countries, so counting unions between members of this group as intrafaith or as interfaith
does not affect the trend. In contrast to these results, Niger saw an increase in interfaith
marriages. This is due to the fact that the share of “other” increased in the youngest Nigerien
cohorts.
Testing the remarriage story
First, I bound my estimates by making assumptions on first unions of remarried women. Ta-
ble B-1.13 (ethnicity) and Table B-1.14 (religion) (columns (1) to (6)) show the results at the
country-level. The “lower bound” (on the birth year coefficient) hypothesis assigns an interethnic
union to all the women who have remarried (translating into a higher share of interethnic
marriages). The “higher bound” (on the birth year coefficient) hypothesis assigns an intraethnic
union to all the women who have remarried (translating into a lower share of interethnic
marriages on average). For Panel A, the signs of the bounds conflict in Benin, Senegal and Togo.
Results for other countries and for interfaith marriages are robust to these changes. The fact
that, under the lower bound hypothesis, results change in the same direction for both intereth-
nic and interfaith marriages (i.e. trends turn negative, but never positive) indicates that the ef-
fect captured is mostly that remarried women are more likely to be older women, and women
belonging to earlier-born cohorts, and that I assign to these earlier-born cohorts high shares of
intermarriages, which are even higher than what is observed in later-born cohorts.
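The construction of the two bounding outcomes can be sketched as follows. The function name `bounded_outcomes` is hypothetical, and the sketch only builds the recoded dependent variables; the regressions themselves are not reproduced here.

```python
def bounded_outcomes(observed, remarried):
    """Build the dependent variable under each bounding hypothesis.

    observed  -- list of 0/100 indicators (100 = current union is an intermarriage)
    remarried -- list of booleans, True if the woman is not in her first union
    Returns (lower, higher): under the "lower bound" hypothesis every remarried
    woman is assigned an intermarriage (100); under the "higher bound"
    hypothesis every remarried woman is assigned an in-group union (0).
    """
    lower = [100 if r else y for y, r in zip(observed, remarried)]
    higher = [0 if r else y for y, r in zip(observed, remarried)]
    return lower, higher


# Hypothetical sample of four women.
lower, higher = bounded_outcomes([0, 100, 0, 100], [True, False, False, True])
print(lower, higher)
```

Because remarried women are concentrated in earlier-born cohorts, the “lower bound” recoding raises intermarriage shares mostly for those cohorts, which pushes the estimated birth year coefficient down.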
Second, I test whether trends I observed come from remarried women or from women in their
first union (Table B-1.13 (ethnicity) and Table B-1.14 (religion), results on columns (9) to (13)).
Looking at interethnic marriages, there is no trend in either sub-sample in any country of Panel
B. Any trend found on the whole sample is found among women in their first unions, and
there are positive trends for remarried women in Benin, Senegal and Uganda (coefficients are
positive and high in all countries but Cote d’Ivoire). In the case of interfaith marriages, I find
negative trends for both sub-samples in all countries where interfaith shares decreased, except
in Benin and Togo, where trends turn insignificant. Point estimates are high for the remarried
sample, indicating it is likely an issue of power. The coefficient turns negative for Malawi, where
remarried women and women in their first unions do not experience the same trends. In Niger
the coefficient is positive for remarried women.
Testing the assimilation/conversion story
Older women have spent more time in union than younger women: as spouses spend longer
in union, their ethnic or religious identity may change [26]. Exploiting the fact that I have several
survey waves for each country, I can study whether women who married for the first time
the same year and were born the same year are more (less) likely to report having the same
ethnic (religious) group as their husband when the length of union increases. However, the
[26] Conversion or “assimilation” may take place before cohabitation or marriage, but I cannot estimate these using DHS.
Table B-1.10: Ethnic identification and time in union
Panel A Benin Cote d’Ivoire Ghana Guinea Kenya Mali Senegal Togo Uganda
Observations 8040 2452 1553 7196 4305 8743 80810
R-squared 0.005 0.011 0.018 0.003 0.003 0.015 0.244
Share intermarriage 10.3 19.5 34.3 31.1 11.2 45.4 19.4
Data: Pooled DHS for each country. Weighted data. The pooled sample regression includes country-fixed effects (hence no constant).
Sample: Women still in their first union at the time of the survey.
Specification: OLS regression. Standard errors are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intraethnic, 100 if the union is interethnic.
Under the assumption that the occurrence of divorce and of widowhood are not correlated with a woman’s ethnicity, marital status and her husband’s ethnicity, the variable “number of years since cohabitation” would indicate whether women are more likely to declare they belong to the same ethnic group as their husband (thus “assimilating” into his ethnic group as the length of the union increases).
The results are more suggestive of a “selection” into divorce/widowhood story than of an “assimilation” story, as the length of the union is not significant in most countries, and as the signs are conflicting when this coefficient is significant.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
identification ultimately rests on differences across survey waves, so this will also capture any
effect linked to survey wave. Results using this specification should be compared with results
on women still in their first union (Table B-1.13 (ethnicity) and Table B-1.14 (religion), columns
(9) to (11)).
Table B-1.10 shows the test of the assimilation story for interethnic marriages. Women who
were older when they started cohabiting are more likely to be in an interethnic union, which
is consistent with the fact that these women are more educated and more likely to live in an
urban area than their counterparts, and that these characteristics are positively correlated to the
likelihood to be in an interethnic union. The coefficient of the number of years since cohabitation
is not significant in the pooled sample, which hides discrepancies across countries. The length
of union is only positively correlated to the likelihood of being in an interethnic union in Guinea, and
insignificant in other countries where the share of interethnic unions has increased. The posi-
tive coefficient for Guinean women is indicative of selective divorce: women who were in an
intra-ethnic union are more likely to divorce than their counterparts, maybe because they were
less likely to have chosen their first husband than women who married outside of their ethnic
group. This story is consistent with the fact that point estimates of birth year are high for the sub-
sample of remarried women, even if insignificant. In countries where interethnic marriages did
not become more frequent, the length of union is always negative, and is significant in Ghana,
Niger, and Zambia. This indicates that there might be selective divorces or assimilation in these
countries, which might be the reason why we do not observe a trend on interethnic marriage
shares.
Table B-1.11: Religious identification and time in union
| Panels A1&B1 | Benin | Burkina Faso | Gabon | Ghana | Kenya | Togo | Zambia | Pooled sample |
|---|---|---|---|---|---|---|---|---|
| Number of years since cohabitation | -0.194** | -0.200** | -0.727*** | -1.522*** | -0.258*** | -0.191 | -0.101* | -0.221*** |
| | (0.0877) | (0.0993) | (0.271) | (0.160) | (0.0614) | (0.142) | (0.0571) | (0.0329) |
| Age at cohabitation | -0.298** | 0.0835 | -0.886*** | -2.096*** | -0.504*** | -0.703*** | -0.264*** | -0.258*** |
| | (0.124) | (0.200) | (0.303) | (0.182) | (0.0907) | (0.223) | (0.0979) | (0.0544) |

| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Number of years since cohabitation | -0.209 | -0.0509 | -0.0751 | -0.0600 | 0.0648 | -0.0682 | -0.134 | 0.333 |
| | (0.184) | (0.112) | (0.0765) | (0.0795) | (0.0597) | (0.0636) | (0.0934) | (0.254) |
| Age at cohabitation | 0.479 | 0.0726 | -0.146 | -0.105 | 0.134 | 0.195 | -0.0630 | 0.791*** |
| | (0.321) | (0.141) | (0.160) | (0.117) | (0.0992) | (0.120) | (0.153) | (0.297) |
| Observations | 2262 | 3977 | 7196 | 7549 | 4305 | 6687 | 1915 | 2452 |
| R-squared | 0.007 | 0.001 | 0.002 | 0.001 | 0.002 | 0.008 | 0.005 | 0.004 |
| Share intermarriage | 17.8 | 4.6 | 6.9 | 6.0 | 1.7 | 1.9 | 4.5 | 11.1 |

Data: Pooled DHS for each country. Weighted data. The pooled sample regression includes country-fixed effects (hence no constant).
Sample: Women still in their first union at the time of the survey.
Specification: OLS regression. Standard errors are clustered at the DHS-cluster level. The dependent variable equals 0 if the union is intrafaith, 100 if the union is interfaith. I consider three religious groups (Christians, Muslims, Others); intermarriage happens between these three groups.
Under the assumption that the occurrence of divorce and of widowhood are not correlated with a woman’s religious affiliation, marital status and her husband’s religious affiliation, the variable “number of years since cohabitation” would indicate whether women are more likely to declare they belong to the same religious group as their husband (thus “assimilating” into his religious group as the length of the union increases).
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Table B-1.11 shows the test of the conversion story for interfaith marriages. Patterns differ
across panels. When looking at countries where the share of interfaith marriages has not
changed (lower panel - A2 & B2), length of union and age at cohabitation are not signifi-
cant. When looking at Cameroon, the only country where interfaith marriages have become
more frequent, results are similar to what is seen when studying interethnic marriages: older
women at the time of their first cohabitation are more likely to be in an interfaith union. Look-
ing at countries where the share of interfaith unions decreased over time (upper panel - A1 &
B1), I find that older women at the time of their first cohabitation are less likely to be in an
interfaith union, which is consistent with the fact that they are less likely to belong to a
traditional religion, in all countries but Burkina Faso. The length of union is negatively
correlated with the likelihood of being in an interfaith union in all countries of this panel,
which is consistent with either conversion or selective divorce. It seems likely that followers
of traditional religions convert during their marriage: given the intense proselytizing of other
faiths, conversions are more likely to go in this direction than to involve Muslim or Christian
individuals converting to the faith of their spouse.
Testing Birth year v. Year of first cohabitation
The results from Table B-1.12 are discussed in the paper.
Table B-1.12: Trend - Year of marriage
(1) (2) (3) (4)
Dependent variable: Interethnic marriage    Mean    N        Dependent variable: Interfaith marriage    Mean    N
Marriage year coefficient. Each cell: coefficient from a separate regression.
Panel B: No change in interethnic marriage shares (with birth year)        Panels A2 & B2: No change in interfaith marriage shares (with birth year)
Data: Pooled DHS for each country. Weighted data. Sample: Women in their first union at the time of the survey. Specification: OLS regression run separately for the 15 countries of the sample. Standard errors are clustered at the DHS-cluster level. Dependent variable is a variable that equals 0 if the union is intraethnic, 100 if the union is interethnic.
Columns (1) and (2) report the coefficient associated with the year of marriage variable. Each cell corresponds to a separate regression. Column (3) reports the mean number of interethnic (interfaith) marriages in the regression sample. Column (4) reports the number of observations for each country.
Results in columns (1) and (2) can be interpreted as changes in percentage points.
Results are displayed only for countries in which interethnic (interfaith) marriage shares did not change when year of birth is used to measure time trends.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
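Because the dependent variable is coded 0/100, the coefficient on the year of marriage reads directly as a change in percentage points per year. The minimal Python sketch below illustrates this on made-up data. It ignores the survey weights and the clustering of standard errors used in the tables, and the function name and all numbers are hypothetical.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with a constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

# Hypothetical data: 100 unions per marriage year, with the share of
# intermarriages starting at 10% in 1980 and rising by one percentage
# point per year.
years, outcome = [], []
for t in range(1980, 2000):
    n_inter = 10 + (t - 1980)
    years += [t] * 100
    outcome += [100] * n_inter + [0] * (100 - n_inter)  # 0/100 coding

# With a 0/100 outcome the slope is expressed in percentage points per year.
slope = ols_slope(years, outcome)
```

Had the outcome been coded 0/1 instead, the same regression would return a slope one hundred times smaller, expressed as a change in the share per year.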
Regression results - Dependent variable: Interethnic marriage - Each cell: birth year coefficient from a separate regression
Columns (1) to (13): regression coefficients. Columns (14) to (19): Share of interethnic marriages. Column (20): N (All).
Sample: (1) to (8) All married women; (9) to (11) First union; (12) and (13) Remarried; (14) to (17) All married women; (18) First union; (19) Remarried.
Assumptions: (1) and (2) Main; (3) and (4) Lower bound; (5) and (6) Upper bound; (7) and (8) Other; (9) to (13) Main; (14) Main; (15) Lower bound; (16) Upper bound; (17) Others bound.
Age & Age2: X X X X X X X X X X X X
Education & Urban: X X X X X X X
Remarried: X
Length of union & Age at cohabitation: X
Data: Pooled DHS for each country. Weighted data. Specification: OLS regression run separately for the 15 countries of the sample. Standard errors are clustered at the DHS-cluster level. Dependent variable is a variable that equals 0 if the union is intraethnic, 100 if the union is interethnic. Definition of the dependent variable varies in specifications (1) to (8). Results are estimated under the main specification, but on two sub-samples in columns (9) to (13). Columns (14) to (19) show the observed share of intermarriages for the different specifications and sub-samples. Column (20) displays the number of observations.
Columns (1) to (13) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression.
(1), (2), (14): Main specification, All women: Dependent variable: Interethnic marriages as observed in the data.
(3), (4), (15): Lower bound, All women: Dependent variable: Interethnic marriages, with all women who remarried counted as being in an interethnic union.
(5), (6), (16): Higher bound, All women: Dependent variable: Interethnic marriages, with all women who remarried counted as being in an intraethnic union.
(7), (8), (17): Higher bound, All women: Dependent variable: Interethnic marriages, with “other”-“other” unions counted as interethnic ones.
(9), (10), (11), (18): Main specification, First unions: Only women in their first union.
(12), (13), (19): Main specification, Remarried: Only women who have remarried.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Regression results - Dependent variable: Interfaith marriage - Each cell: birth year coefficient from a separate regression
Columns (1) to (13): regression coefficients. Columns (14) to (19): Share of interfaith marriages. Column (20): N (All).
Sample: (1) to (8) All married women; (9) to (11) First union; (12) and (13) Remarried; (14) to (17) All married women; (18) First union; (19) Remarried.
Assumptions: (1) and (2) Main; (3) and (4) Lower bound; (5) and (6) Upper bound; (7) and (8) Other; (9) to (13) Main; (14) Main; (15) Lower bound; (16) Upper bound; (17) Others bound.
Controls
Age & Age2: X X X X X X X X X X X X X
Education: X X X X X X X X
Urban: X X X X X X X X
Data: Pooled DHS for each country. Weighted data. Specification: OLS regression run separately for the 15 countries of the sample. Standard errors are clustered at the DHS-cluster level. Dependent variable is a variable that equals 0 if the union is intrafaith, 100 if the union is interfaith. Definition of the dependent variable varies in specifications (1) to (8). Results are estimated under the main specification, but on two sub-samples in columns (9) to (13). Columns (14) to (19) show the observed share of intermarriages for the different specifications and sub-samples. Column (20) displays the number of observations.
Columns (1) to (13) report the coefficient associated with the birth year variable. Each cell corresponds to a separate regression.
(1), (2), (14): Main specification, All women: Dependent variable: Interfaith marriages as observed in the data.
(3), (4), (15): Lower bound, All women: Dependent variable: Interfaith marriages, with all women who remarried counted as being in an interfaith union.
(5), (6), (16): Higher bound, All women: Dependent variable: Interfaith marriages, with all women who remarried counted as being in an intrafaith union.
(7), (8), (17): Higher bound, All women: Dependent variable: Interfaith marriages, with “other”-“other” unions counted as interfaith ones.
(9), (10), (11), (18): Main specification, First unions: Only women in their first union.
(12), (13), (19): Main specification, Remarried: Only women who have remarried.
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
CHAPTER 2
PARENTAL DIVORCE AND CHILDREN’S EDUCATIONAL OUTCOMES IN
SENEGAL
Joint with Rozenn Hotte1
Abstract This paper provides new evidence on the consequences of parental divorce for children in Africa. Using survey data that collected the detailed life histories of Senegalese women and their children, we investigate how children’s educational outcomes are affected by their parents’ divorce. We use a sibling fixed-effects strategy that allows us to control for all the factors that are common to all children in a family, such as parental preferences regarding education or the level of education of the parents, alleviating concerns of omitted variable bias. We compare children who were old enough to have been enrolled in primary school at the time of the divorce to their younger siblings, for whom enrollment decisions had not yet been made at the time of the divorce. We find that younger siblings were more likely than their older siblings to have attended primary school. This higher level of investment does not persist in the long run: there are no differences between siblings when considering primary school completion. We find that custody and fostering decisions do not seem to mediate the positive effects on school attendance. Our findings are consistent with either an improvement of the financial situation (due to remarriage) or an increase in the decision-making power of mothers after the divorce.
1 This chapter was published in World Development (Crespin-Boucaud and Hotte, 2021). We thank Denis Cogneau and Sylvie Lambert for their advice as well as detailed comments that greatly improved the quality of this paper. We are grateful to Sarah Deschênes, Oliver vanden Eynde, Michael Grimm, Hélène Le Forner, Alexis Le Nestour, Karine Marazyan, Martin Ravallion, Paola Villar, Dominique van de Walle, and two anonymous referees for insightful discussions, suggestions, and feedback on this paper. We also thank participants of the PSI-PSE and CFDS seminars at the Paris School of Economics (PSE), of the internal seminar of the Department of Economics of the University of Sussex, of the CSAE Conference in Oxford, of the Nordic Conference on Development Economics in Copenhagen, of the Journées Augustin Cournot in Strasbourg, of the LAGV Conference in Aix-en-Provence and of the DIAL Conference in Paris for their helpful comments.
2.1 Introduction
Changes in family structure due to parental death or divorce are likely to affect children. How-
ever, while the consequences of orphanhood have been studied, little is known about the con-
sequences of divorce for children in sub-Saharan Africa, even though approximately 25% of
first unions end in divorce (Clark and Brauner-Otto, 2015). A divorce is likely to imply eco-
nomic losses, changes in caregivers, and psychological distress (Amato, 2000). All children
whose parents divorce may face these consequences, but in countries where formal safety nets
are scarce and poverty levels are high, such as those in sub-Saharan Africa, the consequences of
a divorce might be more severe than in developed countries and may potentially affect health
and access to education.
This paper addresses whether divorce affects the (basic) educational decisions parents make
for their children. We use data collected in Senegal to provide new insights into the conse-
quences of divorce in sub-Saharan Africa. Divorces in Senegal are neither rare nor common:
approximately 20% of first unions in Senegal end in divorce, close to the average divorce rate
in sub-Saharan Africa (Clark and Brauner-Otto, 2015). Moreover, couples from better-off back-
grounds are more likely to divorce than couples from poorer backgrounds (Lambert et al., 2019,
using the same dataset as we do), thus raising the question of whether children’s educational
outcomes suffer when their (educated) parents divorce.
We study whether age at divorce is correlated with primary schooling decisions. A simple
comparison of children according to the divorce status of their parents would not provide a
satisfactory answer to this question. Many parental and family characteristics are likely to si-
multaneously influence the probability that parents divorce and the schooling of their children,
leading to omitted variable bias. To avoid this issue, we use a sibling fixed-effects strategy that
allows us to control for all the factors that are common to all children in a family, such as
parental preferences regarding education or the level of education of the parents (Björklund
and Sundström, 2006; Le Forner, 2020). This strategy relies only on (within-family) differences
in age at the time of the divorce: we compare children for whom schooling decisions had not
yet been made by the divorce date to their older siblings for whom schooling decisions had
likely been made by the divorce date.
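To illustrate why the within-family comparison removes this bias, the following minimal Python sketch contrasts a naive comparison with a sibling fixed-effects (within) estimator on simulated data. It is a stylized illustration under stated assumptions, not the specification estimated in this paper: all function and variable names are hypothetical, the outcome is a continuous schooling index rather than a binary attendance indicator, and the unobserved family "taste for schooling" is constructed to be correlated with the probability of being young at the divorce date.

```python
import numpy as np

def within_family_slope(y, x, family_id):
    """OLS slope of y on x after subtracting family means (sibling fixed effects)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Integer code per family, then per-family means via bincount.
    _, codes = np.unique(np.asarray(family_id), return_inverse=True)
    counts = np.bincount(codes)
    y_dm = y - (np.bincount(codes, weights=y) / counts)[codes]
    x_dm = x - (np.bincount(codes, weights=x) / counts)[codes]
    # Families in which x does not vary across siblings contribute nothing.
    return float(x_dm @ y_dm / (x_dm @ x_dm))

def naive_slope(y, x):
    """OLS slope of y on x that ignores family membership."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def simulate_siblings(n_families=2000, effect=0.10, seed=0):
    """Simulated data: two siblings per family, family-level confounder."""
    rng = np.random.default_rng(seed)
    family_id = np.repeat(np.arange(n_families), 2)
    # Unobserved family characteristic shared by both siblings.
    taste = np.repeat(rng.normal(size=n_families), 2)
    # Being young at the divorce date is more likely in high-taste families.
    young = (rng.random(2 * n_families) < 1 / (1 + np.exp(-taste))).astype(float)
    noise = rng.normal(scale=0.1, size=2 * n_families)
    schooling = 0.5 + effect * young + 0.2 * taste + noise
    return schooling, young, family_id

schooling, young, family_id = simulate_siblings()
fe_estimate = within_family_slope(schooling, young, family_id)
naive_estimate = naive_slope(schooling, young)
```

In this simulation the naive slope overstates the effect because it also picks up the family-level characteristic, whereas the within-family slope stays close to the value of `effect` used to generate the data. Only families in which siblings differ in their status at the divorce date contribute to the within-family estimate.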
We study two related primary schooling outcomes. The first outcome is whether a child has
ever attended school (primary school attendance). This outcome captures the first investment
in (formal) education that a child can receive.
The second outcome is whether the child has completed primary school (5th or 6th grade).
This outcome captures higher levels of investment in schooling as well as retention in the edu-
cational system. This paper focuses on primary school enrollment and does not discuss higher
levels of schooling due to sample size limitations.
The survey includes neither information on children’s academic performance (such as test
scores) nor the quality of the education to which they have access. These indicators would
have provided a more complete picture of the impact of divorce on education (Glewwe and
Kremer, 2006; Jones et al., 2014; Van der Gaag and Adams, 2010). While we do not know how
well children perform in school, it seems likely that children who attend school do not perform
worse than children who have never been enrolled in primary school: we can thus interpret an
increase in primary school attendance as a likely increase in human capital.
We use the 2011 wave of the survey Pauvreté et Structure Familiale (PSF; De Vreyer et al., 2008).
This survey combines two features that are rarely found together in household surveys
and that are key to implementing the sibling fixed-effects identification strategy. The PSF
survey collected detailed information on marital histories, including the year when and the
reason why—divorce or death—each marital union ended. It also collected information on
children younger than 25 born to all members of the surveyed households, thus including
children who do not live in the household. As custody of children is an outcome of divorce, in-
formation on children who are not household members ensures that the sample is not selected
on the results of divorce.
Overall, our findings suggest that divorce does not negatively affect the educational outcomes
of children who were young when their parents divorced. Children who were 5 or younger
when their parents divorced were more likely to have attended primary school than both other
children and their older siblings. We show that this difference is not caused by negative shocks
resulting both in divorce and in older children not attending primary school. The positive
effect we observe for younger children is instead explained by the fact that divorced parents
seem to be able to compensate their younger children after a divorce. This positive effect on
school attendance is robust to varying the cutoff for the age at divorce. However, this finding
must be qualified: children who were 5 or younger by the divorce date were not more likely
to have completed primary school than their siblings who were of primary school age at the
divorce date or than their siblings who were supposed to have completed primary school by
the divorce date. Finally, children who were between 6 and 9 years old when their parents
divorced were as likely to have completed primary school as their older siblings. This finding
suggests that divorce might not have negative consequences for children when considering
primary-level educational outcomes.
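The comparison groups discussed above depend only on each child's age at the divorce date. As a minimal sketch, assuming ages are computed from calendar years alone, the grouping could be coded as follows. The function names, the group labels, and the `completion_age` default are hypothetical choices made for illustration; only the cutoff of 5 years comes from the text, and it is kept as a parameter because the results are checked against alternative cutoffs.

```python
def age_at_divorce(birth_year, divorce_year):
    """Approximate age (in years) of a child when the parents divorced."""
    age = divorce_year - birth_year
    if age < 0:
        raise ValueError("child born after the divorce year")
    return age

def exposure_group(age, young_cutoff=5, completion_age=12):
    """Classify a child by schooling stage at the divorce date.

    completion_age is an illustrative assumption, not a value from the text.
    """
    if age <= young_cutoff:
        return "enrollment not yet decided"
    if age < completion_age:
        return "primary school age"
    return "past primary school age"

# Example: a child born in 2000 whose parents divorced in 2004.
group = exposure_group(age_at_divorce(2000, 2004))
```

Changing `young_cutoff` moves children between the first two groups, which is the kind of variation a robustness check on the age cutoff relies on.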
Custody and fostering decisions do not seem to drive the positive effect on primary school
attendance. Part of the effect observed seems to be driven by the children of women who
remarry, which is consistent with the fact that remarriage often allows women to improve their
financial situation. Another potential channel is that some women might see their decision-
making power increase after they divorce: if a woman’s preference for education is higher than
her ex-husband’s, she might invest more in her children by sending them to school after the
divorce. Nevertheless, children who were young when their parents divorced were not more
likely to complete primary school than their older siblings, which indicates that this initial
investment is not sustained long enough to allow children to complete their primary education.
This paper makes two contributions to the literature. First, we provide new evidence that
divorce does not necessarily negatively affect children’s basic educational outcomes using a
sibling fixed-effects methodology that allows us to control for selection into divorce.2 While
not finding a negative impact of divorce on schooling outcomes may be surprising, it is in
line with recent evidence on the health outcomes of children whose parents divorce. Smith-Greenaway
(2020) concludes that in sub-Saharan Africa, following divorce, children’s health
benefits from their biological parents’ education to the same degree as the health of children
with married parents, highlighting that selection into divorce and remarriage might be what
drives the differences previously observed between children.3
Second, this paper expands the literature on the link between divorce and children’s education
in developing countries by providing results in the Senegalese context that challenge previous
findings. Gnoumou Thiombiano et al. (2013) find that in Burkina Faso, children of divorced
parents are less likely to attend school than their counterparts but do not discuss selection into
divorce. Chae (2016) uses specifications with child fixed effects and finds that parental divorce
in rural Malawi is associated with lower grade attainment and a lower likelihood of children
attending school at the time of the survey. Our findings are at odds with both of these papers,
which is not surprising given that the selection into divorce is different in these three countries.
While divorce rates are approximately 50% in rural Malawi and very low in Burkina Faso, the
Senegalese divorce rates are close to 20%. Relatedly, divorced mothers in Senegal are more
educated than their nondivorced counterparts, which is not the case for divorced mothers in rural
Malawi or in Burkina Faso. In the Senegalese case, children of divorced parents do not seem
to have worse schooling outcomes due to their parents’ divorce. This finding emphasizes the
need to study the consequences of divorce in several societies since context influences selection
into divorce and the potential consequences for children.
2 When considering the broader category of paternal absence, the consensus seems to be that the death of the father has either negative or no consequences for his children, depending on the outcome considered (Beegle et al., 2006; van de Walle, 2013), and that the death of the mother has negative consequences for her children (Beegle et al., 2010; Case and Ardington, 2006). A divorce does not necessarily imply paternal absence, and the impacts of both shocks are likely to differ due to differences in the type of shock as well as differences in the characteristics of the families affected by these shocks.
3 Clark and Hamplová (2013) and Gnoumou Thiombiano et al. (2013) conclude that children of divorced mothers have worse health outcomes than children whose parents are still married.
The rest of the paper is organized as follows. We introduce the dataset and survey used in our
analysis in Section 2.2. Section 2.3 introduces elements of the context surrounding divorce and
education in Senegal. Section 2.4 details the identification strategy. Section 2.5 presents the
results and robustness checks. Section 2.6 discusses the channels that may mediate the impact
of parental divorce on children’s primary school enrollment. Section 2.7 concludes.
2.2 Data: Pauvreté et structure familiale
We use the second wave of the survey Enquête Pauvreté et Structure Familiale4 (PSF) that was
conducted in Senegal in 2011.5 The survey is described in detail in De Vreyer et al. (2008).
The PSF database has two distinctive features that allow us to identify women who divorced and
children born to parents who later divorced. First, the PSF records detailed information on marital
histories. Respondents were asked how many times they had experienced marital dissolution
and, if relevant, what the reason for the most recent marital dissolution was (separation
or death of their husband). Respondents also provided the dates when their current and last
unions began and ended. In the case of a divorce or a separation, the date likely reflects the
time when the separation became effective rather than the end of any subsequent legal process.
Second, the PSF includes information on the children of each household member, regardless of
whether the children are themselves household members. Each individual in the household is
asked to indicate which children living in the household are her own and to list her children
living elsewhere, provided that they are younger than 25 years old. Thus, there is no selection
of children based on whom they live with, ensuring that our results are not biased by decisions
regarding custody and place of residence of the children after a divorce.
4 Momar Sylla and Matar Gueye of the Agence Nationale de la Statistique et de la Démographie of Senegal (ANSD), Philippe De Vreyer (University of Paris-Dauphine and IRD-DIAL), Sylvie Lambert (Paris School of Economics-INRA) and Abla Safir (World Bank) designed the survey. The data collection was conducted by the ANSD.
5 The first PSF wave covered a representative sample of the Senegalese population, and the second wave included respondents of the first wave and the household members living with them. The number of respondents was almost two times higher in the second wave (28,000 individuals versus 14,450 individuals) than in the first wave. The sample of interest is not large enough in the first PSF wave; hence, we decided to use the second wave.
We focus on divorced mothers and their children and do not study fathers who have divorced.
There are two reasons for this choice. First, we cannot identify the children affected by a di-
vorce through their father if he was polygamous at the time of the divorce, as the data contain
information on the date of the divorce, but not on the rank of the woman whom the man
divorced (for individuals already surveyed in 2006). Second, we need information on children’s
age at the divorce date, and women are more likely to accurately report the age of their children
than men, who, on average, have more children than women due to high polygamy rates.6 The
data we use are partly retrospective and, in some cases, concern children who are not house-
hold members. However, misreporting should not be a major issue, as it is likely that mothers
remember birth, marriage, and divorce years and know whether their children went to school,
even if they are not living in the same household as their children.
There are two main limitations to using the PSF. First, households in which divorced women
live at the time of the survey are not the households in which these women and their children
lived before the divorce.7 Information on the previous household is more limited than infor-
mation on the current household. The data allow us to know if the child was living in the
surveyed household or if she had already left it, but we do not have retrospective information
on the exact place of residence at the time when the child was in school. Second, information
collected on children who are not members of the surveyed household is limited. Two types
of education-related questions are asked about these children: whether the child is currently
attending school and what her highest level of education is. Information on the age at which a
child started school was collected only for children living in the surveyed households.
2.3 Background: Divorce and education in Senegal
2.3.1 Insights on divorces in Senegal
Two different worlds: Legal and customary divorces
We draw from a report by Lagoutte et al. (2014) that provides detailed information on and
analyses of marital dissolution practices in Senegal.
6 43% of married men older than 45 years old have more than one wife. Polygamous men who live with all their wives have on average 7.5 children.
7 We can retrieve information on the household that existed before the divorce only if the divorce took place between the two waves. There were only 65 women who divorced between the two waves. Among these 65 women, only 43 had at least one child with their husband, and among them, only 24 had a child older than 6 in 2011.
Since 1973, according to the Code Sénégalais de la Famille, legal marital dissolution (divorce)
must be pronounced, even if the marriage was not legally registered. However, qualitative
work conducted by one of the authors suggests that most women do not formally divorce but
instead receive a customary divorce. Under customary law, a wife may ask her husband for a
divorce, but he needs to agree to the divorce for it to be effective. Conversely, a husband can
divorce his wife even if she does not agree to the divorce.8 Under the 1973 Senegalese family
law, which is meant to protect women and ensure gender equality, both parties can file for
divorce. However, women are much more likely to file for divorce than men, as women make
approximately 75% of the claims in court. This discrepancy may be because in a polygynous
society, a man can marry another woman rather than seek a divorce in the case of an unhappy
marriage. Most of the divorce rulings are granted on the grounds of “incompatibilité d’humeur”
(a rather vague term; literally, “mood incompatibilities”) or of “défaut d’entretien par le mari”
(the husband failed to support his wife economically).9 These two grounds for
divorce predate the 1973 Senegalese family law: both are included in Islamic law and in
customary law.
If children were born to a couple who divorces, custody of the children must be
decided. There is uncertainty over children’s residence: Lambert et al. (2019) stress
that women often declare in interviews that they worry about their children being taken away
from them should they separate from their husbands. In the case of a formal divorce, the judge
decides on the children’s residence and declares one parent to be the main caregiver. Mothers
are usually granted custody of their daughters and of their young sons, and the judge can order
the father to pay child support. In the case of a customary divorce, fathers can claim custody of
their children as soon as they are no longer being nursed. If the children stay with their mother,
their father might contribute to their living expenses if he is able and willing to do so. Below,
we describe the patterns we observe in the data regarding the families that are affected by a
divorce.
Characteristics of divorced mothers and their children
We contrast the characteristics of divorced mothers with the characteristics of mothers who
have never divorced (Panel A, Table 2.1).10 Among mothers of children younger than 25 years
old, 11% have ever divorced.
8 There are reports that repudiation, a unilateral divorce right granted only to the husband, is still practiced in Senegal even though it is outlawed. However, no woman ever mentioned having been repudiated during qualitative interviews, which might be due to the social stigma associated with the practice.
9 Lagoutte et al. (2014) report that few divorces are jointly filed. The two grounds for divorce most commonly cited are also likely to hide other reasons for divorce, such as infidelity. Additionally, alimony can be provided only if the husband filed for divorce on the grounds that he does not get along with his wife or in the case of a divorce due to a serious illness.
10 Throughout the paper “divorced mothers” refers to divorced mothers who had, at the time of the survey, at least one child younger than 25 who was born from the union that ended in a divorce. Similarly, “mothers” refers to women who had at least one child younger than 25 years old on the survey date. It is important to note that 20% of women who divorce are childless: those women are not included in our analysis.
Divorced mothers, on average, seem to be better off than their counterparts: they are more
educated, more likely to live in an urban area and more likely to come from a rather well-off
background, as indicated by the fact that the fathers of divorced mothers are more likely to be
self-employed or state-employed. This difference is not surprising: women who divorce need
to access resources to compensate them for the (potential) loss of resources associated with the
divorce, especially if their ex-husbands provided for them during their marriage. Educated
women are likely to have better outside options than uneducated women: they have access to
more valuable jobs and to better matches in the (re)marriage market.11 Their family network
may be able to provide more financial assistance. Additionally, the profession of the father
is a proxy for social class and may capture how empowered women are. Divorced women
are therefore positively selected: this may mean either that most women in our sample
chose to divorce their ex-husbands or that men do not abandon women who are vulnerable.
When considering the type of marriage, mothers who are the first wife in a polygamous union
are less likely to divorce than other mothers. Women of higher rank in a polygamous union are
more likely to divorce, which could be due to difficulties cohabiting.12
When including all the variables defined before the divorce in a linear probability model (LPM)
(column 4), the findings are similar: higher levels of education as well as some categories re-
lated to fathers’ occupations are associated with a higher likelihood of divorce. Additionally,
the coefficients associated with two ethnic groups (Wolof and Poular) are also significant: social
norms and practices related to divorce may vary across groups.13
This positive selection into divorce is also seen in the characteristics of the children (Panel B,
Table 2.1). Children whose parents divorced were 7 percentage points more likely to have
attended primary school than children whose parents did not divorce, 5 percentage points
more likely to have completed primary school by the expected age, and 6 percentage points
more likely to have attended secondary school. This difference disappears when we control for
the education of the mother and is consistent with the vast literature linking parental education
to investments in children’s human capital (on Senegal, see Dumas and Lambert, 2011).
Children whose parents have divorced have, on average, a lower birth order than children
whose parents have not divorced. This difference is explained by the fact that there are fewer
children born to parents who divorce than to parents who do not; thus, there are few children
with a high birth order among children of divorced parents.
11 76% of the sample of divorced mothers had already worked before the divorce (and this proportion does not vary according to educational status).
12 These results should be interpreted with caution, as they could be affected by measurement issues. For instance, if a woman divorced her husband because he was about to take a second wife, she could declare her previous marriage to have been either a polygamous union (with her being the first wife) or a monogamous union.
13 These differences in the probability of divorce do not affect our identification strategy, since we compare children who have the same mother.
Figure 2.1: Age of mothers and of their children at the time of the divorce
(a) Age of mothers at the time of the divorce. Histogram; horizontal axis: age at divorce; vertical axis: percent.
(b) Age of children at the time of the divorce. Histogram; horizontal axis: age at divorce (children); vertical axis: percent.
Characteristics at the time of divorce
Mothers were on average 29 years old when the divorce took place and almost all mothers
who divorced did so when they were between 20 and 35 years old (Figure 2.1a).14 The average
length of the marriage before divorce was 7 years, and the median was 5 years. Divorced
mothers had on average 1.77 children from their last union: 50% had only one child and 30%
had two children. Mothers who had not divorced had on average 2.96 children born to their last
union. This difference is explained by the fact that few women divorce after long marriages,
resulting in a lower number of children being born into the union. Relatedly, women who
divorce do so, on average, shortly after the birth of a child: 73% of divorces occur when the
youngest child is a toddler. Hence, children whose parents divorce are rather young: children
are on average 6 years old at the time of the divorce and few children are older than 10 when
their parents divorce (Figure 2.1b).
Characteristics of households after a divorce
Situation of divorced mothers A divorce almost always results in changes in the household
composition. The two most common living arrangements for a woman after her divorce
are to remarry or to move back in with her parents. A total of 41% of ever-divorced
women were remarried at the time of the survey, and 42% lived with at least one of their parents.
Few divorced women live on their own due to financial constraints as well as social norms that
14 The irregular shape of the histograms is likely due to age heaping. However, there is no heaping on divorce dates with respect to the survey date.
Figure 2.2: With whom do children usually live?
(a) Children whose parents have not divorced (scale: 0 to 100; by age on the survey date: 0-5, 5-10, 10-15, 15-25 years; categories: lives with other people, lives with mother, lives with father, lives with both parents).
(b) Children whose parents have divorced (scale: 0 to 100; by age on the survey date: 0-5, 5-10, 10-15, 15-25 years; categories: lives with other people, lives with mother, lives with father).
prescribe that women of child-bearing age must be married (Lambert et al., 2019). Divorced
women may also choose to move back into their parents’ house after a divorce to obtain finan-
cial and emotional support as well as help with the children.
At the time of the survey, divorced mothers were part of households that were on average
wealthier in terms of per capita household consumption levels than women who were not di-
vorced (Panel A, Table 2.1). This finding is consistent with the fact that the selection into divorce
is positive. It also means that the potential negative impact of divorce on financial resources
does not lead to a reversal in women’s relative situations in terms of financial resources.
Custody: With whom do children of divorced parents live? A total of 64% of the children
of divorced parents live with their mother. Whatever their age, this is the most common situ-
ation (Figure 2.2). Nevertheless, whether parents are divorced or not, with whom the children
live is a function of both the child’s age and his/her gender. Teenagers and adults are more
likely to live with people who are not their parents, mostly because of marriage or work.15
Age-related coresidence patterns, however, vary more subtly for children whose parents are
divorced. First, among children whose parents are still together, very few live with their father but
not with their mother, and this share increases only slightly with age; among children of divorced
parents, the share of children who live with their father increases as children become older.
Second, daughters are less likely to live with their father than sons are (Figure A-2.1 in the
Appendix). Both findings are consistent with qualitative reports that suggest that fathers can
claim custody of their children once they turn 7 and that they more often live with their sons
than with their daughters. Young children who do not live with either of their parents are
usually fostered. A fostered child is a child who was sent to live with a host family (often to
15 Twenty-nine percent of all young women (15-25) and 4.5% of all young men are married. This share is lower (but not significantly different) for children of divorced mothers: 26.6% and 1.2%, respectively, for daughters and sons.
relatives: grandparents, an uncle or an aunt) by her parents (Marazyan, 2015). Fostering is
more common for children of divorced mothers (11% versus 6%) than for other children. This
difference remains significant even when controlling for the education of the mother.
Table 2.1: Characteristics of divorced women and of their children

                                          (1)           (2)             (3)             (4)
Descriptive statistics                    Mean          Mean            Difference      LPM

Panel A: Mothers                          Has divorced  Never divorced
Pre-divorce characteristics
  Age                                     35.80         36.23           -0.44           0.00
Highest education level
  No formal education                     0.52          0.66            -0.14***
  Primary                                 0.31          0.20            0.11***         0.04***
  Secondary or higher                     0.15          0.09            0.06***         0.04**
  Qur’anic                                0.35          0.34            0.01            -0.01
Ethnicity & religion
  Mourid brotherhood                      0.37          0.33            0.03            0.01
  Wolof                                   0.47          0.44            0.03            0.02*
  Serere                                  0.12          0.12            -0.01           0.01
  Poular                                  0.27          0.24            0.03            0.03**
Father’s occupation
  Inactivity of the father of the wife    0.13          0.14            -0.01
  Farmer                                  0.27          0.43            -0.16***        -0.02
  Independent or informal employee        0.26          0.21            0.05**          0.01
  State-employed or employer              0.24          0.15            0.09***         0.03+
  Occupation unknown                      0.09          0.07            0.02            0.07**
Characteristics of the marriage
  Polygamy, first rank                    0.10          0.18            -0.09***        -0.02**
  Polygamy, higher rank                   0.25          0.18            0.06***         0.03**
Characteristics on the survey date
  Lives in rural area                     0.37          0.57            -0.21***
Household consumption per capita
  Food expenditures (hh)                  189259.97     164698.05       24561.93
  Other expenditures (hh)                 263200.18     159468.74       103731.43**
Family composition
  Mother lives with one of her parents    0.42          0.11            0.31***
  Number of children alive                3.21          3.43            -0.22
  Number of children (≤25 y.o.)           2.76          3.03            -0.27**
  Number of children - last union (a)     1.77          2.96            -1.19***
Number of women (b)                       290           3,952           4,242           3,927

Panel B: Children older than 7            Divorced      Never divorced
                                          parents       parents
  Age                                     14.50         14.97           -0.46+
  Birth order                             2.37          3.48            -1.11***
  Child is a girl                         0.48          0.49            -0.02
  Child lives with mother                 0.59          0.81            -0.22***
  Has been fostered                       0.12          0.08            0.04***
  Attended primary school (≥ 7 y.o.)      0.72          0.65            0.07***
  Completed primary school (≥ 10 y.o.)    0.51          0.47            0.05+
  Attended secondary school (≥ 14 y.o.)   0.42          0.36            0.06*
Number of children                        387           8,030           8,417

Note: The table presents characteristics of mothers and of their children according to whether they experienced a divorce.
Panel A: Mothers of at least one child younger than 25 on the survey date. Divorced mothers are women who have divorced the father of at least one of their children younger than 25 on the survey date. Other mothers appear in the “never divorced” group (which thus includes all still-married mothers but also widows).
Panel B: All children older than 7. Education-level variables are defined only for children older than an age threshold (mentioned in parentheses) that reflects how the Senegalese school system is organized, so the composition of the sample changes for these variables. The numbers of observations reported (387 and 8,030) correspond to the numbers of children older than 7 on the survey date in each group.
Specifications: Column (1) reports the mean of each variable for mothers who have divorced and for children whose parents divorced. Column (2) reports the mean of each variable for mothers who have not divorced and for children whose parents did not divorce. Column (3) reports the difference in means between the group of mothers (children) who experienced a divorce and the group of mothers (children) who did not (column (1) - column (2)). Significance levels from a t-test are also reported. Column (4) reports the results of a linear probability model in which the dependent variable is a binary variable that takes on the value 1 when a woman has divorced and 0 otherwise. Significance levels are denoted as follows: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.
(a) Number of children younger than 25 born either into the current union (if not divorced) or into the last union (if divorced). Half-siblings of children whose parents divorced are excluded from the sample.
(b) The number of observations displayed corresponds to the number of mothers in each group. For some characteristics, the number of observations is lower than what is displayed, due to missing values on the Qur’anic and polygamy variables. The number reported in column (4) corresponds to the number of women for whom the variables included in the model are not missing.
2.3.2 The Senegalese education system
A dual education system In Senegal, schooling is mandatory for children aged 6 to 16.16 Chil-
dren must attend school within the formal school system. The formal school system is made
up of 4 educational blocks: preschool, primary school, secondary school (middle and high
schools), and higher education. Formal schools are referred to either as “French schools” or as
“French-Arabic schools”, depending on the language of instruction of the school. Most chil-
dren start attending school after they have turned 5 and before they turn 7, but some children
start attending school for the first time when they are older than 7 (Figure 2.3a). 17 Despite
the fact that schooling is mandatory from the age of 6, not all children attend primary school.
The low supply of schools in rural areas (Cisse et al., 2004) may explain why some children
start attending school at older ages than 6. Moreover, public schools do not charge school fees,
but there are additional monetary costs to attending school, such as transportation fees and the
costs of school supplies, that may prevent some families from sending their children to school.
Religious schools, known as Qur’anic schools or as daara in Wolof (Chehami, 2016), are common
in Senegal, but are not part of the formal education system. While a few
Qur’anic schools teach both a standard curriculum and a religious one, most focus almost
exclusively on religious education (Andre and Demonsant, 2014). The religious education sys-
tem and the formal education system are not necessarily exclusive: children can attend primary
school and a part-time Qur’anic school.18
Outcome variables We study three educational outcomes. The main variable of interest is
whether a child has any primary schooling. This outcome captures the first investment in
(formal) education that a child can receive. If a child attended a Qur’anic school but never at-
tended primary school, she is considered as having no primary education, since the curriculum
of Qur’anic schools is mainly religious. In our main specification, we consider this variable to
be defined for children who were older than 7 by the survey date. We check that our results are
robust to moving this age cutoff upwards, excluding children up to the age of 10.
The second variable of interest is whether a child (almost) completed primary school. We con-
16 Law n° 2004-37, passed in December 2004, made primary schooling mandatory for children older than six. As the survey took place in 2011, children aged 6 at the time of the survey should hence have been enrolled in school at the time the survey was conducted.
17 Some children attend preschool, hence they appear as having attended formal school even when they are younger than 6. Children who attend formal preschool usually go on to attend formal primary school.
18 Using information on all children who are members of surveyed households, we find that 38% of children who attended primary school also attended a Qur’anic school. 47% of children older than 7 who have never been enrolled in primary school attended a Qur’anic school. Only 17.5% of children attended neither Qur’anic nor primary school.
sider that a child completed primary school if she attended fifth grade (CM1, the second to last
grade in primary school).19 It hence captures a greater investment in children’s education
than the first outcome variable does. A child who attends primary school will not necessarily
complete it: only 70% of children who have attended primary school complete it. Several fac-
tors explain why the completion rate is not higher. First, grade repetition is common in Senegal
(Ndaruhutse et al., 2008). In 2006, 12% of students had repeated at least one grade (Boubacar and
Francois, 2007). As the opportunity cost of schooling increases with age, repeating a grade is
likely to increase the likelihood that the child drops out of school. Second, the supply of schools
is even more limited for the higher grades of primary education: 36% of primary schools do
not offer the whole primary cycle (Boubacar and Francois, 2007). In our main specification,
we consider this variable to be defined for children aged 10 and older, as children who start
school at age 6 are supposed to be in fifth grade by age 10. Due to grade repetition, some children
complete primary school only by age 13 or 14. As such, the outcome variable captures
whether a child has completed primary school by the time of the survey, and a few
children in the sample are likely to complete primary school at an age greater than 10.
The third variable of interest is whether a child has (exclusively) attended Qur’anic school.
Attending a Qur’anic school was reported separately from the highest level of education for
children living in the surveyed households. For children living elsewhere, Qur’anic school
was only listed as a possible answer to the question regarding their highest education level, so
it is reported only for children who did not attend primary school.20 Hence, we cannot study
whether children attended Qur’anic school as a complement to formal schooling. Considering
other outcomes (such as transitions into secondary school) is not possible due to sample size
limitations.21
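The outcome definitions above can be sketched in code. This is a minimal illustration under stated assumptions, not the thesis's actual data construction: the function names, the `primary_grades` argument (highest primary grade attended, 0 if never attended) and the `attended_quranic` flag are hypothetical, the age thresholds are read as "7 or older" and "10 or older" following Table 2.1, and applying the 7-year threshold to the Qur'anic outcome is an assumption.

```python
from typing import Optional

# Age thresholds (years on the survey date) below which an outcome is undefined.
# Assumption: ">= 7" and ">= 10", as in the row labels of Table 2.1.
MIN_AGE_ANY_PRIMARY = 7
MIN_AGE_COMPLETED = 10


def any_primary_school(age: int, primary_grades: int) -> Optional[int]:
    """1 if the child ever attended primary school, 0 if not, None if too young."""
    if age < MIN_AGE_ANY_PRIMARY:
        return None
    return int(primary_grades >= 1)


def completed_primary(age: int, primary_grades: int) -> Optional[int]:
    """1 if the child reached fifth grade (CM1), 0 if not, None if too young."""
    if age < MIN_AGE_COMPLETED:
        return None
    return int(primary_grades >= 5)


def quranic_only(age: int, primary_grades: int, attended_quranic: bool) -> Optional[int]:
    """1 if the child attended Qur'anic school and never attended primary school."""
    if age < MIN_AGE_ANY_PRIMARY:  # assumed threshold, same as for enrollment
        return None
    return int(attended_quranic and primary_grades == 0)
```

Returning `None` rather than 0 for children below the threshold keeps them out of the estimation sample instead of counting them as not enrolled.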
19 For children who were not living in the surveyed household, the list of possible answers to the highest level of education question pooled together the last two years of primary school (“5 or 6 years of primary school”), thus we cannot distinguish fifth grade completion from sixth grade completion.
20 Enumerators were trained to report any formal schooling as the highest level of education, even if the child had also attended a Qur’anic school.
21 A sibling fixed effects model estimated for secondary school attendance would be estimated on 23 identifying observations.
Figure 2.3: Formal education by age and birth year
(a) Share of children who have any formal schooling (x-axis: age on the survey date, 0 to 25; y-axis: any formal schooling; series: girls and boys, with 95% confidence intervals).
(b) Share of individuals who have any formal schooling, corresponding ages: 10 to 70 y. o. (x-axis: birth year, 1940 to 2000; y-axis: any formal schooling; series: women and men, with 95% confidence intervals).
2.4 Methodology
How are children affected by their parents’ divorce? Many confounding factors could explain
the differences found when comparing children whose parents have divorced and children
whose parents have not divorced. The method we use relies on sibling fixed effects, thus con-
trolling for any (potentially unobserved) factors that are common to siblings. These factors are,
for instance, the education level of the parents and of other family members, parental pref-
erences for education, the socioeconomic background of the family and its status within the
community, and family norms and rules, including the language spoken at home before the
divorce.
2.4.1 Empirical strategy
Primary school enrollment
Framework We consider the impact of parental divorce on children’s enrollment in primary
school. As children are supposed to start attending primary school the year they turn 6, we
consider children who were 5 or younger when their parents divorced to be those affected by
the divorce and children who were 6 or older when their parents divorced to be not affected
by the divorce in terms of this specific schooling outcome: whether a child has ever attended
primary school.22
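The classification of children into affected and not affected groups can be illustrated with a short sketch. The function name and its argument are hypothetical; the cutoff (5 or younger versus 6 or older at the time of the divorce) follows the framework above, and children whose parents did not divorce get 0 on both indicators.

```python
from typing import Optional, Tuple


def age_at_divorce_indicators(age_at_divorce: Optional[int]) -> Tuple[int, int]:
    """Return (AgeAtDivorce0/5, AgeAtDivorce6/25) for one child.

    age_at_divorce is None when the parents did not divorce, in which
    case both indicators are 0.
    """
    if age_at_divorce is None:
        return (0, 0)
    affected = int(age_at_divorce <= 5)  # enrollment decision not yet made
    return (affected, 1 - affected)
```

For a child of divorced parents exactly one of the two indicators equals 1, which matters once family-level fixed effects are introduced.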
22 Reverse causality is unlikely to be at work here, given the chronology of the events considered. Children can be enrolled in primary school beginning at the age of 6, and children can be affected by divorce only when they are 5 or younger by the divorce date. For affected children, the education decision could not have been implemented before the divorce decision was made. Conflict over education decisions (for instance, if parents have differing preferences regarding their children’s education) may preexist, but its effects would be mediated by the decision to divorce (for instance, if a divorce results in a change regarding who makes decisions about the children’s education).
In our analysis, we consider primary school and Qur’anic school to be substitutes, and we de-
fine the variables with respect to that idea: the Qur’anic school variable captures attendance
at a Qur’anic school for children who do not attend primary school. To compare both types of
educational choices, we study the likelihood that a child has (exclusively) attended Qur’anic
school using the same framework and specifications used when studying primary school en-
rollment.
Model 2.1: Basic LPM The first model is a linear probability model23 without fixed effects,
In this equation, i denotes children and s denotes a family (defined as a group of full siblings).
Standard errors are clustered at the family (sibling-group) level.
Variables The outcome variable, AnyPrimarySchool, is an indicator variable that takes on the
value 1 when a child has attended or is attending primary school and 0 otherwise. The main
variable of interest, AgeAtDivorce0/5, is an indicator variable that takes on the value 1 if the
child was 5 or younger when her parents divorced and 0 if her parents either divorced when she
was 6 or older or did not divorce. AgeAtDivorce6/25 is a binary variable that takes on the value
1 if the child was 6 or older when her parents divorced and 0 otherwise. Controls is a vector of
individual characteristics that includes the following variables: a binary variable that takes on
the value 1 if the child is a girl, quadratic controls for year of birth, and four binary variables
that account for birth order (birth orders higher than 4 are grouped together).24 We discuss the
correlations between each of these variables and the outcome variables in the Appendix of the
paper (Table A-2.1 and section A-2.1.). In an alternative specification, we add binary variables
for the highest education level of the mother as well as the interactions of those variables with
the variables included in Controls.
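Based on the variable definitions above, the basic linear probability model can plausibly be written as follows. The coefficient symbols (α, β₁, β₂, δ) and the error term ε are notational assumptions introduced here for illustration; only the variable names and the indices i and s come from the text.

```latex
\begin{equation}
\mathit{AnyPrimarySchool}_{is} = \alpha
  + \beta_1 \, \mathit{AgeAtDivorce0/5}_{is}
  + \beta_2 \, \mathit{AgeAtDivorce6/25}_{is}
  + \mathit{Controls}_{is}' \delta
  + \varepsilon_{is}
\end{equation}
```

Under this reading, β₁ and β₂ are both measured relative to children whose parents did not divorce, since both indicators equal 0 for that group.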
23 We do not estimate logit models, as logit models with fixed effects estimate results only for groups in which there is variation in the outcome variable: many observations are lost, and this results in the control variables being poorly estimated. Therefore, to compare the results from the model without fixed effects with the results using the sibling fixed-effects model, we specify the first model as a linear probability model. The results using a logit are consistent with the results estimated using the LPM without sibling fixed effects.
24 The model includes only individual controls. Family size is not included in the controls, so higher birth orders also capture larger family sizes in the basic linear probability model. In the model with sibling fixed effects, family size is captured by the fixed effects, as family size is the same for all full siblings. The coefficient on age at divorce, 0-5, remains positive and significant (0.137) when we include family size in the basic linear probability model. The coefficients themselves on the birth order variables vary very little when adding family size to the basic linear probability model.
Model 2.2: LPM with sibling fixed effects The second model is a linear probability model
with sibling fixed effects.25 Including sibling fixed effects is equivalent to controlling for (po-
tentially unobserved) factors that are common to all siblings. Estimates from this model should
hence be less biased than estimates from the basic LPM.
In this equation, i denotes children and s denotes a family (defined as a group of full siblings).
Standard errors are clustered at the family (sibling-group) level.
Variables The outcome variables, the variable AgeAtDivorce0/5, and the vector of variables
Controls are defined as in model 2.1. γs represents the sibling fixed effects. As sibling fixed
effects are included, the right-hand-side variables can only be estimated if they vary within
families. Thus, AgeAtDivorce6/25 cannot be estimated, as it is collinear with AgeAtDivorce0/5
once the fixed effects are included.
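A plausible way to write the model with sibling fixed effects, together with the collinearity argument, is sketched below. Only γ_s, the variable names and the indices come from the text; β, δ, ε and the family-level divorce indicator D_s are notational assumptions.

```latex
\begin{align}
\mathit{AnyPrimarySchool}_{is} &= \beta \, \mathit{AgeAtDivorce0/5}_{is}
  + \mathit{Controls}_{is}' \delta + \gamma_s + \varepsilon_{is} \\
\mathit{AgeAtDivorce0/5}_{is} + \mathit{AgeAtDivorce6/25}_{is} &= D_s
\end{align}
```

Here D_s equals 1 if the parents of family s divorced and 0 otherwise. Because D_s takes the same value for every sibling in a family, it is absorbed by γ_s, so the two age-at-divorce indicators cannot both be included. The coefficient β is then identified from divorced families with at least one child on each side of the age cutoff.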
Sample The estimation sample is made up of children older than 7 who have at least one
full sibling who is also older than 7. These conditions on age at survey date and on family
composition are needed given that the outcome variable is defined only for children who were
older than 7 by the survey date and that the sibling fixed-effects model can only be estimated
for families with at least two children. We consider only full siblings in the analysis.26 To
compare the results from the two models, we estimate the basic linear probability model on the
same sample as the (more data intensive) sibling fixed-effects model.
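The logic of the sibling fixed-effects estimator can be illustrated with a within-family (demeaning) sketch. This is a simplified illustration, not the estimation code used for the results: it has a single regressor, no controls, and no clustered standard errors, and the toy data and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def demean_within(values: Sequence[float], family: Sequence[str]) -> List[float]:
    """Subtract each family's mean from the values of its members."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for v, f in zip(values, family):
        totals[f] += v
        counts[f] += 1
    return [v - totals[f] / counts[f] for v, f in zip(values, family)]


def within_slope(y: Sequence[float], x: Sequence[float], family: Sequence[str]) -> float:
    """OLS slope of y on x after removing family means (one regressor only)."""
    y_dm = demean_within(y, family)
    x_dm = demean_within(x, family)
    sxx = sum(a * a for a in x_dm)
    if sxx == 0:
        raise ValueError("regressor does not vary within any family")
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sxx


# Hypothetical toy data: families A and B divorced, family C did not.
family = ["A", "A", "B", "B", "B", "C", "C"]
young = [1, 0, 1, 1, 0, 0, 0]     # AgeAtDivorce0/5
older = [0, 1, 0, 0, 1, 0, 0]     # AgeAtDivorce6/25
enrolled = [1, 0, 1, 1, 1, 1, 0]  # AnyPrimarySchool

beta = within_slope(enrolled, young, family)

# After demeaning, the two indicators are exact mirror images of each other,
# which is why only one of them can be kept alongside the fixed effects.
collinear = all(
    abs(a + b) < 1e-12
    for a, b in zip(demean_within(young, family), demean_within(older, family))
)
```

In this toy example family C contributes nothing to the slope, since the regressor does not vary within it, while families A and B identify the coefficient because they include children on both sides of the cutoff.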
Primary school completion
Studying primary school completion allows us to test whether divorce has longer-run conse-
quences for children’s educational outcomes. We modify the analysis framework described
above to take into account the higher age threshold (10 years old) associated with this variable.
The higher age threshold reduces the sample size, thus limiting the type of analyses that can
be conducted. The results for primary school completion should hence be interpreted with
caution.
25 Such models have been widely used in the literature on the impact of divorce (Bjorklund and Sundstrom, 2006; Bratberg et al., 2014; Ermisch and Francesconi, 2001; Francesconi et al., 2010; Le Forner, 2020).
26 The results do not change when we include half-siblings from the mother’s side in the sample (we use mother fixed effects instead of (full) sibling fixed effects).
Framework and sample We consider the impact of parental divorce on whether children
reached 5th grade in primary school (by the expected age). Since children who start primary
school at age 6 should be attending 5th grade by age 10, we consider children who were 9 or
younger when their parents divorced to be affected by the divorce and children who were 10
or older when their parents divorced to be not affected by the divorce when we study primary
school completion. Hence, the estimation sample is made up of children older than 10 who
have at least one full sibling who is also older than 10.
In this equation, i denotes children and s denotes a family (defined as a group of full sib-
lings). Standard errors are clustered at the family (sibling-group) level. The outcome vari-
able CompletedPrimary is an indicator variable that takes on the value 1 when a child has
completed primary school (reaching the fifth year of primary school out of six years) and 0
otherwise. The main variables of interest are the variable AgeAtDivorce0/5 and the variable
AgeAtDivorce6/9. Both are indicator variables that take on the value 1 if the child’s age at the
time of the divorce is within that variable’s specific age range and 0 otherwise. The vector of
variables Controls is defined as in model 2.1. γs represents the sibling fixed effects.
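From the definitions above, the primary school completion model can plausibly be written as follows. As before, β₁, β₂, δ and ε are notational assumptions; γ_s, the variable names and the indices are taken from the text.

```latex
\begin{equation}
\mathit{CompletedPrimary}_{is} =
    \beta_1 \, \mathit{AgeAtDivorce0/5}_{is}
  + \beta_2 \, \mathit{AgeAtDivorce6/9}_{is}
  + \mathit{Controls}_{is}' \delta
  + \gamma_s
  + \varepsilon_{is}
\end{equation}
```

Within divorced families, the omitted category is siblings who were 10 or older at the time of the divorce, so β₁ and β₂ compare each younger age group to these older siblings.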
2.4.2 Identification and interpretation issues
Identifying assumption The sibling fixed-effects (SFE) model identifies the causal effect of divorce under the assumption
that, in the absence of divorce, children aged 5 or younger at the time of the divorce
would have had the same educational outcomes as their older siblings. For this assumption to
be credible, two conditions must be fulfilled. First, the timing of the divorce must be as good as
random. Second, the siblings must have, on average, the same potential educational outcome:
there must not be systematic differences in ability or endowments between siblings that are
dependent on birth order. We discuss these two points below.
Conditional on divorce occurring, is the date of the divorce a random event?
The marriage market literature uses the idea of “sympathy shocks” (Dupuy and Galichon, 2014)
that occur randomly and increase the quality of a match. Similarly, the quality of a match could
be decreased by a random “reverse sympathy shock”, leading to divorce. During interviews
conducted in Senegal, divorced parents’ narratives supported this idea. 27 This assumption is
backed by the fact that variables that capture family composition are not correlated with the
divorce date. First, the (wide) distribution of the age-at-divorce variable (Figure 2.1) suggests
that parents do not strategically time their divorce with respect to their children’s ages.28 Second,
families for which the coefficient of interest is estimated (those with at least one child
aged 5 or younger at the time of the divorce and one older than 5) are not different from other
families that also experienced divorce. The mothers of children for whom the coefficient of in-
terest is estimated do not appear to be different from other divorced mothers, apart from their
structural demographic characteristics (because the age of the women, the number of their chil-
dren, their children’s age, and the length of marriage are all correlated). Detailed results can be
found in the Appendix (Table A-2.2 and section A-2.2.).
Balance test on children’s characteristics
To support the assumption that the divorce date is not correlated with the characteristics of
the children that could also determine their educational outcomes, we conduct a balance test
of children’s characteristics across age-at-divorce groups. This test helps us check that older
siblings’ educational outcomes can be a credible counterfactual for their younger siblings’ out-
comes. Table 2.2 reports the results of this balance test on the estimation sample (siblings older
than 7 whose parents divorced) for the available individual characteristics determined before
the divorce. These variables are also the ones used as controls in the regression models.
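The balance test amounts to a difference in means across the two age-at-divorce groups, with a t-test on that difference. A minimal sketch follows; the function name and the toy data are hypothetical, and the use of Welch's unequal-variance t-statistic is an assumption, since the text does not specify which variant of the t-test is used.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def difference_in_means(group_1: Sequence[float], group_2: Sequence[float]) -> Tuple[float, float]:
    """Return (mean(group_2) - mean(group_1), Welch t-statistic)."""
    diff = mean(group_2) - mean(group_1)
    # Standard error of the difference with unequal variances (Welch).
    se = sqrt(variance(group_1) / len(group_1) + variance(group_2) / len(group_2))
    return diff, diff / se


# Hypothetical toy data: "child is a girl" indicator in each group.
affected = [1.0, 0.0, 0.0, 1.0, 0.0]           # 5 or younger at divorce
not_affected = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]  # 6 or older at divorce

diff, t_stat = difference_in_means(affected, not_affected)
```

The sign convention (second group minus first) matches the one stated in the note to Table 2.2. A small t-statistic for a characteristic such as gender is what random timing of the divorce would predict.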
As expected from a sample made up of siblings, there are systematic differences among age-at-
divorce groups when considering birth order and year of birth. As all siblings experience their
parents’ divorce on the same date, children who were 6 or older on the divorce date were
also older than their (younger) siblings on the survey date; hence, there was a 4.8-year average
age difference between these two groups. Relatedly, children aged 6 or older at the time of the
divorce were more likely to be firstborn children and less likely to be third-born.29 We might
worry that older siblings receive educational endowments that are on average different from
those of their younger siblings and that older siblings’ educational outcomes would therefore
27 From qualitative interviews conducted in Senegal, it seems that some people do indeed consider divorces to result from a reverse sympathy shock that older generations used to endure due to mougne (a Wolof term that describes an attitude of resignation that allows one to endure difficulties).
28 Qualitative interviews conducted in Senegal indicate that an individual’s priors regarding the impact of divorce differ greatly (and often seem correlated with their own experiences), so the patterns we observe might also result from timing according to different priors (e.g., “it is better to divorce while the children are young, so they will not be affected by the conflict” or “it is better to stay together till the children are teenagers or young adults”).
29 Birth order is computed using all children born to the same mother and thus includes older siblings who were older than 25 and half-siblings. Few children whose parents divorced have older half-siblings (from the mother’s side), and their birth rank is not affected by whether they have younger half-siblings. As a robustness check, we estimate the model on a sample that includes half-siblings. The coefficient on divorce remains large and positive (results available in the Online Appendix).
not be a good counterfactual for their younger siblings’ outcomes. To capture these potential
systematic differences in education endowments, we add birth order fixed effects to the model,
thus ensuring that the age-at-divorce variable does not capture birth order effects.30 Parents
may also make different educational decisions for their children depending on time-varying
factors: social norms, expectations, and school-related costs at the time a child should enter
school. To capture these time-varying influences on educational decisions, the model
includes time trends (quadratic controls for year of birth).
Age and birth order are mechanically correlated across siblings, but this is not the case for
gender. Hence, if the date of divorce is random, then the share of girls should be the same when
considering the different age-at-divorce categories. We confirm that this is the case (Table 2.2).
The identification assumption also requires that children aged 5 or younger at the time of the
divorce have the same health outcomes and the same educational abilities as children aged 6
or older at the time of the divorce, as health and ability could affect their educational outcomes.
There is, however, no reason why differences in health or ability should be correlated with the age-at-divorce
category in the Senegalese context.31
Threats to identification The SFE strategy does not distinguish between the impact of a di-
vorce and the impact of any other time-variant factors that would cause younger children not
to have the same educational outcomes as their older siblings. Given the Senegalese context,
a time-variant factor that triggers a divorce and causes younger children to have better educa-
tional outcomes than their older siblings is the main threat to identification.
Positive shock triggering a divorce A positive income shock that allows a woman to divorce would
be a confounding factor. The income shock could allow the woman to divorce and thereby
increase the likelihood that the younger children are sent to school. However, potential positive
income shocks are either extremely rare (because few women inherit from their parents, this
type of windfall income is unlikely to occur often enough to explain a large share of divorces)
or are likely to affect women who would already have the means to divorce (women with a
formal job are likely to already be well enough off to divorce if they wish to do so).
30 In the results presented in the paper, birth order fixed effects are estimated on the sample made up of both siblings whose parents divorced and siblings whose parents did not divorce. The model can also be estimated on a sample made up only of siblings whose parents divorced, as there are children of each birth rank in both age-at-divorce categories. In this subsample, the coefficient on divorce remains large and positive (results available in the Online Appendix).
31 If parents divorce because their youngest child has a disability and the stress on the family is too high, then the age-at-divorce variable would capture differences in ability across siblings. However, this scenario is unlikely in the Senegalese context: to the best of our knowledge, having a disabled child has never been mentioned as a cause of divorce, either in our own qualitative fieldwork or in research done by others (Dial, 2008).
Table 2.2: Balance test of characteristics according to children’s age at the time of divorce

                     (1)              (2)              (3)
                     Affected         Not affected
                     ≤ 5 at divorce   ≥ 6 at divorce
                     Mean             Mean             Difference
Child is a girl      0.42             0.47             0.05
Birth year           1999.20          1994.40          -4.81***
First child          0.15             0.34             0.18***
Second child         0.36             0.26             -0.09
Third child          0.23             0.15             -0.08*
Fourth and more      0.26             0.25             -0.00
N                    98               167              265

Note: The table presents characteristics of children according to their age at the time of divorce.
Sample: Children of divorced parents who are older than 7 on the survey date, only when they belong to a family in which two children are 7 or older on the survey date.
Specification: Column (1) reports the mean of each variable for children who were 5 or younger when their parents divorced. Column (2) reports the mean of each variable for children who were 6 or older when their parents divorced. Column (3) reports the value of the difference in means (column (2) - column (1)) and the significance of the t-test of the difference. P-values are denoted as follows: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.
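As a rough illustration of the difference-in-means test reported in column (3), the sketch below recomputes the difference and an unequal-variance t-statistic for two of the binary characteristics, using only the rounded means and group sizes shown in Table 2.2. The function name `diff_in_shares` is hypothetical, and the binomial variance formula is an assumption that applies to 0/1 variables only. The tests in the table may have been computed differently (for instance on unrounded data), so the output only approximates the reported values.

```python
import math

def diff_in_shares(p_affected, n_affected, p_not_affected, n_not_affected):
    """Difference in shares (not affected minus affected) and its t-statistic.

    Assumes a 0/1 variable, so each group's variance is p * (1 - p).
    """
    diff = p_not_affected - p_affected
    se = math.sqrt(
        p_affected * (1 - p_affected) / n_affected
        + p_not_affected * (1 - p_not_affected) / n_not_affected
    )
    return diff, diff / se

# Rounded inputs taken from Table 2.2 (98 affected, 167 not affected children).
first_child = diff_in_shares(0.15, 98, 0.34, 167)
girl = diff_in_shares(0.42, 98, 0.47, 167)

print("First child: diff=%.2f, t=%.2f" % first_child)
print("Girl:        diff=%.2f, t=%.2f" % girl)
```

With these rounded inputs, the statistic for first-born status is large and the one for gender is small, in line with the significance stars in the table. The recomputed difference for first-born status (0.19) differs from the reported 0.18 only because the inputs are rounded.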
Negative shock triggering a divorce
Short-term adverse circumstances could trigger divorce and decrease the likelihood that chil-
dren who should attend school do so. When the situation improves, the younger siblings are
then more likely to attend school than the older children are. We cannot directly test this hy-
pothesis, as the data include information on only the latest shock experienced by households.
Instead, we compare children who could have been affected by this potential negative
shock (children aged 6 to 9 at the time of the divorce) to their older siblings (children aged 10
and older at the time of the divorce), who are arguably less likely to have been affected by the
negative shock (it would need to have taken place at least 5 years before the divorce). The results
show that children aged between 6 and 9 when their parents divorced are as likely to have
attended primary school as their older siblings, which indicates that the negative shock story
is unlikely to explain our results.
Exposure to conflict The occurrence of conflict is often correlated with divorce, but conflict might
take place both before and after the divorce.
The impact of conflict is likely to vary depending on children’s age and on the length of the
conflict, making it difficult to predict clearly what the impact of conflict could be: older children
might be more affected, children may get used to conflict, conflict might worsen or recede after
the divorce, etc. Moreover, we expect that high conflict levels could impact children’s well-
being and some educational outcomes, such as test scores, but not necessarily the decision to
send a child to school, especially as children can be registered in primary school even if they
are older than 7. We hence do not consider conflict to be a threat to the sibling fixed effects
identification used.
2.5 Results
2.5.1 Results: Ever attended primary school
Main results Columns (1) and (2) of Table 2.3 report the results of two basic linear probability
models. Column (1) reports the results of a regression for whether the child has ever attended
primary school on indicator variables for the age-at-divorce groups. Being 5 or younger at the
time of the divorce is associated with a higher likelihood of having attended primary school.
When controlling for birth year, birth order, gender, and their interactions with the level of edu-
cation of the mother (column 2), the magnitude of the coefficient decreases, but it remains pos-
itive and significant. On average, children who were 5 or younger when their parents divorced
were 11 percentage points more likely to have attended primary school than their counterparts
whose parents did not divorce: this difference represents a 16% increase in the share of children
who have ever attended school.
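The link between the 11 percentage point effect and the 16% relative increase can be made explicit with a line of arithmetic. The sketch below back-solves the baseline attendance share implied by these two figures. The helper name `implied_baseline` is hypothetical, and the resulting share (roughly 0.69) is inferred from the two numbers quoted above rather than taken from a reported statistic.

```python
def implied_baseline(effect_pp, relative_increase):
    """Baseline share implied by an absolute effect and its relative size."""
    return effect_pp / relative_increase

effect = 0.110    # column (2) coefficient, in share units (11 percentage points)
relative = 0.16   # relative increase quoted in the text

# Share of children who ever attended school in the comparison group,
# as implied by the two figures above (an inference, not a reported value).
baseline = implied_baseline(effect, relative)
print(round(baseline, 2))
```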
Columns (3) to (5) report the results of three sibling fixed-effects models. Being 5 or younger
at the time of the divorce is still associated with a higher likelihood of having attended school
(column 3). The inclusion of controls does not change the results much (column 4); if anything,
the coefficient increases when the controls are added. The addition of the sibling fixed effects
slightly increases the magnitude of the coefficient of interest (column 2 vs. column 4). The
sibling fixed effects capture some unobserved characteristics that are common to siblings: the
basic LPM estimates seem to be slightly downward biased compared to the results estimated
with the SFE model.32
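To make the logic of the sibling fixed-effects comparison concrete, the sketch below implements a within-family estimator for a single regressor: both the outcome and the age-at-divorce indicator are demeaned within each family before the slope is computed. This is a minimal sketch under simplifying assumptions (no control variables, no clustered standard errors); the function names and the toy data are invented for illustration and do not come from the PSF survey.

```python
from collections import defaultdict

def demean_within_family(values, family_ids):
    """Subtract each family's mean from the values of its members."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, fam in zip(values, family_ids):
        totals[fam] += value
        counts[fam] += 1
    return [value - totals[fam] / counts[fam]
            for value, fam in zip(values, family_ids)]

def sibling_fe_slope(outcome, regressor, family_ids):
    """Within-family (sibling fixed-effects) slope for a single regressor."""
    y = demean_within_family(outcome, family_ids)
    x = demean_within_family(regressor, family_ids)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Toy data (invented for illustration): three families of two siblings.
family   = [1, 1, 2, 2, 3, 3]
young    = [1, 0, 1, 0, 0, 0]   # 1 if aged 5 or younger at divorce
attended = [1, 0, 1, 1, 0, 0]   # 1 if ever attended primary school

print(sibling_fe_slope(attended, young, family))  # → 0.5
```

In the toy data, family 3 has no within-family variation in the age-at-divorce indicator, so it drops out of the estimate entirely. Only families with siblings on both sides of the age threshold identify the coefficient.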
Column (6) reports the results of a regression for whether the child has (exclusively) attended
a Qur’anic school on indicators for the age-at-divorce groups and sibling fixed effects (the
same specification as in column 4). The estimated coefficient is -0.145: children who were
younger than 5 years old at the time of the divorce were less likely to have exclusively attended
a Qur’anic school than their older siblings.33 This finding indicates that the older siblings of
affected children are more likely, on average, to have attended Qur’anic school than no school
at all.
32 Introducing sibling fixed effects changes the observations on which the coefficient of interest is identified: the identifying families in the LPMs include families in which all the children were younger than 5 at the time of the divorce while the identifying families in the SFE model include only families in which there is at least one child younger than 5 at the time of the divorce and one older than 6. We reestimate the LPM on a restricted sample that excludes families in which all children were younger than 5 at the time of the divorce so that the effects of the age-at-divorce groups are estimated on the same identifying families as in the SFE model. The coefficient of interest in the LPM is 0.131, which is still markedly different from the SFE coefficient.
Our findings indicate that parental divorce does not necessarily lead to worse schooling out-
comes for children who were young at the time of divorce. It seems that parents might even
be able to (excessively) compensate their younger children after a divorce. In the remainder of
this paper, we test the robustness of these results and investigate which channels could mediate
them.
Table 2.3: Effect of parental divorce on primary school attendance and completion

                     (1)        (2)        (3)        (4)        (5)        (6)             (7)             (8)
Specification        LPM        LPM        SFE        SFE        SFE        SFE             SFE             SFE
Sample               At least 2 children, older than 7 years old (columns 1 to 6)           ≥2 children, ≥ 10 (columns 7 and 8)
Dependent variable   Ever attended primary school (columns 1 to 5)          Qur’anic only   Ever attended   Completed
Age at divorce
  0-5 y.o.           0.147***   0.110**    0.139**    0.164***   0.143**    -0.145***       0.164           0.0297

Note: Models (1) and (2): Linear probability models. Models (3) to (8): Linear probability models with sibling fixed effects.
Columns (1) to (5) and column (7): The outcome variable is an indicator variable that takes on the value 1 if the child has attended primary school and 0 otherwise. Column (6): The outcome variable is an indicator variable that takes on the value 1 if the child has attended (exclusively) Qur’anic school and 0 otherwise. Column (8): The outcome variable is an indicator variable that takes on the value 1 if the child has completed primary school and 0 otherwise.
AgeAtDivorce0/5 is an indicator variable that takes on the value 1 if the child was 5 or younger when her parents divorced and 0 if her parents either divorced when she was 6 or older or did not divorce. AgeAtDivorce6/25 is an indicator variable that takes on the value 1 if the child was 6 or older at the time of divorce and 0 if either the child was 5 or younger or her parents are not divorced.
Control variables include: quadratic control for year of birth, birth order indicators (4 categories) and an indicator variable that takes on the value 1 if the child is a girl and 0 if the child is a boy. C × educ variables include: the interaction of all the control variables with the highest education level of the mother (coded into 4 categories: no education, primary, secondary and higher, and unknown).
At least 2 children, older than 7 years old: Children who are older than 7 on the survey date, only when they belong to a family in which two children are 7 or older on the survey date.
At least 2 children, older than 10 years old: Children who are older than 10 on the survey date, only when they belong to a family in which two children are 10 or older on the survey date.
a p-value of the joint significance of the coefficient on AgeAtDivorce0/5 and the coefficient on its interaction with the variable girl.
Robust standard errors in parentheses (clustered at the family (sibling-group) level). Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Source: PSF2.
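The two age-at-divorce indicators defined in the note to Table 2.3 can be sketched as a small function. This is an illustrative sketch only: the function name is hypothetical, age at divorce is approximated by a difference in calendar years, and the treatment of children born after the divorce (both indicators set to 0) is an assumption that the note does not address.

```python
def age_at_divorce_indicators(birth_year, divorce_year=None):
    """Return (AgeAtDivorce0/5, AgeAtDivorce6/25) for one child.

    divorce_year=None means the parents did not divorce, in which case
    both indicators are 0.
    """
    if divorce_year is None:
        return 0, 0
    age = divorce_year - birth_year  # approximation: calendar-year difference
    young = int(0 <= age <= 5)       # 5 or younger at the time of divorce
    older = int(age >= 6)            # 6 or older at the time of divorce
    return young, older

print(age_at_divorce_indicators(2000, 2004))  # → (1, 0)
print(age_at_divorce_indicators(1990, 2000))  # → (0, 1)
print(age_at_divorce_indicators(2000))        # → (0, 0)
```

By construction the two indicators are never both equal to 1, and children of non-divorced parents form the omitted category in the basic linear probability models.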
33 We might be worried that older children were more likely to not live with their mother than their younger siblings and that as a result, their schooling outcomes are not accurately reported and, more specifically, that the outcome is reported as a Qur’anic or no education even though they did attend primary school. When estimating the sibling fixed-effects specification with “living in the surveyed household” as a dependent variable, the coefficient associated with age at divorce, 0-5, is -0.0365 (s.e. 0.0758), thus indicating that children who were younger at the time of divorce were not more likely to live in the surveyed households (with their mother). As such, the effect is not driven by reporting bias correlated with where the child lived on the survey date.
Heterogeneity: Gender and age at divorce As girls are on average more likely to attend
primary school than boys, we test whether girls’ educational outcomes are more affected by
a divorce before the age of 5 than boys’ educational outcomes are. The coefficient associated
with the interaction term between age at divorce and gender is close to 0 and not significant in
either the basic LPM (not reported) or the SFE models (column (5)): girls and boys who were
younger than 5 on the divorce date do not seem to be differentially affected by their parents’
divorce when considering primary schooling. Given that the number of identifying children
and families in this specification is only slightly lower than in the main specification, it seems
that this insignificant result is not due to low power.
To understand whether the results are driven by specific ages at divorce, we estimate models
that include binary variables for each age at divorce (from 0 to 6 years old at the time of the
divorce) instead of two age-at-divorce categories (5 or younger and 6 or older).34 The results
do not seem to be driven by a specific age group for children aged between 1 and 5 at the time
of the divorce. The coefficients associated with these age-at-divorce variables are displayed in
Figure A-2.2 in the Appendix.
Could the results be driven by a negative shock that lowered older children’s likelihood of
attending school? If a (negative) economic shock triggers (most) divorces, then we would
expect that divorce occurs rather shortly after the shock. For children who are older than 10
at the time of the divorce, the shock would need to have happened at least 5 years before the
divorce to affect their enrollment in primary school. As such, children older than 10 at the
time of the divorce are less likely to have been affected by a potential negative shock that also
triggers a divorce than children younger than 10 at the time of the divorce. We hence use
children older than 10 at the time of the divorce as the control group (not affected children) in
the main specification (column 7). The results show that children who were between 6 and
9 years old at the time of the divorce were as likely to have attended primary school as their
older siblings. We consider only children older than 10 on the survey date, so the sample size
is reduced, and the power is likely to be too low for us to detect significant effects. However,
the magnitude of the coefficient on the age at divorce, 6-9, variable (0.02) is much lower than
the magnitude of the coefficient on the age at divorce, 0-5, variable (0.164, also insignificant), so
the result for ages 6-9 at the time of the divorce can credibly be interpreted as an indication that
children aged 6-9 at the time of the divorce have the same schooling outcomes as their older
siblings.
34 The effects are hence identified using a larger set of observations/families, as we can leverage variation within the age group 0-5.
We then estimate models that include binary variables for each age at divorce (from 0 to 9 years
old at the time of the divorce). Children aged 10 or older at the time of the divorce are hence the
control group, as discussed above. Figure 2.4 displays the coefficients associated with each
age-at-divorce variable. The coefficients estimated on all the variables for age at divorce between
6 and 9 are close to 0: the children whose parents divorced when they were between 6 and
9 have the same likelihood of attending primary school as children who were older than 10
when their parents divorced. Similarly, the coefficient associated with being between 6 and 25
at the time of the divorce is not significant in columns (1) and (2): the probability of attending
primary school for children older than 6 at the time of the divorce does not differ from that of
children whose parents did not divorce. Thus, the coefficient on being younger than 5 at the
time of divorce comes from children younger than 5 on the divorce date being more educated
than what would be expected, rather than from children older than 6 at the time of the divorce
being less educated than what would be expected.
Figure 2.4: Coefficients associated with age-at-divorce variables
[Figure: coefficient plot. Horizontal axis: age at divorce (1 to 9). Vertical axis: coefficient (-0.5 to 1). Series: (1) LPM without controls; (2) SFE without controls; (3) SFE with controls.]
Note: Coefficients associated with binary variables for age at the time of the divorce. The omitted category corresponds to children older than 10 at the time of the divorce. Specification: The dependent variable is an indicator variable that takes on the value 1 if the child has attended primary school and 0 otherwise. The model is a linear probability model with controls (similar to column (7), Table 2.3). Sample: Children older than 10 on the survey date who belong to families with at least 2 children older than 10 on the survey date.
2.5.2 Results: Completed primary
Column (8) in Table 2.3 reports the results for whether a child has completed primary school
(5th or 6th grade). Being younger than 5 at the time of the divorce is not associated with a
higher likelihood of completing primary school: the magnitude of the coefficient is very low.
This finding indicates that gains in primary school enrollment do not result in a higher
likelihood of having completed primary school, possibly because the level of investment in
education required to complete primary school is much higher than that required to have attended
primary school. Children who were between 6 and 9 when their parents divorced were as
likely as their older siblings to have completed primary school (on time). It seems that a di-
vorce during primary school does not affect children’s likelihood of completing 5th grade on
time. This finding means that children are not more likely to drop out of primary school than
their older siblings after their parents divorce.35
2.5.3 Sensitivity checks
Table 2.4 reports the results of the main specification when both the age threshold for inclusion
in the sample (columns) and the age threshold to be considered affected by the divorce (rows)
vary.
Sensitivity to the definition of the sample (age on the survey date) As the main sample is
made up of all children older than 7 on the survey date, the main outcome is mismeasured
for some of the children who had not (yet) begun attending primary school on the survey date
but who would start attending it. To check that measurement error does not drive the results,
we vary the minimum age on the survey date for inclusion in the sample, in one-year steps
from 7 to 10, the age at which we see no more new entries into primary school (Figure 2.3a).
The magnitude of the coefficients (keeping constant the definition of being affected by divorce)
remains similar across columns. The main effect (age at divorce, 0-5) remains positive and
significant when we use ages on the survey date that are less than 10, indicating that our results
are unlikely to be much affected by measurement error. The fact that the significance of the
coefficient decreases as the age threshold increases seems more likely to result from a loss of
power due to the reduced sample size and the related decrease in the number of identifying
families.
Sensitivity to the definition of the affected by divorce variable (age at the time of divorce)
We also check that the results are robust to varying the upper age limit for the affected by di-
vorce group (this also changes the lower age limit of the not affected group). As expected from
Figure 2.4, the coefficients are positive and significant for age-at-divorce groups 0-5 and 0-6 and
not significant for other age thresholds. This is due to inclusion and exclusion errors in the
affected group: the coefficients on the age-at-divorce groups 0-4, 0-7, and 0-8 are hence estimated
on siblings for whom we expect no difference in primary school attendance. The age threshold
4 results in errors of exclusion from the affected group (the not affected group includes children
aged 5): the coefficient on the age-at-divorce group 0-4 remains positive, but its magnitude
decreases and its standard errors increase. Age thresholds 7 and 8 result in errors of inclusion in
the affected group (which then includes children aged 7 or 8): the magnitude of the coefficients
on the age-at-divorce groups 0-7 and 0-8 drops.
35 As a balance test, we confirm that the share of girls is not significantly different across the age-at-divorce groups (0-5, 6-9, 10 or older). Using a sample of children who were older than 11 on the survey date, we find that the coefficients on age at divorce, 6-9, and age at divorce, 6-10, remain close to 0 and not significant. Using a sample of children who were older than 12 on the survey date, we find that the coefficient on age at divorce, 6-9, remains close to 0 and not significant. However, the coefficients on age at divorce, 6-10, and age at divorce, 6-11, are negative and significant. Due to the low number of identifying observations and to potential measurement error (delayed completion of primary school), these results must be interpreted with caution.
Table 2.4: Sensitivity to the definition of the sample (columns) and to the definition of being affected by divorce (rows)
Note: Columns show results when varying the age threshold for inclusion in the sample. Rows show results when varying the age threshold to be considered affected by divorce.
Cells: Each cell reports the coefficient on the variable AgeAtDivorce of a regression using the sibling fixed effects model. The outcome variable is an indicator variable which takes on the value 1 if the child has attended or attends formal (primary or secondary) school and 0 otherwise.
Control variables include: quadratic control for year of birth, birth order indicators (4 categories) and an indicator variable that takes on the value 1 if the child is a girl and 0 if the child is a boy.
Robust standard errors in parentheses (clustered at the family (sibling-group) level).
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Source: PSF2.
2.6 Channels
In this section, we discuss what might explain the results for primary school attendance and
primary school completion. We first test channels that could affect children directly: custody
and fostering decisions. As most children live with their mothers, especially when they are
young, we next consider how a divorce may affect mothers and discuss their access to re-
sources, remarriage and decision-making power.
2.6.1 Children: Custody and fostering decisions after divorce
We examine whether the person with whom a child lives after a divorce matters for educational outcomes.
If parents have differing preferences regarding their child’s education, then the child’s educa-
tional outcomes depend on which parent has custody. If parents foster their child, then they
are likely to foster her to relatives who have the same preferences for education as they have
but who have access to more resources.
We consider two variables that capture with whom a child was living before the age of 7 (the
age at which she should start attending school): living with her father (but not with her mother)
and being fostered. These variables allow us to study the link between custody and fostering
decisions and primary school attendance. These estimates should be interpreted as correla-
tions: custody and fostering decisions are also determined by unobservable characteristics of
the children (for instance, personality) that may also influence educational outcomes.
Additionally, to test whether custody and fostering decisions mediate the positive link between
being younger than 5 at the time of the divorce and increased school attendance, we would need
to know with whom the child was living during the period between her parents’ divorce and the
age of 7. However,
we cannot reconstruct this variable for the whole sample and hence consider custody and fos-
tering decisions made before the age of 7.36
Our analysis follows three successive steps. First, we check whether children whose parents
divorced when they were 5 or younger had different caregivers from those their older siblings
had (Table A-2.3 in Appendix). As expected, children who were younger than 5 years old at
the time of the divorce were less likely to have lived full time with their mother (before 7) than
their older siblings were, but they were 9.9 percentage points more likely to have lived with
their father and 13.9 percentage points more likely to have been fostered before the age of 7
than their older siblings were.
Second, we check whether having lived with one’s father and whether having been fostered are
associated with a higher likelihood of attending primary school (columns 1 and 2, Table 2.5).
36 Recovering retrospective information on who had custody of a child before the age of 7 is already a strenuous effort, and we are not able to credibly recover this information for all children. However, we conduct two checks to ensure the validity of these variables and of the analysis. First, we are able to build a variable that captures custody and fostering decisions made between the divorce date and the age of 7 for a subsample of children. The correlation between this variable (“after divorce, before 7”) and the variable used in the analysis (“before 7”) is 0.8. Second, as the variable creation process results in additional missing values, thus restricting the sample, we estimate the regressions from Table 2.3 on this sample and confirm that the results do not change.
Having lived with the father before age 7 has no significant effect. Having been fostered is neg-
atively correlated with the likelihood of having attended primary school (column 1). However,
this effect is not necessarily causal. Fostering may be a means for parents to invest in their chil-
dren’s education, for instance, if they cannot finance it: the likelihood of attending school may
have been even lower if these children had not been fostered. This correlation disappears when
introducing the fixed effects (column 2): it is in fact driven by family-level characteristics.37
Third, we add the age-at-divorce variable and its interaction with the custody and fostering
variables to the regression model (columns 3 and 4, Table 2.5). The coefficient on age at divorce
remains significant, and the interaction terms are never significant. When considering whether
a child has lived (exclusively) with her mother before the age of 7, we find a positive correlation
with this variable (as expected from the signs of the coefficients on living with one’s father
and on fostering). However, the interaction term with the age at divorce is negative and not
significant. These findings suggest that custody and fostering decisions are not what mediate
the results. These results should nevertheless be interpreted with caution, as the coefficients
are identified on a small number of observations: the null result could therefore also be due to
a lack of power.
2.6.2 Mothers: Financial resources, remarriage and decision-making power
We discuss how mothers’ characteristics could affect children’s educational outcomes after a
divorce. These characteristics are endogenously determined, but studying them can shed light
on what happens to families after a divorce.
Financial resources The first potential channel is that a divorce may increase access to re-
sources, either permanently or temporarily, for some women. If getting a divorce allows
women to be more financially independent (Dial, 2008), then divorced women would have
a better situation than before, as they have more control over their resources. Similarly, women
who divorce a man who did not contribute to household expenses would be in a better finan-
cial situation after the divorce than before. However, this (permanently) improved financial
situation does not seem consistent with the fact that the results for primary school attendance
do not carry through to primary school completion. A more temporary shock would be more
in line with the patterns observed. Such shocks could include women benefiting from addi-
tional transfers from their family network after their divorce. More specifically, mothers may
receive more transfers from their siblings that are intended to help with their children’s
education (such as the transfers, not specific to the case of divorce, highlighted by Baland et al.
(2016) in the Cameroonian context). This support may allow recently divorced women to send
their young children to school by helping them pay for educational expenses, such as transport
fees and school supplies. This support from the family network might fade away in the longer
run.
37 Beck et al. (2015) (using the same dataset as we do) find that fostered children have the same educational outcomes as their host siblings.

Table 2.5: Heterogeneity of effects on attendance: custody and fostering decisions

                       (1)         (2)         (3)         (4)
                       LPM         SFE         LPM         SFE
Panel A: Interaction with variable living with father
Dependent variable: Ever attended primary school
Age at divorce
  0-5 y.o.                                     0.177***    0.164**
                                               (0.0532)    (0.0644)
With father            -0.00128    -0.0565     -0.0102     -0.0642*
                       (0.0340)    (0.0351)    (0.0350)    (0.0349)
With father × 0-5                              0.0297      0.0725

Note: Columns (1) and (3) present basic linear probability models, and columns (2) and (4) present linear probability models with sibling fixed effects on the main sample. The outcome variable is an indicator variable that takes on the value 1 if the child has attended primary school and 0 otherwise.
Sample, Panel A: Results are estimated for a subsample of the main sample. This subsample is made up of children for whom the variable having lived with the father is not missing and who belong to families with at least two children for whom this variable is not missing.
Sample, Panel B: Results are estimated for a subsample of the main sample. This subsample is made up of children for whom the variable fostered is not missing and who belong to families with at least two children for whom this variable is not missing.
a Share of children who were living with their father, but not with their mother, before the age of 7 among children of divorced parents.
b Share of children who were fostered before the age of 7 among children of divorced parents.
Control variables include: quadratic control for year of birth, birth order indicators (4 categories) and an indicator variable that takes on the value 1 if the child is a girl and 0 if the child is a boy.
Robust standard errors in parentheses (clustered at the family (sibling-group) level).
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Source: PSF2.
We cannot directly assess the short- and long-run impacts of divorce on women’s income and
resources, as we do not have retrospective data on the economic situation. However, the panel
dimension of the PSF survey allows us to describe how per capita household consumption
changes after a divorce. We use data on the 43 women who divorced the father of their children
between the two survey waves. For these women, annual per capita household consumption
seems to be rather stable (397,455 FCFA per year per capita in 2011 versus 388,034 in 2006). It
seems that, on average, these women did not experience dramatic changes in their economic
situation. These estimates are compatible with a short-term positive economic shock after a
divorce and then a return to average consumption levels.
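The stability of consumption can be quantified directly from the two figures quoted above. The sketch below computes the relative change between the two waves. The helper name `relative_change` is hypothetical, and whether the two amounts are expressed in constant prices is not stated in this passage, so the result should be read as a simple comparison of the quoted numbers.

```python
def relative_change(new, old):
    """Relative change from old to new, as a fraction of old."""
    return (new - old) / old

# Annual per capita household consumption (FCFA) quoted in the text.
consumption_2006 = 388_034
consumption_2011 = 397_455

change = relative_change(consumption_2011, consumption_2006)
print(f"{change:.1%}")  # → 2.4%
```

A change of this size over five years is small relative to the level of consumption, consistent with the description of consumption as rather stable.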
Resources: Remarriage Remarriage may be a means for women to access financial resources.
We investigate whether the positive link between divorce and primary school attendance is
driven by women who have remarried (columns 2 and 3, Table 2.6). The coefficient on the
interaction term between a child being younger than 5 at the time of the divorce and the re-
marriage of the child’s mother before the child turned 7 is not significant, but its size (0.209)
is relatively large compared to that of the coefficient on divorce before age 5 (0.118). The co-
efficient associated with being younger than 5 at the time of the divorce is halved and is not
significant when the interaction with remarriage is added. The positive link between divorce
and primary school attendance therefore seems to be driven by women who have remarried
(shortly) after their divorce. Remarriage might allow a woman to allocate more resources to
her children’s education given the scale economies associated with marriage as well as the po-
tential direct contribution of her new husband. It could be that her new husband helps with
the financial expenses, even if the child does not live with him.
These results should not be interpreted as a pure impact of remarriage, as selection effects are
also at work. Remarried women are on average younger than other ever-divorced women and
are hence likely to have better opportunities in the labor market.38 When investigating the
heterogeneity in the effect of divorce on primary school attendance according to the age of the
mother (columns 4 and 5, Table 2.6), we find that the effects are driven by mothers who are
younger than the median age at the time of the divorce (29 years old).
We cannot further disentangle whether this effect is due to remarriage or to (young) age, as
both are strongly correlated. Additionally, the decision to remarry is endogenous to other char-
acteristics of women, as some women choose not to remarry. Notably, women who cannot
afford remarriage tend to remain single (Lambert et al., 2019). We hence expect remarriage to
have a positive impact among women who face difficult financial situations, but we do not ex-
pect differences between women according to whether they remarried once their income and
access to resources are controlled for.
38 Women who remarried in our sample were on average 27 years old at the time of the divorce, whereas women who did not were on average 30 years old. The difference is statistically significant.
Table 2.6: Heterogeneity of effects on attendance: Remarriage, age, and education of mothers
Note: Columns (1), (3), (5) and (7) present linear probability models with sibling fixed effects on the main sample. Columns (2), (4) and (6) present basic linear probability models. The outcome variable is an indicator variable that takes on the value 1 if the child has attended primary school and 0 otherwise. Results in column (1) differ from those in column (4) of Table 2.3 because the sample is reduced here to children of divorced women.
a Results are estimated on children who are older than 7 on the survey date and whose mother has divorced, only when they belong to a family in which two children are 7 or older on the survey date.
Control variables include: quadratic control for year of birth, birth order indicators (4 categories) and an indicator variable that takes on the value 1 if the child is a girl and 0 if the child is a boy.
Robust standard errors in parentheses (clustered at the family (sibling-group) level).
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Source: PSF2.
Decision-making power and preferences. After a divorce, a woman may gain more bargaining
power or become the sole decision maker regarding her children. If women have a stronger
preference than their husbands for their children’s primary education, this may explain why
children who are younger at the time of the divorce have better educational outcomes than
their older siblings. The link between women’s bargaining power, their stronger preference
for their children’s education, and their investment in the education of their children has been
highlighted in several societies (Doss (2013) for a literature review, Menon et al. (2014) on Viet-
nam), though the evidence has been contested in others (Akresh et al. (2016) do not find such
a correlation in the Burkinabe context). The role of this potential channel cannot be directly
tested, as the PSF survey contains no information on parental preferences for their children’s
education.
Education of the mother. Higher educational attainment among women is likely to be positively
correlated with those women’s access to resources, their bargaining power within the
household, their preference for education, but also to higher household-level consumption and
income. This correlation is likely to hold across all education levels, but as few mothers have
completed more than primary school, we are only able to use two categories: educated mothers
and uneducated mothers.
On the one hand, we expect all children of educated parents to have attended primary school—
the income constraint should not be binding enough for such parents not to send their child
to school. Additionally, educated parents are likely to have stronger preferences for their chil-
dren’s education. When parents have a (relatively) high income and strong preferences for
children’s education, then all their children should be attending primary school. If all siblings
attend school then there cannot be a difference in the likelihood to have attended primary
school across siblings. If that is the case, the positive effect of divorce on education should
be mostly driven by uneducated women. On the other hand, we expect the (relative) decision-
making power channel to work similarly for both educated and uneducated mothers. Actually,
educated mothers are likely to have higher bargaining power in absolute terms, but the effec-
tiveness of this channel depends on the bargaining power a woman has relative to her husband,
so both educated and uneducated women could gain more decision-making power after they
divorce.
We hence test whether there are differences in the impact of divorce on the primary school
attendance of the child depending on whether her mother has attended primary school (or
higher levels of education) (columns 6 and 7, Table 2.6). The interaction between having an
educated mother and age at the time of the divorce is negative but not significant. As there is
no significant difference between both categories of mothers, the result could be interpreted as
evidence that a (relative) decision-making power channel is indeed at work (for all divorced
women) and that the relaxing of the income constraint after a divorce is not what drives the
results. However, the negative sign of the coefficient would be consistent with the positive
effect of parental divorce on education being driven more by children of uneducated women
than by children of educated women, and therefore with a relaxing of the budget constraint
after a divorce for uneducated women. It might be that only the mothers who have at least
completed primary school are financially secure enough not to face a binding income constraint
in regard to their children’s education. If that is the case, then we should group mothers who
only attended primary school with uneducated mothers and compare them to mothers who
attended secondary school in order to test the relaxing of the income constraint channel, but
the small number of women who attended secondary school does not allow us to do so.
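As an illustration only, the heterogeneity test described above can be written as a sibling fixed-effects specification. The notation below is ours and is not taken from the chapter's own equations; the variable names are assumptions chosen for readability.

```latex
\begin{equation*}
  \text{Attended}_{if} = \alpha_f
    + \beta\,\text{Young}_{if}
    + \gamma\,\bigl(\text{Young}_{if} \times \text{EducMother}_f\bigr)
    + X_{if}'\delta + \varepsilon_{if}
\end{equation*}
```

Here $i$ indexes children and $f$ sibling groups, $\alpha_f$ is the sibling fixed effect, $\text{Young}_{if}$ equals 1 if the child was 5 or younger at the time of the divorce, $\text{EducMother}_f$ equals 1 if the mother attended primary school or a higher level, and $X_{if}$ collects the child-level controls (birth year, birth order, gender). The main effect of $\text{EducMother}_f$ does not appear because it is constant within a sibling group and is therefore absorbed by $\alpha_f$. In this notation, the negative but insignificant interaction discussed above corresponds to $\gamma$.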
Overall, these findings suggest that the positive effects of divorce on children younger than
5 at the time of the divorce are driven, for the most part, by children whose mothers have
remarried. Remarrying is likely to go hand in hand with an increase in resources, especially
in the Senegalese context in which female labor force participation is low. However, if the
positive effects we observe are driven only by a relaxation of the budget constraint, we expect
the effect to be driven by uneducated mothers, for whom the budget constraint is likely to be
much more binding than for their educated counterparts. This is not the case: the effects are
observed for both educated and uneducated women. All women are likely to gain bargaining
power regarding their children’s education in the case of a divorce. If women have a higher
preference for education than their former husbands, this channel would explain part of the
observed increase in children’s education.
Why is there no longer-term effect? The absence of results regarding (on time) completion
of primary education means that in the long run, divorced parents do not make an increased
investment in younger children’s education. In the case of remarriage, this may be explained
by the fact that the new husbands do not value supporting the education of a child who is not
theirs over a long period. Moreover, the monetary and opportunity costs of schooling are lower
for younger children than for older children. For the latter, school supplies and transportation
are likely to be more expensive. Instead of attending school, older children can also contribute
to the household’s income and welfare by working or by helping the adults at home. All the
channels considered as explanations for the increased attendance in primary school are likely
not to be strong enough to overcome these increased direct and indirect costs to investing in
children’s human capital.
2.7 Conclusion
This paper offers new evidence on the consequences of parental divorce for children in Africa.
Using a dataset that records the detailed life histories of Senegalese individuals and of their
family members, we investigate how children’s educational outcomes are affected by their par-
ents’ divorce. Unlike previous studies, we show that divorce is not always detrimental to
children’s educational outcomes. Using a sibling fixed-effects model that allows us to control
for all characteristics common to the siblings, we find that children younger than 5 years old
at the time of the divorce have a higher likelihood of attending school than their siblings who
were older than 6 at the time of the divorce. This positive effect is not driven by children who
were older than 6 at the time of the divorce being negatively affected by the divorce but by an
increased investment in younger children’s education after the divorce. This higher investment
is not sustained for long: these children are as likely as their older siblings to have completed
primary school.
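To make the logic of the sibling fixed-effects comparison concrete, the following minimal sketch computes a within-family slope on made-up data. It is not the estimation code used in this chapter: the function name, the toy families, and the single-regressor setup (no controls, no clustered standard errors) are all simplifying assumptions.

```python
from collections import defaultdict


def within_slope(family_ids, x, y):
    """Slope of y on x after subtracting family (sibling-group) means.

    Illustrative within estimator with a single regressor and no controls.
    """
    groups = defaultdict(list)
    for fam, xi, yi in zip(family_ids, x, y):
        groups[fam].append((xi, yi))

    sxy = 0.0
    sxx = 0.0
    for rows in groups.values():
        mean_x = sum(r[0] for r in rows) / len(rows)
        mean_y = sum(r[1] for r in rows) / len(rows)
        for xi, yi in rows:
            sxy += (xi - mean_x) * (yi - mean_y)
            sxx += (xi - mean_x) ** 2

    if sxx == 0:
        raise ValueError("no within-family variation in x")
    return sxy / sxx


# Hypothetical data: young = 1 if the child was 5 or younger at divorce,
# attended = 1 if the child ever attended primary school.
families = ["A", "A", "B", "B", "B", "C", "C"]
young = [1, 0, 1, 0, 0, 0, 0]
attended = [1, 0, 1, 1, 1, 0, 0]

slope = within_slope(families, young, attended)
slope_without_c = within_slope(families[:5], young[:5], attended[:5])
```

Family C, in which no child was young at the time of the divorce, contributes nothing to the estimate: only families with children on both sides of the age cutoff identify the coefficient.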
These positive findings are likely explained by selection rather than by institutions that are
specific to the Senegalese context. We find no evidence that fostering—an institution that is
common in most of West Africa—is what mediates the effects, indicating that selection into
divorce and selection into remarriage are likely to drive the results. The positive selection into
divorce based on education indicates that for most women, high levels of income, support
from the extended family, and bargaining power may be needed to decide to divorce. As there
is selection into divorce, we cannot infer from our results what the effects of divorce would
be for children if the selection into divorce changes, for instance, following a legal reform of
child support and alimony or changes in social norms. Our findings still nuance the concerns
raised by some in Senegal over the consequences of parental divorce for children, and further
research on what drives the effects we observe could help policymakers design policies that
better support divorced parents.
Future research on Senegal could aim to better understand what mediates the effects observed
and to understand why there are positive effects on primary school attendance but no effect
on primary school completion. First, the ideal data would include information collected be-
fore the divorce as well as after. Second, relying on better measures of human capital and
school performance would allow for a more complete picture of the effects of divorce on chil-
dren’s education. Data collection could then focus on test scores, cognitive and noncognitive
skills and psychological well-being. Third, a more nuanced understanding of divorce would re-
quire high-frequency panel data that tracks families’ income and expenditures as well as family
composition and especially where and with whom children are staying and who is their main
caretaker. These data could enable the identification of factors that help families compensate
their younger children after a divorce. If these factors are income-related, these findings could
provide leverage to improve the situation of divorced women with poor access to resources,
especially if selection into divorce changes and more women who are not from better-off back-
grounds divorce.
Appendix
A-2.1. Individual determinants of educational outcomes
Table A-2.1 reports the results from three linear probability models in which the outcome vari-
ables are attendance in primary school, attendance in Qur’anic school (exclusively), and com-
pletion of primary school. The independent variables are the variables used as controls in our
main specifications (Table 2.3): gender, birth year, and birth order. To be able to comment on
birth order, we control for family size (family size is captured by the sibling fixed effects in
our main specification). For comparison purposes, we use the same sample as in Table 2.3: it
is made up of children who belong to families in which the outcome variable is defined for at
least two children.
Column (1) presents the correlation between the control variables and the likelihood of having
attended primary school. Girls are more likely to have attended primary school than boys.
This is consistent with the fact that the education rate of girls converged with that of boys in
the 1990s in Senegal (Figure 2.3b). This difference is also found when using the Demographic and
Health Survey for Senegal 2010-2011: among children aged 7 to 15, 66% of girls and 63% of
boys had some primary schooling or were attending primary school on the survey date.
The trends observed in Figures 2.3a and 2.3b are seen in the regression results. The trend in
birth year is positive and significant, thus capturing increases in attendance rates over time:
children in later-born cohorts are more likely to have attended primary school than children
in earlier-born cohorts. Column (2) presents the correlation between the control variables and
the likelihood of having attended only Qur’anic school. Boys are more likely than girls to
have attended only a Qur’anic school, a finding that is consistent with qualitative evidence on
Qur’anic schools (Chehami, 2016).
There are no time trends when we consider Qur’anic school attendance. Column (3) presents
the correlation between the control variables and the likelihood of having completed primary
school by age 10. Boys and girls are equally likely to complete primary school. The time trends
in completion of primary school are similar to (and, if anything, stronger than) those observed
for primary school attendance: children in later-born cohorts are more likely to have completed
primary school than children in earlier-born cohorts. Controlling for family size, children with
a higher birth rank are less likely to have completed primary school than first-born children.
Table A-2.1: Correlations between individual characteristics and school attendance
                              (1)                            (2)          (3)
Dependent variable            Ever attended primary school   Qur’anic     Completed primary school
Sample: children older than   7                              7            10

Child is a girl               0.0342***                      -0.0920***   0.00729
                              (0.0118)                       (0.0101)     (0.0134)

                              (872.2)                        (715.8)      (1356.6)
Controls family size          Yes                            Yes          Yes
Share schooling               0.65                           0.19         0.48
Number of children            7,314                          7,309        5,612
Note: Linear probability models. In column (1), the outcome variable is an indicator variable that takes on the value 1 if the child has attended formal primary school and 0 otherwise. In column (2), the outcome variable is an indicator variable that takes on the value 1 if the child has attended only Qur’anic school and 0 otherwise. In column (3), the outcome variable is an indicator variable that takes on the value 1 if the child has attended the 5th grade of primary school and 0 otherwise.
Sample: Columns (1) and (2): Children who are older than 7 on the survey date, only when they belong to a family in which two children are 7 or older on the survey date. Column (3): Children who are older than 10 on the survey date, only when they belong to a family in which two children are 10 or older on the survey date.
Robust standard errors in parentheses (clustered at the family (sibling-group) level).
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Source: PSF2.
A-2.2. Observable characteristics of identifying families
Table A-2.2 displays the characteristics of mothers according to whether all their children were
younger than 5 at the time of the divorce, all their children were older than 6 at the time of
the divorce, or (at least) one child was younger than 5 at the time of the divorce and one child
was older than 5. Families that belong to the latter group make up the identifying families.
Women who divorced when all their children were older than 5 at the time of the divorce had
been married longer (as more time had elapsed between the birth of the last child and their
divorce) and had fewer children younger than 25 (due to right-censoring of the data)
than other divorced women. Women who divorced when all their children were younger than
5 at the time of the divorce had been married for a shorter period, 5 years on average (which
is consistent with having two children who were younger than 5 at the time of the divorce)
and fewer children were born into their last union than among other divorced mothers in the
sample. These differences are because the age of women, the number of their children, and
the length of their marriage are all correlated. There were no significant differences in the
education levels of mothers. Mothers whose children were all younger than 5 at the time of
the divorce seem to be more educated, which is consistent with the fact that these women
were younger than other divorced mothers and that the average level of education of women
has increased over time, as shown in Figure 2.3b for primary school (this trend can be seen
for higher educational levels). The mothers of children on whom our coefficient of interest is
estimated hence do not appear to be different from other mothers who divorce when they have
at least two children, apart from their structural demographic factors.
Table A-2.2: Characteristics of families according to children’s age at divorce
                                  (1)           (2)            (3)              (4)                 (5)
                                  Identifying   All children   All children     Diff. identifying   Diff. identifying
                                                older than 5   younger than 5   all older           all younger

Age                               37.81         42.92          35.65            5.11***             -2.16
Highest education level
  No formal education             0.64          0.65           0.47             0.00                -0.17
  Primary                         0.27          0.14           0.47             -0.13               0.20
  Secondary or higher             0.09          0.22           0.06             0.13                -0.03
Household consumption
  Food expenditures (hh)          156730.92     206790.20      123768.48        50059.28            -32962.44
  Other expenditures (hh)         143408.60     128730.60      151600.76        -14678.00           8192.16
Family composition
  Number of children alive        5.00          4.00           4.92             -1.00               -0.08
  Number of children (≤25 y.o)    4.02          3.08           4.18             -0.94***            0.16
  Number of children - last union a   2.96      2.76           2.18             -0.19               -0.78**
  Last marriage duration          9.94          16.14          4.70             6.20**              -5.24**

Number of mothers                 47            38             17               85                  64
Note: The table presents characteristics of women and of their families according to the age of children at the time of divorce. Column (4) presents the results of a difference-in-means test between identifying families and families where all the children were older than 5 on the survey date. Column (5) presents the results of a difference-in-means test between identifying families and families where all the children were younger than 5 on the survey date.
Sample: Divorced mothers who have at least two children (from the marriage that ended) who are older than 7 on the survey date.
Significance of the t-test of the difference is reported in columns (4) and (5). P-values are denoted as follows: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.
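The difference-in-means tests reported in columns (4) and (5) can be sketched as follows. This is a minimal illustration on made-up ages, assuming an unequal-variance (Welch) t-statistic; the source does not state which variant of the t-test was used, and the function name is hypothetical. The sign convention (comparison group minus identifying families) is chosen to match the signs in the table, where 42.92 - 37.81 = 5.11.

```python
import math
from statistics import mean, variance


def diff_in_means(identifying, comparison):
    """Return (difference, t-statistic) for comparison minus identifying.

    Uses the unequal-variance (Welch) standard error; this choice is an
    assumption, not something documented in the table note.
    """
    diff = mean(comparison) - mean(identifying)
    se = math.sqrt(
        variance(identifying) / len(identifying)
        + variance(comparison) / len(comparison)
    )
    return diff, diff / se


# Hypothetical mothers' ages in two groups of families.
ages_identifying = [30.0, 38.0, 46.0]
ages_all_older = [35.0, 43.0, 51.0]

diff, t_stat = diff_in_means(ages_identifying, ages_all_older)
```

The t-statistic would then be compared with the relevant critical values to assign the significance symbols listed in the note.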
A-2.3. Additional table and figure: Custody and fostering decisions
A-2.4. Additional results
Figure A-2.1: With whom do children of divorced parents live?
(a) Girls younger than 25 whose parents divorced. (b) Boys younger than 25 whose parents divorced.
[Figure panels: vertical axis from 0 to 100; horizontal axis by age on the survey date (0-5 years, 5-10 years, 10-15 years, 15-25 years); categories: lives with mother, lives with father, lives with other people.]
Table A-2.3: Custody and fostering decisions after a divorce
                     (1)           (2)           (3)           (4)           (5)        (6)
Dependent variable   With mother   With mother   With father   With father   Fostered   Fostered
Specification        LPM           SFE           LPM           SFE           LPM        SFE
Sample               Main sample a

Age at divorce
  0-5 y.o.           -0.235***     -0.219***     0.133***      0.0985*       0.0932**   0.139**
Note: Columns (1), (3) and (5): Linear probability model. Columns (2), (4) and (6): Linear probability model with sibling fixed effects. Columns (1) and (2): The outcome variable is an indicator variable that takes on the value 1 if the child has been living with her mother at least until the age of 7 and 0 otherwise. Columns (3) and (4): The outcome variable is an indicator variable that takes on the value 1 if the child has been living with her father (but not with her mother) before the age of 7 and 0 otherwise. Columns (5) and (6): The outcome variable is an indicator variable that takes on the value 1 if the child has been fostered before the age of 7 and 0 otherwise.
Control variables include: quadratic control for year of birth, birth order indicators (4 categories) and an indicator variable that takes on the value 1 if the child is a girl and 0 if the child is a boy.
a Children who are older than 7 on the survey date, only when they belong to a family in which two children are 7 or older on the survey date.
b The sample size varies because the outcome variable is missing for some observations.
Robust standard errors in parentheses (clustered at the family (sibling-group) level).
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Source: PSF2.
Figure A-2.2: Coefficients associated with age-at-divorce variables
[Figure: coefficients (vertical axis from -0.2 to 0.6) by age at divorce (1 to 6); series: 1: LPM without controls, 2: SFE without controls, 3: SFE with controls.]
Note: Coefficients associated with binary variables for age at the time of the divorce. The omitted category corresponds to children older than 7 at the time of the divorce. Specification: The dependent variable is an indicator variable that takes the value 1 if the child has attended primary school and 0 otherwise. The model is a linear probability model with basic controls (similar to column (4), Table 2.3, but without interacted controls). Sample: Children older than 7 on the survey date who belong to families with at least 2 children older than 7 on the survey date.
Online Appendix
B-2.1. Additional robustness checks
Table B-2.1: Robustness checks: Primary school attendance
Samples: columns (1)-(2) Main sample; (3)-(4) Restricted sample; (5)-(6) Divorce; (7)-(8) W. half-siblings; (9)-(10) All children aged 7 or older; (11)-(13) Siblings pairs (divorce) a.
Dependent variable: columns (1)-(10) Ever enrolled in primary school; columns (11)-(13) ∆school (youngest - oldest).

                   (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)        (10)      (11)      (12)     (13)
Model              LPM       SFE       LPM       SFE       LPM       SFE       LPM       SFE       LPM        LPM

Age at divorce
  0-5 y.o.         0.165***  0.168***  0.131*    0.168***  0.197***  0.197*    0.165***  0.115***  0.130***   0.137***

                   (866.1)   (753.2)   (870.5)   (753.0)   (5018.3)  (6645.6)  (838.6)   (732.3)   (0.00831)  (798.7)   (0.0691)  (0.203)  (0.220)
Controls           Yes       Yes       Yes       Yes       Yes       Yes       Yes       Yes       No         Yes       No        No       No
Controls – pairs   No        No        No        No        No        No        No        No        No         No        No        Yes      Yes
Note: LPM & SFE: Basic linear probability model: columns (1), (3), (5), (7), (9), and (10). Linear probability model with sibling fixed effects: columns (2), (4), (6), and (8). The outcome variable is an indicator variable that takes the value 1 if the child has attended or attends formal (primary or secondary) school. AgeAtDivorce0/5 is an indicator variable that takes the value 1 if the child was 5 or younger at time of divorce and 0 if either the child was older than 5 or if her parents are not divorced. AgeAtDivorce6/25 is an indicator variable that takes the value 1 if the child was 6 or older at time of divorce and 0 if either the child was younger than 5 or if her parents are not divorced. Control variables include: quadratic control for birth year, birth order indicators (4 categories) and an indicator variable that takes the value 1 if the child is a girl. Robust standard errors in parentheses (clustered at the mother level).
Sibling-difference model: Columns (11) to (13) report results from a sibling-difference model. The outcome variable is defined as the schooling outcome of the youngest child minus the schooling outcome of the oldest child of the pair. The variable of interest is an indicator variable that takes the value 1 if one child was younger than 5 at divorce date and one was older (identifying pair).
Controls – pairs variables include: the age difference of the pair and the age of the oldest child; indicator variables for whether the pair is a girl-girl pair, an older boy-younger girl pair, or an older girl-younger boy pair; and an indicator variable for whether the oldest child is a girl.
Sample(s): Main sample: Children who are older than 7 at survey date, only when they belong to a family in which two children are 7 or older at survey date. Restricted sample: Main sample excluding children that were 5 or younger at divorce date and who do not have a sibling who was older than 6 at divorce date. Divorce sample: Main sample restricted to children whose parents divorced. W. half-siblings: Main sample, including half-siblings. All children aged 7 or older: All children aged 7 or older at survey date.
a If there is a pair of children with one older than 5 and one younger, we select this pair by selecting the child who is the closest to each side of the cutoff. If there is no such pair, we choose the two children who were the closest to age 5 at the time of divorce. Observations from two families for which children closest to the threshold include twins are dropped from the sample.
b Number of additional children included in the sample (relative to the number of children in the main sample).
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Source: PSF2.
Sensitivity to the definition of the controls: Sample made up only of children whose parents
divorced. When we run the estimation on the sample made up of only children whose parents
divorced, we find similar results (columns (5) and (6), Table B-2.1): being younger than 5 at the
time of divorce is still positively and significantly correlated with primary school attendance,
in the basic LPM and in the SFE specifications. The order of magnitude of this coefficient is also
similar.
Alternative specification: sibling-difference model. We report results using an alternative
strategy: a sibling-difference model (columns (11) to (13) in the Table B-2.1). For each family,
we select the pair of siblings that is the closest to the age of 5 at the time of divorce39. The
sample size is limited, but we find that within identifying pairs, younger siblings are more
likely to have attended primary school than their older siblings, compared to pairs of children
who were older than the age of 5 at the time of divorce.
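The pair-selection rule used for the sibling-difference model can be sketched as follows. This is an illustrative reading of the rule rather than the code used for the estimation: the function names are hypothetical, the cutoff is assumed to place children aged 5 or younger at divorce on the "younger" side, and the exclusion of families whose closest children include twins is not implemented.

```python
from typing import List, Optional, Tuple

CUTOFF = 5  # assumption: aged 5 or younger at divorce counts as "younger"


def select_pair(ages_at_divorce: List[float]) -> Optional[Tuple[float, float, bool]]:
    """Pick one sibling pair per family.

    Returns (age of younger child, age of older child, identifying flag),
    or None if the family has fewer than two children.
    """
    if len(ages_at_divorce) < 2:
        return None
    younger_side = [a for a in ages_at_divorce if a <= CUTOFF]
    older_side = [a for a in ages_at_divorce if a > CUTOFF]
    if younger_side and older_side:
        # Identifying pair: the child closest to the cutoff on each side.
        return max(younger_side), min(older_side), True
    # Otherwise: the two children closest to age 5 at the time of divorce.
    closest = sorted(ages_at_divorce, key=lambda a: abs(a - CUTOFF))[:2]
    return min(closest), max(closest), False


def schooling_gap(younger_attended: int, older_attended: int) -> int:
    """Outcome of the sibling-difference model: youngest minus oldest."""
    return younger_attended - older_attended


identifying_pair = select_pair([1, 4, 7, 10])
comparison_pair = select_pair([8, 12, 15])
```

The regression then relates the schooling gap of each selected pair to the identifying flag, with or without the pair-level controls.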
Including half-siblings in the sample. We report regression results from both the LPM specification
and the SFE specification in columns (7) and (8) of Table B-2.1 in Appendix. The sample
includes 671 additional observations. The results are similar to what we find when removing
the half-siblings from the sample.
39 The selection process is as follows. If there is a pair of children with one older than 5 and one younger, we select this pair by selecting the child who is the closest to each side of the cutoff. If there is no such pair, we choose the two children who were the closest to age 5 at the time of divorce. We drop observations from two families for which children closest to the threshold include twins.
Including children who do not have a sibling older than 7 in the sample. As we include
children who do not have a full sibling that is older than 7, we cannot estimate the SFE model.
Results from our LPM without SFE are reported in columns (9) and (10) of Table B-2.1. The sam-
ple includes 1149 additional observations. Being younger than 5 remains significantly and pos-
itively correlated with primary school attendance, but has a slightly lower magnitude (0.137)
than the coefficient estimated on the main sample (0.164).
Figure B-2.1 shows the results of the linear probability model when binary variables for each
age at divorce are introduced (ages 0 to 6). Model 1 is reported for comparison purposes (sample
of children of a divorced mother with at least 2 children). The sample used to estimate model 2 includes all
children older than 7 whose parents divorced (even those who do not have a sibling over 7 to
be compared to). The sample used to estimate models 3 and 4 (controls included) includes
all children older than 7, whether or not their parents are divorced. The coefficients from these
3 models seem to be similar to the coefficients from model 1, except for ages 4 and 5 at divorce,
where they seem to be lower: this pattern is consistent with the lower coefficient on being
younger than 5 when estimated on the larger sample. The coefficient associated with being 6 at
divorce date is almost exactly 0 whichever sample and model are considered. These findings
suggest that the results estimated on the main sample are not overly sensitive to the inclusion
of children who do not have a sibling older than 7.
Figure B-2.1: Coefficients on age at divorce – All children aged 7 and older
[Figure: coefficients (vertical axis from -0.2 to 0.4) by age at divorce (1 to 6); series: 1: LPM || divorce; at least 2 children, 2: LPM || divorce; at least 1 child, 3: LPM || at least 1 child, 4: LPM+c || at least 1 child.]
Note: Coefficients associated with binary variables for ages at divorce. The omitted category groups ages at divorce higher than 7. The dependent variable is an indicator variable that takes the value 1 if the child has attended primary school, and 0 otherwise. Sample: All children older than 7 at survey date.
B-2.2. Additional tables
Table B-2.2: Schooling of children according to mother’s characteristics
Dependent variable: columns (1)-(3) Ever attended primary school; columns (4)-(6) Completed primary.

                     (1)     (2)     (3)     (4)     (5)     (6)
Controls             No      Yes     Yes     Yes     Yes     Yes
Share schooling      0.65    0.65    0.65    0.67    0.67    0.67
Number of children   8,333   8,333   8,333   6,608   6,608   6,608
Note: Linear probability models. In columns (1), (2) and (3), the outcome variable is an indicator variable that takes the value 1 if the child has attended or attends formal primary school. In columns (4), (5) and (6), the outcome variable is an indicator variable that takes the value 1 if the child has attended or attends the fifth grade of primary school. Sample: Columns (1), (2) and (3): all children older than 7 at the survey date. Columns (4), (5) and (6): all children older than 10 at the survey date.
Control variables include: quadratic control for birth year, birth order indicators (4 categories) and an indicator variable that takes the value 1 if the child is a girl.
Robust standard errors in parentheses (clustered at the mother level).
Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Source: PSF2.
Table B-2.3: Characteristics of families of divorced women according to the age composition of children at the time of the survey
2 children over 7   2 children over 8   2 children over 9   2 children over 10
(1)                 (2)                 (3)                 (4)
Note: The table presents characteristics of mothers and families according to the composition of the family at the time of the survey.
Sample: All mothers surveyed in 2011 who have divorced, with at least one child younger than 25 and two children over 7 (over 8, over 9 and over 10) at the time of the survey for column (1) (respectively columns (2), (3) and (4)).
CHAPTER 3
ETHNIC HOMOGENIZATION AND PUBLIC GOODS: EVIDENCE FROM
KENYA’S LAND REFORM PROGRAM
Joint with Catherine Boone and Alexander Moradi1
Abstract. Little is known about the effects of ethnic homogenization policies, despite their being an obvious policy option, if not an ethical one, based on the literature that links ethnic fractionalization to negative development outcomes. In this paper, we examine the effects of ethnic homogenization on public good provision using a natural experiment that took place in Kenya. We study a large-scale land reform program that led to a significant reduction in ethnic diversity, the settlement schemes program. Using a novel dataset about the precise location of program area boundaries (Lukalo et al., 2019) that we combine with archival, survey, census, and satellite data, we implement a spatial regression discontinuity design. We argue that the border between program areas (treatment) and neighboring areas (counterfactual) is plausibly random at the local level and confirm that there are no observable differences in pre-treatment characteristics. We find a strong discontinuity in ethnic diversity but no differences in school provision between program areas and counterfactual areas in the short run as well as in the long run. As individuals were resettled to the program areas, they likely lack the dense social networks that favor collective action to either hold politicians accountable or to provide public goods through cooperation at the community level. Our results are not driven by spillovers from treatment to counterfactual areas. A mediation analysis indicates that income effects are unlikely to drive this null result.
1 Catherine Boone acknowledges the financial support of ESRC Grant # ES/R005753/1 “Spatial Dynamics in African Political Economy” (Boone, PI). Juliette Crespin-Boucaud acknowledges the support of the EUR grant ANR-17-EURE-0001. This paper benefited from discussions with Denis Cogneau, Oliver vanden Eynde, Jonathan Lehne, Nicolas Navarrete H., Stephan Kyburz, Avner Seror, Rebecca Simson, and Maiting Zhuang. We are grateful to seminar audiences at Bicocca, LSE, PSE, Wageningen, audiences of the IFS-UCL-LSE/STICERD Development Economics Seminar, as well as to participants of the NEUDC conference, the Journées de Microéconomie Appliquée (JMA) conference, the German Development Economics Conference, and the International Development Economics Conference. All remaining errors are our own.
110 Ethnic homogenization and public goods: Evidence from Kenya’s land reform program
3.1 Introduction
A large body of work has linked ethnic fractionalization to negative outcomes, in the world in
general and African countries in particular (Alesina and La Ferrara, 2005; Alesina et al., 2016;
Easterly and Levine, 1997b). Yet the literature has paid little attention to potential solutions.
Nor has it paid much attention to the fact that sub-national ethnic diversity levels result
from historical processes. Most policies implemented have gone in the direction of greater
integration (Bazzi et al. (2019) on Indonesia; Boesen (2019) and Miguel (2004) on villagization in
Tanzania), rather than of ethnic separation, which appears like a nuclear button (Milanovic, 2003).
We study a natural experiment that did precisely that.
This paper investigates the effects of ethnic homogenization on public good provision in Kenya.
We study a large-scale land reform program that led to a significant reduction in ethnic diversity,
the settlement schemes program.2 In the run-up to and the early years of independence, the Kenyan
government bought land owned by European settlers and redistributed it to African farmers.
About 3%-4% of the Kenyan population was part of the program. The government wanted
to propel agricultural development and to defuse land hunger (Boone et al., 2021). The land
reform combined elements of a rural development program, a redistributive land reform, and
an ethnic homogenization policy.
Using a novel dataset about the precise location of program area boundaries (Lukalo et al., 2019)
that we combine with archival, survey, census, and satellite data, this paper relies on a spatial
regression discontinuity design. The identifying assumption is that the exact borders between
program areas (treatment) and neighboring non-program areas (counterfactual) are as good
as random at the local level. We argue that this is plausibly the case, because of constraints
related to land acquisition. First, the Kenyan government wanted to acquire land in specific
parts of the country. Land prices turned out to be higher than expected and so the land to
be included in the schemes was revised downward (Leo, 1984). Moreover, state capacity was
limited. Even if the government had wanted to include more land, it could not have supported that
administratively. We confirm that i) pre-treatment covariates do not differ across the border and
that ii) ethnic diversity decreased substantially in the program areas as a result of the program.
Our main outcome of interest is school provision. We also report results on population density,
field size, and polling stations, which help us understand the mechanisms at work.
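To make the logic of the boundary comparison concrete, the following is a minimal sketch of a local linear discontinuity estimate at a border. It is not the estimation code used in this paper: the function names, the uniform kernel, the 5 km bandwidth, and all numbers are hypothetical and chosen only for illustration. Distance is signed, with non-negative values inside a program area.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    if len(xs) < 2:
        raise ValueError("need at least two observations on each side")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rd_jump(distance_km, outcome, bandwidth_km):
    """Difference in fitted intercepts at distance 0 (inside minus outside).

    A separate line is fitted on each side of the border, using only
    observations within the bandwidth (uniform kernel, an assumption).
    """
    pairs = list(zip(distance_km, outcome))
    inside = [(d, y) for d, y in pairs if 0 <= d <= bandwidth_km]
    outside = [(d, y) for d, y in pairs if -bandwidth_km <= d < 0]
    a_in, _ = fit_line(*zip(*inside))
    a_out, _ = fit_line(*zip(*outside))
    return a_in - a_out


# Hypothetical, noise-free data: ten units at -4.5, -3.5, ..., 4.5 km.
distances = [d + 0.5 for d in range(-5, 5)]
# Made-up diversity index that drops at the border.
diversity = [(0.3 if d >= 0 else 0.6) + 0.01 * d for d in distances]
# Made-up school count that is smooth across the border.
schools = [1.0 + 0.02 * d for d in distances]

jump_diversity = rd_jump(distances, diversity, 5.0)
jump_schools = rd_jump(distances, schools, 5.0)
```

In this toy example the estimated jump is negative for the made-up diversity series and zero for the made-up school series, which mirrors the qualitative pattern described above without reproducing any actual estimate.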
Our main finding is that there are no differences in school provision between program areas
2The use of area-specific ethnic criteria to select beneficiaries has been documented by Leo (1984). This paper is the first to quantify the effect this policy had on levels of ethnic diversity.
and counterfactual areas in the short run as well as in the long run. Results do not change if we
consider per capita measures. Results on field size (a proxy for farm size and income3) suggest
that program areas are poorer and inequality is lower as compared to the counterfactual areas.
Results are at odds with the vast literature that finds a negative relationship between ethnic
fractionalization and public good provision but are in line with the literature arguing that eth-
nic diversity in fact proxies for the strength of social networks within communities (Eubank
et al., 2019; Fearon and Laitin, 1996; Miguel and Gugerty, 2005). As individuals were resettled
to the program areas, they likely lack these dense social networks that favor collective action
to either hold politicians accountable or to provide public goods through cooperation at
the community level. For that interpretation to hold, land reform itself must not have had a
negative direct impact on school provision, for instance through income effects or changes in
inequality. We use a mediation analysis to show that our results on school provision are robust
to the inclusion of a proxy for income (field size). Moreover, we use border segments unaffected
by the program to rule out spillover effects.
This paper relates to three strands of literature.4 First, there is a very large literature on eth-
nic diversity and its consequences (Alesina et al., 1999, 2016; Baldwin and Huber, 2010; Miguel
and Gugerty, 2005). Kenya is a prime example in studies of ethnic diversity. The existence of
ethnic politics and favoritism has been documented (Harris and Posner, 2019; Kramon and Pos-
ner, 2016). Ethnic diversity demonstrably and negatively affected productivity in firms (Hjort,
2014); public sector performance (Eynde vanden et al., 2018); and public good provision such
as roads (Burgess et al., 2015) and schools (Kramon and Posner, 2016; Miguel and Gugerty,
2005).5 Using a plausibly exogenous source of ethnic homogeneity we do not find more public
goods in ethnically homogeneous areas. This suggests that there may be contexts where the
simple link between ethnic diversity and public good provision is broken. We are not the first
to report a non-positive finding. Harris and Posner (2019) argue, using Kenya as a case study,
that development projects, including school-related projects, are allocated according to
need more than as a reward to political supporters. The effect of ethnic politics on schooling has
recently been challenged by Simson and Green (2020). Second, this paper relates to
3This is a strong assumption that nevertheless may apply for small geographic areas, where soil quality and cultivation practices are similar.
4We do not interpret our results as land reform results that would have external validity: the land reform program is bundled with an ethnic homogenization component and the program area borders correspond to discontinuities in ethnic fractionalization levels.
5Miguel and Gugerty (2005) is highly relevant for our paper. The authors find ethnic heterogeneity to be associated with lower quality school facilities, supposedly because of lower funds generated from public fundraising events. Their identification relies on historical heterogeneous settlement patterns where two ethnic groups meet (in rural western Kenya, which corresponds to African Land Units in our study). They also look at community water wells.
the literature on the origins of local-level ethnic diversity and how ethnic diversity must be his-
toricized. We document and exploit a hitherto neglected exogenous source of homogenization.
The land reform extended Kenya’s ethnic “homelands” in a rather peaceful but very effective
way. Third, we contribute to the literature on resettlement and forced migration. Becker et al.
(2020) showed how Polish WWII refugees, who lost their property in the wake of Poland’s
westward shift, acquired more human capital as a result of this experience. Using the same
historical event Charnysh and Peisakhin (2021) showed how community values survived in
group majority settlements of displaced people. Bazzi et al. (2019) found greater national in-
tegration and more public goods in diverse communities of Indonesia’s resettlement program.
Miho et al. (2019) documented spillovers of gender norms from the deportees to the local pop-
ulation under Stalin’s ethnic deportations. Our results suggest that there are contexts where
resettlement had little to no effect on the provision of public goods. Our results cannot be ex-
plained by ethnicity per se, but possibly by the lack of a deep-rooted social network typically
associated with ethnicity - and that was also lacking in nearby fractionalized communities, due
to resettlement on both sides of the border.
The paper is organized as follows. Section 3.2 presents the data. Section 3.3 provides back-
ground on the land reform program under study. Section 3.4 details the methodology used.
Section 3.5 presents the results from the main border comparison. Section 3.6 discusses their
interpretation. Section 3.7 provides additional results using other border comparisons as well
as robustness checks. Section 3.8 concludes.
3.2 Data
This project combines historical administrative records with modern survey data and GIS in-
formation from a variety of sources.
Program areas: Settlement schemes. We obtained the exact boundaries of the settlement
schemes from Lukalo et al. (2019), who constructed a map layer from over 1,500 digitized Reg-
istry Index Maps (RIM) kept by Survey of Kenya in 2018. Polygons were joined with attribute
data from the Ministry of Lands and Physical Planning (MoLPP) dataset on Kenyan settlement
schemes, presented in Lukalo and Odari (2016). We added more background data on the pro-
gram areas included in our sample from the Annual Reports of the Department of Settlement
(1974), notably land prices, estimated income potential, and ethnic group selected for settle-
ment.
European-owned farms: Scheduled Areas. We digitized the boundaries of the Scheduled
Areas from Kenya National Bureau of Statistics (1926). This collection of sheets at 1:250,000
scale shows the plots reserved for European farmers as well as Forest Reserves and African
Land Units.6
Physical geography. We use data on crop suitability for coffee, tea, maize, sugar, and wheat at
a 5km x 5km resolution from FAO/IIASA (2011). Elevation data comes from the Shuttle Radar
Topography Mission (SRTM) and was sourced from Regional Centre For Mapping Resource
For Development (2020) at 30m x 30m resolution. A shapefile of forests and game reserves is
sourced from Kenya National Bureau of Statistics (1926) and Foundation (2020).
Ethnic diversity. From the 1962 and 1989 Population Censuses we retrieved the ethnic back-
ground of the population at the smallest administrative units available (“location” & “sub-
location”, respectively).7 We also know ethnicity at the GPS location of households from the
Kenya DHS surveys 2003, 2008-09, 2014 (Kenya National Bureau of Statistics, 2015).
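As an illustration of how group counts for an administrative unit translate into a diversity measure, the sketch below computes the standard Herfindahl-based fractionalization index (one minus the sum of squared group shares). That this exact index is the one applied to the census data is an assumption made for the example; the function name and the group labels are hypothetical.

```python
def fractionalization(group_counts):
    """Herfindahl-based fractionalization: 1 - sum of squared group shares.

    group_counts maps a group label to the number of people of that group
    recorded in one administrative unit (labels here are placeholders).
    """
    total = sum(group_counts.values())
    if total <= 0:
        raise ValueError("the unit must have a positive population")
    return 1.0 - sum((n / total) ** 2 for n in group_counts.values())


# Hypothetical units: one perfectly homogeneous, one split evenly in two.
homogeneous_unit = fractionalization({"group_a": 100})
mixed_unit = fractionalization({"group_a": 50, "group_b": 50})
```

The index equals zero when everyone in the unit belongs to the same group and rises toward one as the population is spread over more groups of similar size.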
Population. From Jedwab et al. (2017) we obtained data on urban population for the years
1901, 1911, 1920, 1926, 1931, 1948, 1962, 1969, 1979, 1989, 1999, 2009 for any city with more
than 500 inhabitants at any point in time (N=249). The data are based on estimates and censuses;
they give point coordinates of the city center but do not record the extent of urban sprawl. Kenya
Gazetteers for the years 1955, 1964, and 1978 provide point locations of populated places (U.S.
Board on Geographic Names, 2018).8 Finally, CIESIN (2016) provides population estimates at
a resolution of 1 arc-second (approximately 30m in Kenya) for the year 2015. Their estimates
are based on high-resolution (0.5m) satellite imagery, classifying blocks of optical satellite data
as settled (containing buildings) or not, and then using proportional allocation to distribute
population data from subnational census data to the settlement extents.9
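The proportional allocation step described above can be sketched as follows. This is a simplified illustration, not the procedure actually used to build the population layer: it assumes that the census population of a unit is split equally across the pixels classified as settled, and the function name and numbers are hypothetical.

```python
def allocate_population(census_total, settled_flags):
    """Distribute a census unit's population over its settled pixels.

    settled_flags holds one boolean per pixel in the unit (True if the
    pixel contains buildings). Equal shares per settled pixel are an
    assumption made for this sketch.
    """
    n_settled = sum(1 for flag in settled_flags if flag)
    if n_settled == 0:
        # No detected settlement: nothing can be allocated.
        return [0.0] * len(settled_flags)
    share = census_total / n_settled
    return [share if flag else 0.0 for flag in settled_flags]


# Hypothetical unit with 1,200 inhabitants and four pixels, three settled.
pixel_population = allocate_population(1200, [True, False, True, True])
```

By construction, the pixel values sum to the census total of the unit whenever at least one pixel is classified as settled.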
Agriculture. From FAO (2000) we obtained a polygon shapefile with field sizes in 2000,
distinguishing between three categories: predominantly i) small (<2 ha), ii) medium (2-5 ha),
and iii) large (>5 ha).10
6We preferred this source over later data, because of the large scale that allows highly accurate digitization of the borders. By 1926, the boundaries had largely been set (Morgan, 1963). We confirmed this by comparing the Scheduled Areas of 1924 to 1953 (Troup, 1953) and 1962 (Morgan and Shaffer, 1966).
7In 1962, a location was the smallest administrative unit and there were 490 locations. In 1989, the number of sub-locations was 3,715. On average, the 1962 and 1989 administrative units comprised 17,620 and 5,714 individuals, respectively.
8The Gazetteers recorded places smaller than 500 inhabitants, or 792, 2,341, and populated places in 1955, 1964, and 1978 respectively.
9Night light data is problematic in rural contexts (Gibson et al., 2020).
10These data are based on FAO’s Africover dataset (2000) and were presented in World Resources Institute (2007, Map 5.7). We merged mixed field sizes (labeled “medium mixed with small” and “large mixed with medium”) to
Political Economy. The geographic coordinates of 26,447 polling stations during the 2012
presidential election come from Maron (2013) and were originally released by the Independent
Electoral and Boundaries Commission (IEBC), which, founded in 2011, is responsible for
supervising referenda and elections in Kenya.
Public Goods. From 1955, 1964, and 1978 gazetteers we obtained point locations of schools,
markets, and facilities (U.S. Board on Geographic Names, 2018).11 While not accurate over time,
the spatial variation can be considered largely consistent.12 We obtained the precise geographic
point coordinates of all primary and secondary schools in Kenya from the 2007 school census
collected by the Ministry of Education (2009).
3.3 Background: Land, ethnicity, and public goods in Kenya
3.3.1 Colonial period
European settler areas. When the British colonized Kenya at the end of the 19th century, they
alienated large parts of the territory and reserved the land for European settlement. We refer to
this area around Nairobi, central and western Kenya as the European settler area (see map 3.1).13
Three factors were paramount for the alienation of land (Morgan, 1963). First, the climate was
deemed suitable for European settlement. The high altitude, generally above 5,000 feet (ca.
1,500 meters), made the alienated land malaria-free. Second, the land was close to the railway
line, thus giving European farmers market access.14 Last, the land had a low permanent population
density at the end of the 19th century, part of it being used as cattle grazing grounds,
leading the authorities to declare the land as uninhabited.15 The European farms, ranging in size
from 400 to 12,000 hectares, required African labor. This was secured by a squatter system (Leo,
1984). In return for labor, European settlers allowed African families to live on the farm and
cultivate a plot of their own.16 The status of squatter meant that Africans could be evicted at
the overall categories (“medium” and “large” respectively). Moreover, we exclude any polygons as non-agricultural if the field size was not mentioned as well as “urban settlements” (N=33). From the same source, we know whether coffee and tea were cultivated.
11The category “facilities” groups together a wide range of features such as administrative facilities (offices, community centers, post offices, banks, police posts, prisons), agricultural facilities (veterinary facilities, agricultural research stations, experimental farms, dairies), recreational and sports facilities, religious buildings (mosques, missions, churches) and industrial facilities (quarries, mines, mills, power stations).
12For example, the number of schools reported in 1955 and 1964 is largely consistent with data in Survey of Kenya (1959), but then decreases by 1978 and falls short of the numbers reported in Kenya (1980).
13These areas were officially called “Scheduled Areas”, and informally referred to as “White Highlands”.
14Most of the railway was built between 1895 and 1929.
15Kenya was hit by several shocks that depopulated this territory: the rinderpest of 1892 with losses of 80–90% of cattle (Mack, 1970), locust swarms resulting in a devastating famine in 1897–1899, and a smallpox epidemic (Ambler, 1988). The pastoralist Maasai, who used ca. 60% of the land, were displaced from the Scheduled Areas to the Reserves in the South and Laikipia.
16Agreements varied in the amount of labor that needed to be supplied, size of the plot, allowance of stock holdings, and wage payments (Youe, 1988). Over time, Europeans sought to replace the system with wage labor.
Figure 3.1: Map of Kenya: Scheduled Areas, program areas, other redistributed areas, and main cities.
the end of the labor contract. Overall, there was sizable labor immigration into the European
settler areas that was not restricted to a certain ethnicity.
African Land Units. The British confined Africans to “Native Reserves”, partly adjacent to
the European settler areas, partly additionally separated by a forest (see map 3.1). Each reserve
was designated for an African ethnic group. There was a Kikuyu Reserve, a Luo Reserve, and
so forth. We call these areas the African Land Units. In these areas, land tenure was based on
what the colonial government deemed as the prevailing customary law: Land was community
property. It could not be sold to individuals. Tribe members only had rights to the land they
were cultivating (Parsons, 2012).17 These areas were rural. Most people were engaged in small-
scale agriculture and animal husbandry, relying on family labor. The rigid reserve system
restricted expansion, and with population growth some areas became overcrowded. A class of
landless people emerged. Moreover, the colonial government neglected these areas in the provision
of public goods (Eynde vanden et al., 2018).
3.3.2 Land reform post-independence
The land in the European settler area was made up of mixed farms, agricultural estates and
plantations, and ranches. Of the land acquired by the Kenyan government, almost all was
Evictions took place.
17Kikuyu customary law, in fact, recognized individual ownership (Leo, 1984, p. 29-32).
in the “mixed farm areas”.18 Most mixed farms were marginally profitable or operating at
a loss and much of their hectarage was unexploited. European settlers knew that without
subsidies, state protection, and the colonial, apartheid-like labor repressive regime, they could
not survive economically (Leys, 1974). When British colonial rule was about to come to an end,
many European farmers wanted to leave.19 The independent Kenyan government wanted
to mitigate the land problem and create a new class of African farmers that would maintain
agricultural productivity. The Kenyan government decided not to expropriate the land but to
buy it. Funds came in from the UK, Germany, and the World Bank, partly as grants and
partly as interest-bearing loans. By 1971 about 500,000 hectares, or 17% of the European settler
area, were transferred to about 49,500 African families. Based on the population count of the
Census 1962, this represented about 4% of the population.20
Program Areas. In this paper, we focus on the 120 conventional schemes implemented be-
fore 1973. We call them the program areas. We exclude settlement schemes located in the Coast
province (N=8) because climatic and agricultural conditions are very different from the Kenyan
Highlands. We exclude Haraka schemes (N=17) that settled landless poor on very small plots
(0.6 hectares on average) taken from abandoned or mismanaged European farms. We exclude
cooperative schemes (N=12), because of their cooperative ownership and organizational struc-
tures, which later were described as a failure (World Bank, 1973); and ranches (N=6) designated
for extensive agriculture (cattle herding) due to low soil quality and water availability. We also
exclude schemes created later than 1973. While the government continued to create new smallholder
settlement schemes in the following decades, more and more marginal lands were allocated
and the government moved away from agricultural development goals and used the creation
of schemes to defuse tensions in politically strategic areas (Boone et al., 2021).21 The treatment
of those excluded schemes is very different; endogenous selection remains a concern; coun-
terfactual and policy implications are less clear. Figure 3.1 shows the location of the program
areas (in black), non-program land reform parcels allocated during our study period (in gray,
non-conventional schemes), and land reform parcels allocated after 1973 (light gray).22
18Most of the rural white population would have been residing in the mixed farm areas.
19The 1962 census counted 55,759 Europeans. Their number dropped to ca. 5,000 in 1969 (Jedwab et al., 2017).
20In comparison, the villagization programs (the relocation of people into centralized planned villages) in Tanzania (1970s), Mozambique (1980s), and Ethiopia (1986-1989) involved about 30%, 15%, and 28% of the population, respectively (Hanlon, 1990; Lorgen, 2000). In contrast, forced migration under Stalin amounted to 6% of the Soviet population (Polian, 2004).
21See Di Matteo (2019) for a study of Kenyan land reform from the 1990s to 2016.
22The exact boundaries of 17 schemes are unknown because their Registry Index Maps have not survived. We nevertheless know their approximate location, size, and type: 15 of the 17 schemes are Haraka schemes. These do not differ in size from the Haraka schemes whose locations are known (1206 and 1236 hectares for schemes of known and unknown location, respectively).
How did the state select the land to be included in the program areas? The newly formed
Kenyan government did not want to create a scattered patchwork of plots and schemes, but to have them
clustered. Under the idea of building ethnic nations, they also aimed to acquire land close to
the former Native Reserves.23 The government then faced several constraints on how much
land could be incorporated. First, the Kenyan government agreed with the British government
that land qualifying as “compassionate purchases” (land from older farmers or widows, for
instance) had to be bought, the extent of which was not known in advance. Second, the
development loans were agreed in advance. The Kenyan government was prepared to pay
market prices as of 1957-1959, and then adjust the price based on current crop profitability.
However, in practice there was room for negotiation and in some cases, land was valued ac-
cording to a different formula.24 At the end of the process, land prices turned out to be higher
than officials expected and so the land to be included in the program areas was revised down-
ward (Leo, 1984).25 Third, state capacity was limited: the whole of Kenya’s Department of
Agriculture was occupied with the land reform and even if the government had wanted to include
more land, it could not have supported that administratively. These constraints caused the exact
boundaries of the program areas to be unpredictable, which is the foundation of our empirical
strategy.
Table 3.1: Characteristics of settlement schemes
Type of scheme   Number   Estimated yearly    Average size   Average number   Average settlement    Average
                          income potential    of plots       of plots         charge per hectare    dev loan
High Income      37       104.2               15.9           145.6            22.4                  243.9
                          [25.0]              [8.3]          [95.9]           [6.4]                 [57.3]
Data from Department of Settlement (1974). Conventional farms only. Excluding cooperatives (N=8), ranches (N=6), Haraka (N=12), Harambee (N=2) and Shirika (N=4) schemes. The settlement charge reflected the government’s purchase price minus a grant element of 23%. A large percentage was given as a loan (90% and 100% in the Low/High income potential schemes respectively (Kenya, 1971)). Development loans were meant to cover the purchase of productive assets and inputs such as housing, fencing, livestock and crop cultivations. Standard deviation in brackets.
The program areas made up the bulk of the land reform. About 412,000 hectares were trans-
ferred to 35,000 families or 2.8% of the population as of 1962.26 The government created two
types of schemes, according to the income that farmers could earn on their plot: Low and high-
23In principle, this may suggest that European farmers closer to the Native Reserves may have had a better negotiation position as compared to farmers further away from the Native Reserves because then the risk of not being able to sell the land to the government increased.
24Ruthenberg (1966) reports a price calculated “on the basis of eight times the average annual profit, working on a 12.5 percent return on the capital invested in land, buildings, water supplies, roads, etc.”, which he deemed extremely generous.
25See the replies to European farmers who complained that their land was not included into the scheme “Land file: Revised land settlement scheme” FCO/41/6927 in the British National Archives.
26Today, about 2% of Kenyans live in the program areas (own estimation).
income schemes.27 Table 3.1 reports descriptive statistics. Low income schemes were more
frequent than high income schemes; their earning potential, as estimated by the government at
that time, was about three times lower. The average plot size, in contrast, was not that differ-
ent. Plots in low-income schemes were on average 12.1 hectares, about one third smaller than
the average size in high-income schemes. Income potential is thus not driven by plot size, but
rather by differences in soil quality and the crops that can be cultivated (Leo, 1984, p. 159).28 The
final columns show that the settlement charge,29 which African farmers assumed as a debt in
return for the plots, was similar for both types of schemes, but that development loans, which
covered permanent improvements like buildings and farming inventory, were twice as large
for the high income schemes.
Selection into high and low income program areas. The selection of farmers into the schemes
depended on three criteria: income, skills, and ethnicity. The first two differed by type of
scheme, that is, between high and low income program areas. After all, the government wanted to create
a new class of African farmers on good quality land and commercially-oriented farms, i.e. the
high income program areas. First, the land was not given for free. African farmers had to buy
the land. The costs were linked to market prices, and varied according to the income potential
(see Table 3.1). However, farmers did not need to pay up-front. They typically received a loan
repayable after 5 years from farm proceeds. Hence, credit-constrained farmers were not overly
excluded (Kenya, 1971). Once farmers repaid the loan, they owned the land.30 In low-income
program areas even landless people and squatters were given land. In contrast, in high-income
program areas, farmers were required to prove their ability to invest. According to Leo (1984,
p. 82) it was a sum of money that only prosperous farmers or members of the petty bourgeoisie
could afford. Second, the government was keen to select farmers who would be able to repay
the loans and to maintain agricultural production (i.e. national food security). Hence, skills
and agricultural knowledge were an advantage. In low income program areas, it was enough
to have settled on the land, either employed by the European farmers or as squatters. Besides,
the government provided agricultural training as part of the land reform.31 As high income
27The government named the schemes differently, namely high and low-density schemes, based on the targeted population density.
28Differences in the potential income of schemes are indeed correlated with soil quality. Using data from FAO/IIASA (2011) on crop suitability for the main export crops that are relevant for this part of Kenya (coffee, maize, and tea), we compute the average crop suitability value of each settlement scheme. The average crop suitability is always numerically higher in high income schemes (36 schemes) than in low income schemes (85 schemes). The differences in means are, however, not statistically significant in t-tests.
29The settlement charge covered the expenses incurred by the Kenya government in purchasing the land plus a 10% “bad-debt” margin.
30There is evidence that farmers sold land parcels through informal means before they had fully repaid the loan (Ambwere, 2003).
31In 1962, mobile training teams toured the schemes and explained phosphate application, line planting of maize, dipping, calf-rearing, and artificial insemination (Department of Settlement, various). Later, more formal training
potential areas meant a larger commercial enterprise, there was probably a stronger emphasis
on entrepreneurial skills (Nottidge and Goldsack, 1966). Third, the government sought to ex-
tend the regional base of ethnic groups across the Reserves (Leo, 1984, p. 120-4).32 Kikuyus
were offered land in Central Kenya; the Meru and Kamba in the Eastern region; and the Kalenjin in
the Rift Valley (Lukalo et al., 2019).33 In lands with large numbers of African laborers and squatters
of the “wrong ethnic group”, these people were largely removed.34 The ethnicity requirements
existed for both low and high income program areas.
The differences between the program areas are illustrated by the results of a field survey in the
early 1980s. Comparing high income scheme Passenga with neighboring low income scheme
Ol-Kalou West in Nyandarua North, Leo (1984) found in the high income area: fewer people
per plot (9.3 vs. 13.3), more laborers (2.1 vs. 0.7), fewer squatters remaining on the land (0.32
vs. 2.2), more plot-holders with a commercial background (40.5% vs. 6.8%), and more absentee landlords (33% vs. 8%).35 Leo
(1984) also observed that the high income program areas showed signs of prosperity such as
corrugated iron roofs, houses of wood and stone, whereas low income program areas were
dominated by mud houses, thatched roofs, and extensive areas of uncleared bush.
Counterfactual: Other parts of the former European settler area. What happened to the
European settler area that was not allocated before 1973? From Lukalo et al. (2019) we calculated
that the Kenyan government acquired another 208,000 hectares, or 7% of the European settler
area, and used it for smallholder schemes implemented after 1973. About 25% of the land was
acquired by private land buying companies that bought the European farms, subdivided the
land and sold it individually (Leys, 1975, p.89-90).36 Some of the land was channeled as large
holdings to members of the government elite, some remained government property, including
as Agricultural Development Corporation (ADC) farms (Boone et al., 2021). About 50% of the
area remained in the hands of the foreign land-owners.37 Much of this was ranch land and cor-
porate holdings. Overall, to use the words of Leo (1984), the “regular market in land was being
was provided at farmer training centers. More of these were built close to the schemes.
32The land reform program started at the beginning of independence negotiations (first preparations were made in 1961). In line with the idea of regionalism (“Majimboism”), most ethnic groups were represented, except the Maasai, who were not considered despite having a historical claim on the land. See Medard and Golaz (2011) for additional information on internal borders and conflict related to access to land.
33Scraping the various issues of the Department of Settlement (various) we find 59 schemes explicitly linked to one of the ethnic groups. Unfortunately, a systematic listing is not available.
34The 66/67 Department of Settlement (various), for example, states that “squatters who were residing on the settled plots [Chepsir & Tugenon] have been removed to Lumbwa Township.”
35His survey consisted of interviews with a 40% random sample of plot-holders or managers.
36The government supported this by increasing the capital stock of the Land and Agricultural Bank of Kenya and permitting loans up to 80% of the value of the land if funding was not available in the money market (Leo, 1984, p. 99).
37Part of it consists of the famous Delamere estates, which, however, are nowhere close to the program areas and therefore not relevant for our study.
allowed to become the main vehicle for land transfer to the African bourgeoisie”. This area
constitutes the counterfactual – what would have happened in the absence of the land reform.
3.3.3 Schools in Kenya
Education was partly centrally and partly locally funded (Simson and Green, 2020). In the first decade
of independence, the construction of primary schools and teachers’ houses was funded locally,
while afterward the central government stepped in and paid for some or all of the
recurrent expenditures (Eshiwani, 1990). Such schools were referred to as harambee schools.
Kenya’s secondary schooling system had a significant private component. Historically, many
secondary schools were created as Harambee projects under local initiative using local
resources.38 Harambee schools were typically of lower quality and many were later integrated
back into the formal government school system (Mwiria, 1991).
We use data from the 2007 census of schools to determine which institutions are in charge of the
schools.39 In 2007, the largest share (41%) of primary schools in Kenya was still financed and run by religious
organizations, and about 31% of primary schools were run by the central government (Table A-3.1
in the Appendix). About 20% of schools were funded by private individuals or organizations. Only
3% of schools were registered as community schools. Almost half (49%) of secondary schools were
run by religious organizations. The share of schools funded by the central government (22%) was
lower than for primary schools, and 5% of secondary schools were registered as community schools.
3.4 Empirical Strategy
The main discontinuity we are interested in is the border between program areas and the Eu-
ropean settler area (see Figure 3.2). We discuss below the specification used as well as the
motivation and identifying assumptions.
3.4.1 Specification
The spatial regression discontinuity approach exploits the fact that the status of the land changes
discontinuously at the border of the program areas. The identifying assumption is that, after
controlling for the distance to the program area, being on the program area side (treatment)
or on the other side (control) is a random event uncorrelated with other unobservable
determinants of public good outcomes. Differences in outcomes across this border can hence be
38 Schooling made up about 60% of Harambee projects (Ngau, 1987).
39 We use the reported school sponsor of each school as well as the status of the school (private or public). According to recommendation 13.28 from the Report of the Commission of Inquiry into the Education System of Kenya: Sponsors are required to take an active role in spiritual, financial, and infrastructural development to maintain the sponsor status (Republic of Kenya, 1999).
Figure 3.2: Illustration: Boundaries studied.
[Figure labels: Program Areas (plots distributed to African farmers); Counterfactual (plots not distributed); European Settler Areas; African Land Units. Comparisons: #1 Program Areas vs Counterfactual; #2 Program Areas vs African Land Units.]
attributed to the program.
We choose 250m x 250m grid cells as the unit of observation.40 We assign geographic features
to the cell in which they are located. We define $d_c$ as the distance (in km) of the centroid of
each cell $c$ to the program area border, with negative (positive) values identifying cells in
counterfactual (program) areas, and estimate model 3.1. We estimate the model on the cells
for which $|d_c| \le \Delta$, where $\Delta$ is either 5km (main results) or the value of the bandwidth
determined by a data-driven selection process (Calonico et al., 2017) (robustness). We use a
polynomial of order 1 (Gelman and Imbens, 2019).
\[ Y_c = \alpha \cdot d_c + T_c \cdot \beta \cdot d_c + f(\mathrm{Geo}_c) + \eta_c \tag{3.1} \]
$Y_c$ is any of the outcome variables studied. $T_c$ is an indicator variable that takes the value 1 if
the cell is part of a program area and 0 otherwise. $\beta$ is the treatment effect of interest. $f(\mathrm{Geo}_c)$
is a third-order polynomial function of the latitude and the longitude of the centroid of each
cell. Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance
estimator.41
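As a rough illustration of the local linear idea behind this specification, the sketch below fits one straight line on each side of the border within the bandwidth and reads the effect as the gap between the two fitted lines at a distance of zero. This is the textbook sharp RD estimator with a uniform kernel, and is not necessarily the exact estimator behind model 3.1: it leaves out the location polynomial and the nearest neighbor variance estimator. The function names and the simulated data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_jump(distances, outcomes, bandwidth=5.0):
    """Gap at the border between separate linear fits on each side.

    Negative distances are counterfactual cells, positive distances are
    program cells; cells exactly on the border (distance 0) are ignored.
    """
    pairs = list(zip(distances, outcomes))
    left = [(d, y) for d, y in pairs if -bandwidth <= d < 0]
    right = [(d, y) for d, y in pairs if 0 < d <= bandwidth]
    intercept_left, _ = fit_line(*zip(*left))
    intercept_right, _ = fit_line(*zip(*right))
    return intercept_right - intercept_left


# Simulated, noise-free cells (assumption): the outcome jumps by 2 at the border.
distances = [k * 0.25 for k in range(-24, 25) if k != 0]
outcomes = [1 + 0.5 * d if d < 0 else 3 + 0.2 * d for d in distances]
jump = rd_jump(distances, outcomes)  # close to 2, the jump built into the data
```

Fitting each side separately lets the slope in distance differ across the border, so only the difference in intercepts at the border is attributed to the program.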
40 Colonization and then land reform disrupted land occupation patterns. By not using “villages”, we avoid pitfalls of comparability in settlement structures across space and time.
41 We do not use administrative unit fixed effects in our analysis. Administrative boundaries in Kenya were redrawn shortly before independence, joining program area blocs to reserve lands (Boone et al., 2021). As such, administrative boundaries often follow the outer borders of program areas and are therefore endogenous to the treatment.
3.4.2 Identifying assumption
Motivation The identifying assumption is that the exact boundary at the local level was as
good as random. We argue that this is plausible. The Kenyan government aimed to create
contiguous territories rather than a rag rug of scattered farms. The inclusion into the program
depended on the funding available to buy the land. Funding was coming from developmental
loans that were agreed in advance. European farmers had a good negotiating position. While
prices paid by the Kenyan government were above market prices, there was a strategic benefit
to holding on to the land. This incentive decreased towards the boundary.42 In any case,
land prices turned out to be higher than officials expected, so the amount of land to be included in
the program areas was revised downward (Leo, 1984).43 The land and pre-treatment economic
activities should therefore be very similar on each side.
Boundaries of interest The main results are based on the study of the boundaries between
the program areas and the counterfactual areas (i.e. plots not redistributed), as depicted in
Figure 3.2. Of all the possible boundary segments that are located within the former European
settler area, we study those depicted in Figure A-3.1.
We exclude boundary segments from our analysis following four criteria. First, we exclude
boundaries that touch forest and game reserves. Forest reserves were meant for neither
European nor African agricultural production. During colonial times, they also
served as “buffers” to African Land Units. Over time, some of the forests disappeared, but in
general, satellite images indicate very strong discontinuities along the old forest reserves. In
practice, program areas frequently border protected areas (see Figure 3.1), and these areas cannot
be considered a valid counterfactual. Second, we exclude boundaries between program
areas and non-program redistributed areas (i.e. non-conventional program areas and plots
redistributed after 1973), as these areas cannot be considered a valid counterfactual. Third,
we exclude boundaries that are located within clusters of program plots, as these boundaries
might not be exogenous. Fourth, we exclude the boundaries located in Nyandarua county (the
area located to the East of Nakuru), as the boundary follows one of the escarpments of the Great
Rift Valley. This border is hence not exogenous.
42 African purchasing power was low and limited demand, whereas the sudden increase in land supplied by the many Europeans wanting to leave implied low prices.
43 See the replies to European farmers who complained that their land was not included into the program, “Land file: Revised land settlement scheme”, FCO/41/6927 in the British National Archives.
3.4.3 Data for SRDD
Matching grid cells to data Units of analysis are 250m x 250m grid cells that are matched
to the variables of interest according to their location. First, schools, polling stations, and DHS
clusters are matched to cells according to their location using GPS coordinates. Second, the
average of population, buildings, and altitude is computed for each cell from a high-resolution
raster file (grid cells about 30m2 in Kenya). Third, cells are matched to the agricultural data
(polygons that indicate field size and the type of crop grown).
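The first matching step can be pictured as follows. This is a minimal sketch that assumes point coordinates have already been projected to meters on a flat grid (an assumption; the text only mentions GPS coordinates), and the helper names and sample points are hypothetical.

```python
import math
from collections import Counter

CELL_SIZE_M = 250.0  # side of a grid cell, as stated in the text


def cell_index(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the cell containing a point."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def count_points_per_cell(points):
    """Count how many points (e.g. schools) fall into each grid cell."""
    return Counter(cell_index(x, y) for x, y in points)


# Hypothetical projected coordinates of three schools, in meters.
schools = [(10.0, 20.0), (240.0, 249.0), (260.0, 10.0)]
counts = count_points_per_cell(schools)
# The first two schools share cell (0, 0); the third falls in cell (1, 0).
```

Using floor division rather than rounding keeps every point inside exactly one cell, including points with negative coordinates.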
Additionally, we compute two measures of interest for schools and polling stations. First,
we compute the number of schools, students, and polling stations in each cell. Second, to
provide a per capita measure, we compute Voronoi polygons around each primary school, sec-
ondary school, and polling station. These polygons proxy for the catchment area of polling
stations and schools. We assign the population of the Voronoi polygon to the cell in which the
school or polling station is located, thus ensuring that there are no double counts of population
and that cells with 0 population are not dropped from the analysis.44
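The catchment logic described above can be sketched without building polygon geometries: assigning each populated raster point to its nearest facility yields the same partition as Voronoi polygons drawn around the facilities. The sketch below assumes planar coordinates and uses hypothetical names and toy data; it is an illustration, not the procedure used to produce the estimates.

```python
import math


def catchment_population(facilities, populated_points):
    """Sum population over the points closest to each facility.

    Assigning every point to its nearest facility is the discrete
    counterpart of building Voronoi polygons around the facilities,
    so each person is counted once and only once.
    """
    totals = {name: 0.0 for name in facilities}
    for x, y, population in populated_points:
        nearest = min(
            facilities,
            key=lambda name: math.hypot(
                x - facilities[name][0], y - facilities[name][1]
            ),
        )
        totals[nearest] += population
    return totals


# Hypothetical facilities and population raster points (x, y, population).
schools = {"school_a": (0.0, 0.0), "school_b": (10.0, 0.0)}
raster_points = [(1.0, 0.0, 5.0), (9.0, 0.0, 7.0), (4.0, 1.0, 3.0)]
catchments = catchment_population(schools, raster_points)
# school_a receives 5 + 3 = 8 people, school_b receives 7.
```

The resulting totals would then be attached to the cell that hosts each facility, which is why cells without any resident population can still enter a per capita measure.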
Excluded cells We compute the distance from each cell to the closest boundary of interest. We
then drop the cells that should not be considered as counterfactual areas. Cells that are located
in land reform parcels that are not conventional program areas (e.g. haraka schemes) or in
land reform parcels that were allocated later than 1972 are excluded from the control group.
We drop cells that fall into protected areas such as forests and cells that fall into water bodies
(lakes, main rivers). Additionally, we drop cells that are crossed by a boundary of interest,
as we cannot know whether features assigned to the cell are in the treatment area or in the
counterfactual area.45
Measurement error & Resolution For a spatial discontinuity analysis, the cells
should ideally be no smaller than the resolution of the underlying data. If, for example, soil quality is only
measured at 5km x 5km resolution, we will not detect any real discontinuity on the ground at
250m x 250m resolution. This is not a concern for our analysis, as we do not use data that was
measured at a resolution coarser than that of the cells.
A related issue is measurement error. In general, classical measurement error will attenuate
any discontinuity on the ground. Two kinds of measurement error could be a concern for our
analysis. First, there could be measurement error in the outcome data that we use. Regarding
44 Voronoi polygons sometimes cross the boundaries under study. We argue that there is no reason that the catchment area of schools would stop at the border between program areas and other areas, as schools are not an easily excludable public good. Using the population of the cell to proxy for the number of inhabitants yields similar results to using Voronoi polygons, though cells with 0 population are dropped from the per capita analysis.
45 This also addresses measurement error close to the border.
geocoded places (schools, polling stations), it seems likely that measurement error in
coordinates is random, and hence that it would not bias our estimates. Coordinates of DHS clusters
have built-in measurement error, as the clusters are displaced by up to 5km in rural areas (up to
10km for 2% of these clusters). This measurement error should not affect program areas more
than other areas but is likely to blur the discontinuity.
Second, there could be measurement error in the boundaries of the program areas. This type
of measurement error seems unlikely, as the boundaries of the program areas were digitized
based on official Registry Index Maps (RIM) that were obtained from Survey of Kenya by the
National Land Commission (NLC) in 2018.46 We confirm that the boundaries match geographical
features observed in OpenStreetMap and coincide well with plot boundaries from the
British National Archives.
3.5 Results
We compare areas close to the border separating the program areas from the European Set-
tler Area left to market forces (counterfactual). We first present results on a pooled sample of
program areas, then we discuss heterogeneity according to the expected income potential (at
allocation time) of program areas.
3.5.1 Validity checks: Ethnicity and pre-treatment comparison
Ethnicity & land reform program The land reform brought about a striking change in the
ethnic composition within the program areas (Figure A-3.2).
Table 3.2 shows the results of a regression of ethnic fractionalization (ELF)47 on the share of the
administrative unit (1962 locations and 1989 sublocations) that is located within the European
settler areas (but not within the program areas) and the share of the administrative unit that is
located within program areas. In 1962, European settler areas are significantly more ethnically
diverse than the African Land Units (column 1). The future program areas are also more diverse
than the African Land Units and appear to be as diverse as other parts of the European settler
areas (p-value: 0.15, column 1).
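For reference, the fractionalization index used as the outcome here is one minus the Herfindahl index of ethnic group shares. The sketch below computes it from a list of individual group labels; the function name and the toy inputs are hypothetical, and the result is on a 0 to 1 scale, whereas the means reported in Table 3.2 suggest a 0 to 100 scale (an inference, not something the text states).

```python
from collections import Counter


def ethnic_fractionalization(group_labels):
    """One minus the Herfindahl index of group shares (0 = homogeneous)."""
    counts = Counter(group_labels)
    total = sum(counts.values())
    herfindahl = sum((n / total) ** 2 for n in counts.values())
    return 1.0 - herfindahl


# Toy administrative units with hypothetical group labels.
mixed_unit = ["A", "A", "B", "B"]        # two groups of equal size
homogeneous_unit = ["A", "A", "A", "A"]  # a single group

elf_mixed = ethnic_fractionalization(mixed_unit)              # 0.5
elf_homogeneous = ethnic_fractionalization(homogeneous_unit)  # 0.0
```

With two equally sized groups, two randomly drawn individuals belong to different groups half of the time, which is what the value of 0.5 expresses.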
When the European Settler Areas are further split to consider separately areas that were never
included in any program and future land reform or resettlement parcels, the program areas
46 The Registry Index Maps are official, legal maps of the plots in Kenya that were produced by surveyors of the Government of Kenya. These maps were kept in paper form at Survey of Kenya. These maps were geolocalized through a joint effort of the NLC, the Spatial Analysis Lab of the University of Richmond, and the LSE in 2018-2020. The work took place in Nairobi, Richmond, and London.
47 ELF equals one minus the Herfindahl index of ethnic group shares. It expresses the probability that two randomly selected individuals from that administrative unit belong to different ethnic groups.
appear a bit less diverse than the non-program areas (p-value: 0.04, column 2).
Table 3.2: Is the program associated with ethnic homogenization?
Standard errors in parentheses.
| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) |
| % in non-program European settler areas | 0.534*** (0.0322) | 0.519*** (0.0337) | 0.533*** (0.0322) | 0.534*** (0.0322) | 0.378*** (0.0632) | 0.379*** (0.0668) | 0.378*** (0.0632) | 0.378*** (0.0632) | |
| % in program area | 0.390*** (0.0905) | 0.313*** (0.0949) | | | 0.0484 (0.0664) | 0.0491 (0.0617) | | | |
| % in ethnic program area | | | 0.591*** (0.184) | | | | 0.0285 (0.0614) | | |
| % in non-ethnic program area | | | 0.333*** (0.101) | | | | 0.0686 (0.0870) | | |
| % in program area (high income) | | | | 0.441* (0.241) | | | | 0.0564 (0.0407) | |
| % in program area (low income) | | | | 0.373*** (0.116) | | | | 0.0458 (0.0852) | |
| % in land reform parcels (<1973) | | -0.432 (0.573) | | | | 0.0380 (0.341) | | | |
| % in land reform parcels (≥1973) | | 0.627*** (0.180) | | | | -0.0520 (0.338) | | | |
| In program area (DHS) | | | | | | | | | -0.345 (2.777) |
| In non-program European settler area (DHS) | | | | | | | | | 16.15*** (1.234) |
| Area (in thousands km2) | 200.0 (145.6) | 216.2 (144.0) | 198.2 (145.5) | 200.0 (145.7) | 1050.7 (825.4) | 1050.7 (825.7) | 1054.9 (825.3) | 1050.6 (825.5) | |
| Lat & long (quadratic) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| p-value b | 0.15 | 0.04 | | | 0.01 | 0.01 | | | |
| p-value c | | | 0.21 | 0.82 | | | 0.55 | 0.91 | |
| N | 462 | 462 | 462 | 462 | 3,500 | 3,500 | 3,500 | 3,500 | 2,342 |
Mean ELF: 14.82 (columns 1 to 4), 14.44 (columns 5 to 8), 41.02 (column 9).
s.d. ELF: 21.30 (columns 1 to 4), 20.77 (columns 5 to 8), 23.42 (column 9).
Mean % in program area: 1.75 (columns 1 to 4), 2.60 (columns 5 to 8).
Results from an OLS regression in which the outcome variable is the ethnolinguistic fractionalization (ELF) index. The higher its value, the more diverse the area. In columns 1 to 4, the sample is made up of all the locations (administrative units) that could be matched with the 1962 census data. In columns 5 to 8, the sample is made up of all the sublocations (administrative units) that could be matched with the 1989 census data. Locations (1962) and sublocations (1989) are the smallest administrative units for which ethnic data is available. In column 9, the sample is made up of all the surveyed clusters from the Demographic and Health Surveys conducted in Kenya in 2003, 2008-09, and 2014. Ethnic categories were recoded to be the same in both censuses and in the DHS surveys.
% in European settler area: This is computed by excluding the program area. The % in European settler area coefficient can hence be interpreted as the difference in ethnic fractionalization between the European settler area (without the program area) and the African Land Units. The % in program area can be interpreted as the difference in ethnic fractionalization between the program area and the African Land Units in columns 1, 2, 5, and 6.
The coefficient on % in land reform parcels (<1973) can be interpreted as the difference in ethnic fractionalization between non-program land reform parcels (allocated before 1973) and the African Land Units. The coefficient on % in land reform parcels (≥ 1973) can be interpreted as the difference in ethnic fractionalization between non-program land reform parcels (allocated after 1973) and the African Land Units.
In columns 3 and 7, the coefficient % in ethnic program area captures the level of ethnic fractionalization for the program area coded as “ethnic” (according to Department of Settlement (various)) and % in non-ethnic program area captures the level of ethnic fractionalization for the program area coded as “non-ethnic” (those for which no information regarding an ethnic criterion was found). In columns 4 and 8, the coefficient % in program area (high income) captures the level of ethnic fractionalization for high income potential program area and % in program area (low income) captures the level of ethnic fractionalization for low income potential program area (Department of Settlement, various).
The area of each administrative unit is added to the regression. The DHS clusters are points and we do not know the size of the area surveyed. Quadratic controls for the latitude and longitude of the centroid of each location, sublocation, and DHS cluster are also added to the regression. Standard errors are clustered at the region (columns 1 to 4) or at the province (columns 5 to 8) level. Regions in 1962 and provinces in 1989 are equivalent to level 1 (highest level) administrative boundaries.
Significance levels are denoted as follows: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.
b Test of equality of coefficient estimated for % in European settler area and % in program area.
c In columns 3 and 7: Test of equality of coefficient estimated for % in non-ethnic program area and % in ethnic program area. In columns 4 and 8: Test of equality of coefficient estimated for % in program area (low income) and % in program area (high income).
When considering separately ethnic program areas and non-ethnic (i.e. program areas for
which no explicit mention of an ethnic criterion was found in archival documents) program
areas, the level of ethnic fractionalization is the same in both types of program areas pre-reform
(p-value: 0.21, column 3). When considering separately (future) high income potential program
areas and (future) low income potential program areas, the level of ethnic fractionalization is
the same in both types of program areas pre-reform (p-value: 0.89, column 4). Program areas
were hence, before the implementation of the program, ethnically diverse areas.
In 1989, after the reform was implemented, the former European settler area remains more
ethnically diverse than the former African Land Units (column 5). The program areas became
as homogeneous as the former African Land Units, which is consistent with reports from Leo
(1984): the coefficient associated with the share of the unit located in program areas drops from 0.39 in
1962 to 0.05 in 1989 and is no longer significant.48 The coefficient associated with program areas
is significantly different from the one associated with non-program areas including when we
control for future land reform or resettlement parcels (p-value: 0.01, columns 5 and 6). When
considering separately ethnic program areas and non-ethnic program areas, the level of ethnic
fractionalization is the same in both types of program area post-reform (p-value: 0.55, column
7). This finding indicates that program areas that were not necessarily classified as “ethnic”
were also allocated on an ethnic-based criterion. Both types of program areas are as homoge-
neous as the former African Land Units. When considering separately (future) high income
potential program areas and (future) low income potential program areas, the level of ethnic
fractionalization is the same in both types of program areas (p-value: 0.91, column 8). This
indicates that the ethnic-based criterion was used in both the low income potential program
areas and the high income potential program areas.
As an additional test of the association between program areas and low levels of ethnic diver-
sity, we report in column 9 results of a regression of ethnic fractionalization at cluster-level on
whether the cluster is located in a program area, in the former European settler area (but not
within the program areas), or in the former African Land Units (reference category). Results
from this regression are in line with the results from the 1989 census: the former European set-
tler areas remain more ethnically diverse than the former African Land Units and the program
areas became as homogeneous as the former African Land Units.49
Additionally, we report results from RD plots (Figure 3.3) that show that the differences in
ethnic fractionalization are also found at the border under consideration. Figure 3.3a shows
the average cluster-level ethnic fractionalization within distance bins for clusters that are within
5km of the boundary between program areas and counterfactual areas. There is a “jump” at
the border, which indicates that sorting on ethnicity on the counterfactual side is unlikely to be a
concern: it does not seem to be the case that people only bought land close to the program areas
if they were of the same ethnic group as beneficiaries of the program.50 As an additional check,
48 The spatial units do not have the same size in each census, which may cause changes in coefficients estimated. For instance, the coefficient associated with non-program European settler area decreases from 0.53 to 0.37, a change that might be due to the fact that units considered in 1989 are smaller (and hence less likely to group ethnically homogeneous areas that do not have the same ethnic majority). The magnitude of the change in the non-program European settler area coefficient is much smaller than the change in the program area one, indicating that the change is not only driven by changes in the size of the units of analysis. This change in units of analysis does not affect within census comparisons.
49 The DHS clusters in rural areas were randomly displaced by up to 5 km. As a result, some DHS households are wrongly assigned to treatment or counterfactual areas. As this built-in measurement error is random, it should not bias our results towards finding a difference between program areas and other areas.
50 DHS clusters are displaced by up to 5km in rural areas (10km for 2% of clusters) and thus this measurement
Figure 3.3: Ethnic fractionalization at the border (DHS)
(a) Program area/counterfactual. [RD plot: Ethnic diversity (counterfactual). Horizontal axis: distance to border of program area (negative = untreated Scheduled Areas). Series: sample average within bin; polynomial fit of order 4.]
(b) Program area/ALU. [RD plot: Ethnic diversity (ALU). Horizontal axis: distance to border of program area (negative = former ALU). Series: sample average within bin; polynomial fit of order 4.]
we also report results on cluster-level ethnic fractionalization for clusters that are within 5km of
the boundary between program areas and African Land Units. Figure 3.3b reports the results.
Given the small number of clusters included in this analysis, the level of ethnic diversity is
not precisely estimated. If anything, the level of ethnic diversity appears to be higher in the
former African Land Units than in the program areas, thus indicating that the program areas
are indeed less diverse than the rest of the country.
Table 3.3: Program areas vs. counterfactual – Altitude & pre-treatment characteristics
| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) |
| Outcomes | Pop. places 1955 | Facility 1955 | Market 1955 | Health center 1955 | School 1955 | Well 1955 | Altitude (m) | Altitude (m) |
| | Number (in cell) | Number (in cell) | Number (in cell) | Number (in cell) | Number (in cell) | Number (in cell) | Average (in cell) | Average (in cell) |
| | | | | | | | | Data-driven bandwidth |
This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the program areas than in the counterfactual areas.
Data source: Data on altitude extracted from Regional Centre For Mapping Resource For Development (2020). Other variables were obtained from the Kenya Gazetteers (U.S. Board on Geographic Names, 2018).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Altitude and pre-treatment characteristics. As shown in Table 3.3, there are no differences
between program areas and counterfactual areas when considering pre-treatment variables
(columns 1 to 6). There are no schools, wells, health centers, or markets within 5km of the
boundary studied. Populated places and facilities are also extremely rare. In 1962,
there were no cities close to the border under study (results available from the authors). The
difference in altitude is significant when we use the 5km bandwidth (column 7) but is not significant
when using a smaller, data-driven bandwidth (column 8), thus indicating that the difference in
altitude does not come from a discontinuity in altitude at the border.
error is likely to result in a lesser discontinuity in ethnic fractionalization close to the border.
3.5.2 Effects of program
After 20 years Table 3.4 reports results from outcome variables measured in 1964 and in 1978.
There are no differences between the program areas and the counterfactual areas.
Table 3.4: Program areas vs. counterfactual – Short run outcomes
| | (1) | (2) | (3) | (4) | (5) | (6) |
| Outcomes | Pop. places 1964 | Facility 1964 | Market 1964 | Health center 1964 | School 1964 | Well 1964 |
| | Number (in cell) | Number (in cell) | Number (in cell) | Number (in cell) | Number (in cell) | Number (in cell) |
| Mean DV | 0.000198 | 0.000231 | 0.0000909 | 0 | 0.0000579 | 0 |
| s.d. DV | 0.0147 | 0.0163 | 0.00954 | 0 | 0.00761 | 0 |
| Polynomial | 1 | 1 | 1 | 1 | 1 | 1 |
| Bandwidth | 5 | 5 | 5 | 5 | 5 | 5 |
| Eff. N | 65362 | 65362 | 65362 | 65362 | 65362 | 65362 |
| Outcomes | Pop. places 1978 | Facility 1978 | Market 1978 | Health center 1978 | School 1978 | Well 1978 |
| | Number (in cell) | Number (in cell) | Number (in cell) | Number (in cell) | Number (in cell) | Number (in cell) |
| Mean DV | 0.000248 | 0.000240 | 0.000116 | 0 | 0.000107 | 0 |
| s.d. DV | 0.0163 | 0.0165 | 0.0108 | 0 | 0.0111 | 0 |
| Polynomial | 1 | 1 | 1 | 1 | 1 | 1 |
| Bandwidth | 5 | 5 | 5 | 5 | 5 | 5 |
| Eff. N | 65362 | 65362 | 65362 | 65362 | 65362 | 65362 |
This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the program areas than in the counterfactual areas.
Data source: All variables were obtained from the Kenya Gazetteers (U.S. Board on Geographic Names, 2018).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
After 40 years Table 3.5 reports results for contemporaneous outcomes. We find that pop-
ulation density is higher in the program areas than in the counterfactual areas and that this
difference is found in the number of buildings too (columns 1 and 2, Panel A). However, the
magnitude of the difference is small: 0.12 more people per cell is equivalent to having 2 more
people per km2. There are no differences in the number of primary and secondary
schools, in the number of students, or in the number of polling stations (columns 4 to 7, Panel
A). There are no differences when using per capita measures of schools, students, and polling
stations (columns 4 to 7, Panel B). This indicates that public good provision is on par with
population levels in 2007. We find no differences in the number of schools in 1964, 1978, and 2007:
this indicates that program and counterfactual areas are on the same path regarding school
provision.51
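The conversion from a per cell difference to a per square kilometer difference quoted above follows from the cell size: a 250m x 250m cell covers 0.0625 km2, so there are 16 cells per km2. A minimal sketch of that arithmetic, with hypothetical names:

```python
CELL_SIDE_KM = 0.25                        # 250m grid cells, as in the text
CELLS_PER_KM2 = 1.0 / (CELL_SIDE_KM ** 2)  # 16 cells per square km


def per_cell_to_per_km2(value_per_cell):
    """Scale a per cell quantity to a per square km quantity."""
    return value_per_cell * CELLS_PER_KM2


extra_people_per_km2 = per_cell_to_per_km2(0.118)
# About 1.9, which rounds to the 2 people per square km quoted in the text.
```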
51 Positive and significant results on school provision in 1964 and 1978 would have indicated that the program areas had more schools in the short run but that the counterfactual areas caught up with them in the longer run. This is not the case.
Table 3.5: Program areas vs. counterfactual – Long run outcome measures
Panel A
| | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
| | Population | Buildings | Primary schools | Secondary schools | Students (primary) | Students (secondary) | Polling stations |
| | Nb in cell | Nb in cell | Nb in cell | Nb in cell | Nb in cell | Nb in cell | Nb in cell |
| RD Estimate | 0.118* | 0.0281*** | -0.000208 | -0.000132 | -0.496 | 0.210 | -0.00165 |
| Mean DV | 4.163 | 0.670 | 0.0138 | 0.00341 | 4.540 | 0.599 | 0.00723 |
| s.d. DV | 3.905 | 0.470 | 0.127 | 0.0619 | 54.81 | 15.29 | 0.116 |
| Polynomial | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Bandwidth | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Eff. N | 65362 | 65362 | 65362 | 65362 | 65362 | 65362 | 65362 |
Panel B
Outcomes: Primary schools, Secondary schools, Students (primary), Students (secondary), Polling stations (all Nb per capita)
Mean DV 0.909 0.00549 0.489 0.212 0.202 0.0169 0.0145
s.d. DV 0.288 0.0690 0.476 0.388 0.378 0.119 0.113
Polynomial 1 1 1 1 1 1 1
Bandwidth 5 5 5 5 5 5 5
Eff. N 65362 65362 65362 65362 65362 65362 65362
This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the program areas than in the counterfactual areas.
Data sources (Panels A & B): Population data and building data (2015) are extracted from CIESIN (2016). Number of schools and number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations is extracted from Maron (2013).
Per capita measures (Panel B): Per capita estimates are computed using Voronoi polygons to compute population estimates associated with each school and polling station.
Data sources (Panel C): Data on field size and crop cultivated were extracted from FAO (2000).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Income & Inequality Considering the field size measure, the cells in the program areas are
more likely to be part of small and medium fields (columns 3 and 4, panel C) and less likely
to be part of fields bigger than 5 hectares (column 5). The coefficients associated with the
likelihood that tea and coffee are grown are statistically significant but of small magnitude,
and concern crops that are unlikely to be grown close to the boundaries under study: Tea
and coffee are grown in less than 2% of the cells each. These differences are unlikely to be
driving large differences in income between program areas and counterfactual ones.52 If we
consider that field size is a proxy for farm size,53 then the difference in field size across the
border indicates that the households living in the program areas might be, on average, worse
off than the households living in the counterfactual areas. Additionally, given that about half
of the cells are located in small fields, it seems that the program areas are less unequal than the
counterfactual areas, as cells in the program areas are less likely to be part of large fields.
52 These differences in the likelihood to grow tea and coffee are not explained by differences in field size.
53 While field size does not equal farm size, we found a strong positive correlation using data from the 2015/2016 Kenya Integrated Household Budget Survey (Kenya National Bureau of Statistics, 2015/2016): the correlation coefficient between farm size and field size is 0.74.
130 Ethnic homogenization and public goods: Evidence from Kenya’s land reform program
Spillovers & catchment area of schools Another issue in interpreting the results is that there are potential spillovers from program areas to the counterfactual areas. The main concern is whether schools on the program side serve children on both sides of the border. As there are no differences in the number of schools, this does not seem to be the case. Per capita measures take into account the fact that catchment areas of schools (measured by Voronoi polygons) do not necessarily stop at the program area border.
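The Voronoi-based per capita measures can be illustrated with a minimal sketch. A cell belongs to the Voronoi polygon of a school exactly when that school is the nearest one to the cell centroid, so the catchment population can be obtained by nearest-facility assignment. The function names, the use of planar Euclidean distance, and the toy coordinates below are assumptions made for illustration, not the procedure used for the tables.

```python
def nearest_facility(cell_xy, facilities):
    """Index of the facility closest to a cell centroid (Euclidean distance).

    Ties are broken in favor of the facility listed first.
    """
    cx, cy = cell_xy
    return min(
        range(len(facilities)),
        key=lambda i: (facilities[i][0] - cx) ** 2 + (facilities[i][1] - cy) ** 2,
    )


def catchment_population(cells, facilities):
    """Sum the population of all cells falling in each facility's Voronoi polygon."""
    totals = [0.0] * len(facilities)
    for x, y, population in cells:
        totals[nearest_facility((x, y), facilities)] += population
    return totals


def per_capita(counts, populations):
    """Divide a facility-level count by its catchment population (None if empty)."""
    return [c / p if p > 0 else None for c, p in zip(counts, populations)]


# Hypothetical toy example: two schools and four populated cells (x, y, population).
toy_schools = [(0.0, 0.0), (10.0, 0.0)]
toy_cells = [(1.0, 0.0, 100), (2.0, 1.0, 50), (9.0, 0.0, 200), (4.0, 0.0, 30)]
toy_students = [90, 50]

toy_population = catchment_population(toy_cells, toy_schools)
toy_students_per_capita = per_capita(toy_students, toy_population)
```

Because the assignment ignores the program area border, a school on one side can draw population from cells on the other side, which is the property the per capita measures rely on.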
Table 3.6: Program areas vs. counterfactual – Sponsor of schools
Schools located within 5km of the boundary under study (estimation sample). The difference in the number of schools is explained by the difference in the number of cells in the program areas and in the counterfactual areas.
a Column (3) reports the results of a t-test of differences in means. Significance levels are denoted as follows: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.
Quality of schools (sponsors) We provide results on the sponsor of each school in 2007. Table 3.6 presents a description of the sponsor of schools included in the estimation sample (schools located within 5km of the border under study). As the number of schools is the same across the boundary (and across time), the descriptive statistics in percentage should be interpreted as substitution effects within types of schools.54 Primary schools in the program areas are more
54 Our data points on schools are 1964, 1978, and 2007. There might have been differences in the number of schools in the 1980s and 1990s that we are not capturing. The difference in the share of school sponsors between treatment
likely by 7.5 percentage points to be private religious schools than schools in the counterfactual area are. Primary schools in the counterfactual areas are more likely to be government schools or to be funded by private individual organizations or by NGOs than schools in the program area are. Regarding secondary schools, schools located in the program areas are more likely to be financed by the central government and by the community, and less likely to be financed by private individuals. As harambee schools have slowly been integrated into the formal school system, it would be interesting to know whether the government schools were initially community schools. While the typical harambee school was of lower quality than other types of schools, without additional information we cannot draw conclusions about the quality of schools. These differences in the school sponsor might reflect differences in the cost of providing schools depending on the level of ethnic diversity (i.e. religious institutions may find it less costly to provide schools in homogeneous areas than in diverse ones).
Border segment analysis: Income potential of program areas We now consider two boundary categories: i) the boundary between low income potential program and neighboring counterfactual areas and ii) the boundary between high income potential program and neighboring counterfactual areas (Map A-3.1 depicts the potential income of program areas, as reported in the archival documents).55
Panel A of Table 3.7 presents the results for low potential income program areas. These areas
are not different from the counterfactual areas, except for the number of buildings (Panel A1,
column 2). The results on field size are similar to the ones on the pooled sample (Table 3.5).
Panel B of Table 3.7 presents the results for high potential income program areas. These areas are
more populated than the neighboring counterfactual area. This positive population difference
is unexpected, given that high income potential program areas were initially program areas
in which the population density was meant to be lower.56 The number of secondary schools
and secondary school students (per capita) is higher in these areas. This finding could indicate
that, in higher income potential program areas, people were more successful and built more
schools. We cannot know whether the positive coefficient on the number of secondary school
students per capita is due to stronger preferences for education in the program area (demand
side) or if the schools now have a larger catchment area and more students. The results on field
size are similar to the ones on the pooled sample (Table 3.5). Both the high income potential
and control areas is in line with the idea that program areas might have financed more community-based religious schools. The central government and NGOs would then have stepped in to provide schools in counterfactual areas. We plan to collect school censuses from these missing decades to test this hypothesis.
55 Belonging to a high income potential program area rather than a low income potential area is not random: it is likely that there are differences in soil quality between these program areas. However, if soil quality varies smoothly at the border, both these program areas can be compared to their respective neighboring areas.
56 Program beneficiaries may have needed to hire more laborers, which would drive the population estimates up.
Table 3.7: Program areas vs. counterfactual – Border segment analysis (income potential)
Panel A1 Population Buildings Primary schools Secondary schools Students (primary) Students (secondary) Polling stations
Nb in cell Nb in cell Nb per capita Nb per capita Nb per capita Nb per capita Nb per capita
Mean DV 0.895 0.00576 0.502 0.206 0.181 0.00778 0.0168
s.d. DV 0.306 0.0713 0.476 0.384 0.361 0.0807 0.121
Polynomial 1 1 1 1 1 1 1
Bandwidth 5 5 5 5 5 5 5
Eff. N 53679 53679 53679 53679 53679 53679 53679
Panel B: High potential income ≥ 100 £/year
Panel B1 Population Buildings Primary schools Secondary schools Students (primary) Students (secondary) Polling stations
Nb in cell Nb in cell Nb per capita Nb per capita Nb per capita Nb per capita Nb per capita
Mean DV 0.990 0.00395 0.416 0.251 0.319 0.0574 0
s.d. DV 0.0981 0.0532 0.468 0.413 0.440 0.214 0
Polynomial 1 1 1 1 1 1 1
Bandwidth 5 5 5 5 5 5 5
Eff. N 11301 11301 11301 11301 11301 11301 11301
This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the program areas than in the counterfactual areas.
Data sources: Population data and building data (2015) are extracted from CIESIN (2016). Number of schools and number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013).
Per capita estimates are computed using Voronoi polygons to compute population estimates associated with each school and polling station.
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
area and the low income potential program area appear to be worse off than their respective neighboring areas. It is worth noting that the cells close to the high income potential border are more likely to belong to large fields (32% of cells in the estimation sample of panel B) than cells close to the low income potential border (18% of cells in the estimation sample of panel A) and the magnitude of the coefficient associated with large fields is smaller in panel B than in panel A. These findings indicate that the households living in high income potential program areas are likely to be better off than the households living in low income potential program areas, both in absolute terms and when compared to the counterfactual areas. This finding also holds when looking at descriptive statistics on field size in high and low income potential program areas (see Table A-3.2 in Appendix).
Border segment analysis: Ethnic majority We investigate whether the results are driven by differences across ethnic groups (defined as the majority ethnic group associated with each border segment) that would be averaged out in the main specification. This is not the case: the results are robust to including ethnic segment fixed effects (Table A-3.3, in Appendix).57 We also present the results when considering border segments separately according to the majority ethnic group each segment is associated with (Table A-3.4, in Appendix). The coefficients on the number of schools and polling stations remain insignificant when considering separately the ethnic boundary segments associated with Kikuyu, Kalenjin, and Luhya.58 The coefficient on the number of primary school students per capita is weakly significant and negative in Kikuyu program areas. We also report the level of school provision in the program areas associated with these three ethnic groups (Table A-3.5, in Appendix). There are no significant differences in per capita number of primary schools and primary school students across these program areas: the difference in the number of primary schools between Kalenjin and Luhya program areas is not found in the number of primary school students, indicating that primary school provision is on par with population levels. The per capita numbers of secondary schools and secondary school students are significantly higher in Kikuyu program areas than in Kalenjin and Luhya program areas. This result is in line with results from Simson and Green (2020), which show that Kikuyu individuals are more likely to have attended secondary school.
3.6 Discussion
The results presented above are to be interpreted as the effects of the program under study. In
this section, we discuss possible interpretations of these results in light of theories related to
ethnic fractionalization and public good provision. We then discuss whether the effects could
be explained by the direct effects of the land reform itself. Information on school sponsors is
difficult to interpret in the absence of earlier data points. In this section, we focus on the results
on the number of schools and not the results on school sponsors.
3.6.1 Ethnicity and school provision
Community-level explanations
We can think of four channels linking ethnic homogeneity to higher levels of public good provision that are possibly relevant to the areas we study: a preference for mixing with co-ethnics;
57 The boundary lines are divided into 10 meter long segments. Each boundary segment is then assigned to the ethnic group that is the largest in the location the segment is located in, using the ethnicity data from the 1989 census.
58 We report results for these three ethnic groups as they correspond to the ethnic boundary segments associated with a high number of cells.
similar preferences over the type of public good to be funded; homogeneous returns to investments across ethnic lines; social sanctions being easier to enforce.59 These channels would lead us to expect higher levels of public good provision in the program areas compared to the counterfactual areas. Yet we do not find such a difference in 2007. Why not?
Resettlement on both sides of the border
Network density A possible explanation for the null result that we find is that we compare two areas in which resettlement took place: the loss of social ties may have made cooperation more difficult and social sanctions less enforceable on both sides of the border, thus resulting in similar levels of school provision on both sides of the border. This interpretation is consistent with the idea that the level of ethnic diversity in fact proxies for the strength of social networks within communities (see Eubank et al. (2019) for an empirical test of this hypothesis, first mentioned in Miguel and Gugerty (2005) and Fearon and Laitin (1996)). Dense social networks allow communities to share information and hold politicians accountable for the delivery of public goods. Newly resettled individuals are probably less likely to share such a dense social network: this could make social sanctions or collective action difficult to implement. As individuals on both the treatment and the counterfactual side lack this dense social network, independent of the ethnic diversity level of their community, we do not find an effect.
Self-selection on willingness/ability to cooperate with strangers As people voluntarily moved to the former European settler area, it is possible that they self-selected according to their willingness and ability to cooperate with strangers in general, thus resulting in no difference in public good provision.
Sorting on the counterfactual side (at program implementation date) As our main result is that the program area and the counterfactual area do not have different levels of school provision, we are concerned that public good provision could be higher on the counterfactual side than “predicted” from the level of ethnic fractionalization because of sorting in the counterfactual area on preference for ethnic diversity. Individuals who have a higher preference for ethnic diversity would be more likely to participate in community activities (Alesina and La Ferrara, 2000). Sorting on similar preferences for public good provision level would also result in additional cooperation (Miguel and Gugerty, 2005).60 However, it is unlikely that individuals who sought to buy land in the former European settler areas could anticipate what the ethnic composition of the area would be, as individuals and families were mostly trying to secure access to
59 See Miguel and Gugerty (2005) for a review of theories of ethnic diversity and collective action.
60 As preferences for public good provision vary within ethnic groups (Alesina et al., 1999), sorting on preferences for public goods does not imply sorting across ethnic lines.
land and the ethnic composition of the counterfactual area was not known ex ante. Tiebout sorting ex post is similarly unlikely in a context such as rural Kenya, where residential mobility is low.61
Political economy: Allocation of schools by government
A positive effect of ethnic homogenization on public good provision could have been offset by higher levels of government investments in education in the counterfactual areas. Two types of political economy channels could explain why the government could have provided more schools to counterfactual areas. To secure votes, governments could have targeted the provision of schools in non-program areas either as compensation to households that had not been able to benefit from the reform or as an investment in more diverse communities in which elections are likely to be more contested if there is ethnic voting (Hassan, 2017; Horowitz, 2019).62 This interpretation could be supported by results on primary school sponsors but does not hold for secondary schools. Further developments of this paper will aim to understand the difference in school providers across treatment and counterfactual areas as well as the differences between secondary and primary school results. It may be that the mechanisms that drive the provision of primary schools are not the same as the ones that drive the provision of secondary schools.
3.6.2 Alternative explanations
To interpret the results in terms of ethnic homogenization, two additional assumptions are needed. The first is that there was no direct impact of land reform on the outcomes we are considering. The second is that individuals on both sides of the border were selected similarly. We discuss these assumptions below. An additional channel would be that only one area has convenient access to polling stations (and thus the ability to support or sanction the government). However, as there are no differences in the number of polling stations (in absolute terms and per capita), we can rule this explanation out.
Direct impact of the land reform program The land reform program could have affected the provision of schools either through income effects or through redistribution effects (inequality). Regarding the former, if the border is indeed random at the local level, we do not expect
61 Rhode and Strumpf (2003) argue that long-run trends in geographic segregation are inconsistent with Tiebout sorting models (Tiebout, 1956) where residential choice depends solely on local public goods.
62 However, public good provision is a costly strategy and, to the best of our knowledge, there is no evidence of such targeting of public goods to buy votes. Harris and Posner (2019) argue, using Kenya as a case study, that development projects (including school-related projects) are allocated according to needs more than as a reward to political supporters.
differences in soil quality. To the best of our knowledge, the resolution of the available soil quality data does not allow us to test that soil quality does not vary at the border.63 We argue that the economic benefits of the reforms were limited by the fact that the land had to be bought by the beneficiaries, and was not given to them. As participation in the program was voluntary and the land was bought, we can assume that the expectation was not that the benefits would be large. (Field sizes are smaller, but if the loans have not been fully reimbursed, the beneficiaries would have been able to invest more money in their farm.) This direct positive income effect would be more consistent with higher levels of public good provision in all types of program areas but does not explain the null result we find. Similarly, the reduction in inequality would be associated with an increase in public good provision.
Finally, the results using the field size suggest that households in program areas might be poorer than their counterparts. This could indicate a negative income effect that could counteract the (theoretically) positive effect of ethnic homogenization and hence explain our results. To explore this hypothesis, we conduct a mediation analysis (Table 3.8). The results on schools and polling stations remain insignificant when controlling for field size, which indicates that a negative income effect is not what is driving the null result we observe.
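The logic of controlling for field size can be sketched as follows. The example compares a raw difference in means with the coefficient obtained once one fixed effect per field size category is included, computed by demeaning within categories. This is only an illustration of the mechanics on made-up numbers: the actual analysis embeds the fixed effects in the regression discontinuity specification, and the function names and toy data are hypothetical.

```python
from collections import defaultdict


def difference_in_means(treated, outcome):
    """Raw treated-minus-control difference in mean outcomes."""
    treated_values = [y for d, y in zip(treated, outcome) if d == 1]
    control_values = [y for d, y in zip(treated, outcome) if d == 0]
    return sum(treated_values) / len(treated_values) - sum(control_values) / len(control_values)


def within_group_effect(treated, outcome, groups):
    """Coefficient on `treated` in an OLS regression with one fixed effect per group.

    Implemented by demeaning treatment and outcome within each group and
    regressing the demeaned outcome on the demeaned treatment.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for d, y, g in zip(treated, outcome, groups):
        sums[g][0] += d
        sums[g][1] += y
        sums[g][2] += 1
    numerator = 0.0
    denominator = 0.0
    for d, y, g in zip(treated, outcome, groups):
        d_tilde = d - sums[g][0] / sums[g][2]
        y_tilde = y - sums[g][1] / sums[g][2]
        numerator += d_tilde * y_tilde
        denominator += d_tilde ** 2
    return numerator / denominator


# Hypothetical toy data: treated cells are mostly in small fields, and the
# outcome is higher in large fields regardless of treatment status.
toy_treated = [1, 1, 1, 0, 0, 0, 0, 1]
toy_field_size = ["small"] * 4 + ["large"] * 4
toy_outcome = [0.5, 0.5, 0.5, 0.0, 3.0, 3.0, 3.0, 3.5]

toy_raw = difference_in_means(toy_treated, toy_outcome)
toy_controlled = within_group_effect(toy_treated, toy_outcome, toy_field_size)
```

In the toy data the raw difference is negative because treated cells are concentrated in small fields, while the within-category coefficient recovers the effect built into the data. If the two numbers were close, as the text reports for the estimates with and without field size controls, field size would not be what drives the result.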
Selection of individuals/beneficiaries (income effects) We do not have much information on the individuals selected to be program area beneficiaries, but evidence from Leo (1984, p. 82) suggests that the selection on income and ability was stronger in high income potential areas than in low income potential areas. We do find more students attending school in high income potential program areas compared to the counterfactual areas. This is consistent with income effects coming from the ability of beneficiaries (skills and ability to make investments), which would be consistent with the result on field size. These more skilled beneficiaries are likely to also have a higher demand for education, which could explain the higher number of students per capita in these areas.
An alternative explanation is that households that settled in the counterfactual areas have been positively selected regarding income and skill compared to their counterparts in the low income potential program areas. The intuition is that these households might have had to pay more upfront to buy land plots (or convince a bank to lend them money, which might have required more credentials than the inclusion in the government-run land redistribution program). However, these areas do not have more community schools than the program areas: this channel thus seems unlikely to have been at work.
63Available data on crop suitability (FAO/IIASA, 2011) is at the 5km by 5km resolution in Kenya.
Table 3.8: Program areas vs. counterfactual – Mediation analysis: Field size and long-run outcome measures
(1) (2) (3) (4) (5)
Panel A Nb primary Nb secondary Nb students Nb students Nb polling
Mean DV 0.0000119 0.000000798 0.00323 0.000132 0.00000307
s.d. DV 0.000145 0.0000176 0.0452 0.00401 0.0000712
Polynomial 1 1 1 1 1
Bandwidth 5 5 5 5 5
Eff. N 65362 65362 65362 65362 65362
This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the program areas than in the counterfactual areas.
Data sources (Panels A & B): Population data and building data (2015) are extracted from CIESIN (2016). Number of schools and number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013).
Per capita measures (Panel B): Per capita estimates are computed using Voronoi polygons to compute population estimates associated with each school and polling station.
Data sources (field size): Data on field size were extracted from FAO (2000).
The regressions include field size fixed effects (small, medium, large). Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
3.7 Extensions
3.7.1 Other comparisons
Motivation – European settler area boundary We argued that the boundary between African
Land Units and the European settler areas was plausibly random at the local level. Indeed, as
Figure 3.1 shows, the African Land Units borders often followed straight lines. However, while
adjacent, the two areas differed at the time of the implementation of the land reform in many respects. African Land Units were governed by customary land tenure. Africans could not expand agricultural holdings into the European settler areas. As a consequence, population growth led to very small landholdings. Moreover, the colonial government neglected these areas in the provision of public goods (Eynde vanden et al., 2018). Hence, a comparison between African Land Units and program areas would not allow for a causal interpretation of the effect of the program.64
Nevertheless, the comparison is interesting in its own right. The program areas extended the regional base of ethnic groups across the Reserves (Leo, 1984, pp. 120-4), so the two areas became similar in ethnic dominance. Moreover, it has been argued that customary law spilled over to the program areas. It is conceivable that the situation in the African Land Units represents the equilibrium and that program areas, 50 years on, have converged to the situation in African Land Units. We can test this hypothesis by applying the same methodology to the border between program areas and African Land Units. If true, we would find pre-treatment differences that become smaller and disappear over time.
A natural cross-check is whether we find any differences between African Land Units and Scheduled Areas. This serves as an external validity check. Naturally, the effect should be the sum of the two effects above.65 Because the boundaries under study are different from the ones used for the two other comparisons, this may not be the case.
Program areas – African Land Units
Altitude and pre-treatment characteristics. As shown in Table A-3.6, there are no differences
between program areas and African Land Units, at the border, when considering pre-treatment
variables (columns 1 to 6). In 1962, there were no cities close to the border under study (results
available from the authors). Altitude appears to be significant when we use the 5km bandwidth
(column 7) but is not significant when using a smaller, data-driven bandwidth (column 8), thus
indicating that the difference in altitude is not coming from a discontinuity in altitude at the
border.
Effects of program – After 20 years Table A-3.7 reports results from outcome variables measured in 1964 and in 1978. In 1964, the program areas were less likely to have, at the border,
64 Of course, if there were indeed no differences at the local level between African Land Units and European settler areas, we may interpret any difference observed thereafter as being causally affected by “being included in the European settler areas”. However, we would not be able to pin down the actual treatment.
65 (counterfactual - African Land Unit) = (counterfactual - program area) + (program area - African Land Unit). Note that the results reported in section 3.5.2 are (program area - counterfactual).
populated places, markets, and schools. This finding is consistent with the fact that the former
African Land Units had been more populated and populated for longer. The level of public
good provision is likely to be on par with the population (proxied by the number of populated
places). In 1978, the only significant difference is the number of government facilities, which
was still higher in the former African Land Units than in the program areas. This finding is
consistent with the idea that program areas are catching up over time with the former African
Land Units when we consider variables associated with the length of the settlement.
Effects of program – After 40 years/Contemporaneous outcomes Table A-3.8 reports results
for contemporaneous outcomes. We find that population density is lower in the program areas
than in the former African Land Units and that this difference is also found in the number of
buildings (columns 1 and 2, Panel A). This is consistent with the fact that the African Land
Units were more populated than the European settler area during the colonial period, and
seem to have remained so. There are no differences in the number of primary and secondary
schools, in the number of students, or the number of polling stations (columns 4 to 7, Panel
A). There are no differences when using per capita measures of schools, students, and polling
stations (columns 4 to 7, panel B). When considering field size (Panel C), the program areas
have, on average, fewer small fields (column 3) and more large fields (column 5). Tea is more
likely to be grown in the program area. Coffee is not grown in this part of the country. If we
consider that field size is a proxy for farm size, then the difference in field size across the border
indicates that the households living in the program areas might be, on average, better off than
the households living in the former African Land Units. Additionally, given that about 70% of the cells are located in small fields, it seems that the program areas are more unequal than the former African Land Units, as cells in the program areas are more likely to belong to large fields.
Border segment analysis: Income potential of program areas We now consider two boundary categories: i) the boundary between low income potential program areas and neighboring African Land Units and ii) the boundary between high income potential program areas and neighboring African Land Units (Map A-3.1 depicts the potential income of program areas, as reported in the archival documents).
Panel A of Table A-3.9 presents the results for low potential income program areas. These areas are not different from the African Land Units, except that the population is lower, but the magnitude of the difference is small: 0.62 fewer people per cell corresponds to 10 fewer people per km2 (Panel A1, column 1). There are no differences when we consider field size. Panel B of Table A-3.9 presents the results for high potential income program areas. These areas are
much less populated than the neighboring African Land Units, which indicates some persistence over time, as the high income potential program areas were meant to be low population density program areas: 2.75 fewer people per cell corresponds to 44 fewer people per km2. When considering field size (Panel B2), the high income potential program areas have, on average, fewer small fields (column 3) and more large fields (column 5) as well as more tea-growing fields.66 The average positive effect observed on the pooled sample is hence driven entirely by the difference between the high income potential areas and the neighboring African Land Units. This difference is likely to come from the African Land Units side (as the cells in the estimation sample of panel B are much more likely to be in a small field than the cells in the estimation sample of panel A). For both types of program areas, the provision of schools and polling stations seems on par with population levels.
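The conversion from a per-cell population difference to a per-km2 difference can be checked with a short calculation. The sketch below assumes square cells with a side of 250 meters, that is, 16 cells per km2. This cell size is an assumption inferred from the ratio between the per-cell and per-km2 figures quoted in the text, and the function name is hypothetical.

```python
def per_cell_to_per_km2(value_per_cell, cell_side_m=250.0):
    """Convert a per-cell quantity into a per-km2 quantity.

    Assumption: cells are squares of side `cell_side_m` meters, so that
    one km2 contains (1000 / cell_side_m) ** 2 cells.
    """
    cells_per_km2 = (1000.0 / cell_side_m) ** 2
    return value_per_cell * cells_per_km2


# Differences quoted in the text, expressed as negative numbers (fewer people).
low_potential_gap = per_cell_to_per_km2(-0.62)   # about -10 people per km2
high_potential_gap = per_cell_to_per_km2(-2.75)  # -44 people per km2
```

The conversion is a simple rescaling, so the sign of the per-km2 difference is always the same as the sign of the per-cell difference.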
Interpretation We cannot attribute the effects observed to the colonial period (1901-1963) or
the post-colonial period. It is likely that the three areas considered – African Land Units, high
income potential program areas, low income potential areas – started from different levels of
population and public goods in 1963. However, it seems like the situation in the low income
potential program areas did converge to the one in the former African Land Units. The results
on the high income potential program areas suggest that these places are better off than the
former African Land Units. As these areas were selected, this success may be due to differences
in the program (notably initial plot size and selection of beneficiaries) or in soil quality. The
situation in these program areas has not converged to the one in the former African Land Units,
but as we have little data on plot or farm size over time, we cannot know whether the program
areas are in the process of converging to the situation in the former African Land Units, notably
with plots of land being subdivided at the time of inheritance.
Non-program European settler areas – African Land Units
Results As shown in Table A-3.10, the only difference across the boundary of the former Scheduled Area, when considering pre-treatment variables and altitude, is the number of populated places, which was slightly higher in the former European settler area than in the African Land Units. In 1964, there were more populated places, more markets, and more schools in the African Land Units than in the former European settler area, a result that is the same as the one observed when comparing the program areas to the African Land Units (Table A-3.11), which indicates that the whole former European area had fewer of these types of facilities than the former African Land Units. In 1978, the former European settler area still had fewer schools than
66This effect is entirely driven by the high income potential program area in the Kisii region (South-West of Kenya)
the former African Land Units. When considering contemporaneous outcomes (Table A-3.12),
we find that the non-program areas of the former European settler area are still less populated
and have fewer buildings (columns 1 and 2, Panel A), fewer schools and students, and fewer
polling stations. These differences mostly stem from differences in population level: the only
difference that remains significant when considering per capita estimates is the number of
polling stations per capita (column 7, Panel B). Cells within the former European settler area
are less likely to contain small and medium fields, and more likely to contain large fields. This
is what can be expected, given that the African Land Units were overpopulated and that plots
were smaller there.
Comparison with results from other border comparisons We compare the estimated
coefficients of Panel A (Table A-3.12) to the predicted coefficients based on Panel A of Table 3.5
and Table A-3.8. The signs of the predicted and of the estimated coefficients on population and
buildings are the same. The estimated coefficients on the number of secondary schools, the
number of students for both types of schools, and the number of polling stations also have the
same sign as the predicted coefficients. The estimated and predicted coefficients differ only for
the number of primary schools. The results hence go in the same direction whether we use a
simple difference (counterfactual vs African Land Units) or a double difference. This finding
suggests that there are no large spillovers from program areas onto neighboring areas. If there
were large spillovers, the double-difference results (spillovers included) would be different from
the simple-difference results (no spillovers).
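The comparison can be written compactly. The notation below is introduced here for illustration only and does not appear in the tables; it assumes that the "predicted" coefficient is formed as the difference between the two program-border estimates, which is our reading of the procedure rather than a documented formula. Let P denote program areas, CF the counterfactual (non-program European settler) areas, and ALU the former African Land Units.

```latex
% Illustrative notation (assumption): \hat\beta_{X-Y} is the RD estimate at the X/Y border,
% measured as "X minus Y".
\begin{align*}
\hat\beta^{\,\mathrm{pred}}_{CF-ALU}
  &= \underbrace{\hat\beta_{P-ALU}}_{\text{Table A-3.8, Panel A}}
   \;-\; \underbrace{\hat\beta_{P-CF}}_{\text{Table 3.5, Panel A}}
   && \text{(double difference)} \\
\hat\beta^{\,\mathrm{est}}_{CF-ALU}
  &= \text{RD estimate in Table A-3.12, Panel A}
   && \text{(simple difference)}
\end{align*}
```

Under this reading, the check in the text amounts to comparing the signs of the predicted and estimated coefficients outcome by outcome.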
3.7.2 Additional robustness tests
We test whether our main results are sensitive to bandwidth selection (and to the associated
specification error) by reporting regression results that follow the misspecification error correc-
tion procedure and data-driven bandwidth selection (Calonico et al., 2017). Table A-3.13 reports
the results for the main outcomes. The results on population and number of buildings carry
through and no difference is found when considering primary or secondary school outcomes.
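To illustrate why bandwidth choice can matter, here is a minimal sketch of a local linear regression discontinuity estimator with a triangular kernel, run on simulated data. It is not the bias-corrected procedure of Calonico et al. (2017) and not the estimation code behind Table A-3.13; the function names, the simulated data and the cubic trend are all assumptions made for illustration. Only the bandwidth values (5 km, and 3.312 and 1.560 from Table A-3.13) are taken from the text.

```python
def weighted_line_intercept(xs, ys, ws):
    """Intercept at x = 0 of a weighted least-squares line through (xs, ys)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    return ybar - (sxy / sxx) * xbar


def local_linear_rd(distance, outcome, bandwidth):
    """Jump in the outcome at distance = 0 (distance >= 0 is the treated side).

    A separate weighted line is fitted on each side of the boundary, using
    triangular kernel weights that fall to zero at the bandwidth.
    """
    sides = {True: ([], [], []), False: ([], [], [])}
    for d, y in zip(distance, outcome):
        if abs(d) < bandwidth:
            xs, ys, ws = sides[d >= 0]
            xs.append(d)
            ys.append(y)
            ws.append(1.0 - abs(d) / bandwidth)
    return weighted_line_intercept(*sides[True]) - weighted_line_intercept(*sides[False])


# Simulated cells (hypothetical): signed distance to the boundary in km,
# a true jump of 2.0 at the boundary, and a cubic trend that a linear fit
# cannot capture when the bandwidth is wide.
distance = [i / 100 for i in range(-500, 501)]
outcome = [1.0 + 0.3 * d + 0.02 * d ** 3 + (2.0 if d >= 0 else 0.0) for d in distance]

estimates = {h: local_linear_rd(distance, outcome, h) for h in (5.0, 3.312, 1.56)}
for h, est in estimates.items():
    print(f"bandwidth {h:>5} km: estimated jump {est:.3f}")
```

In this simulated example the estimate moves closer to the true jump as the bandwidth shrinks, which is the kind of sensitivity a robustness table with data-driven bandwidths is meant to rule out.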
3.8 Conclusion
Using a spatial regression discontinuity design, we study the short-run and long-run effects
of a land reform program that aimed to extend Kenya’s ethnic enclaves. We find a strong
discontinuity in ethnic diversity but no difference in school provision, in either absolute or
per capita terms, between program areas and counterfactual areas. The land reform had a
lasting impact on field size, which we interpret as indicative of income and inequality effects.
These effects could potentially cancel or reinforce any effect from ethnic homogenization. A
mediation analysis, however, suggests that this is unlikely. Moreover, we studied two other
borders: i) the border between the program areas and the African Land Units and ii) the border
between the non-program European settler area and the African Land Units. The results from
the first additional comparison suggest that the low income program areas tend to converge to
the situation in the African Land Units. Comparing the double-difference results with the
results from the non-program border suggests that spatial spillovers are unlikely.
The results of the paper point to several next steps. The difference between program and
counterfactual areas in school sponsors indicates that there might be differences that our
current measure (number of schools) is not capturing. Program areas might have financed more
community-based religious schools. As a result, the central government and NGOs might have
stepped in to provide schools in counterfactual areas. There might also be differences in the
number of schools in the 1980s and 1990s that we are not capturing (georeferenced data on
schools is available only for 1964, 1978, and 2007, and only the last of these is a census).
Hence, we plan to collect earlier school censuses that would allow us to study these
hypotheses. We also did not study school quality across the border. The available 2007 census
data contains some information regarding quality (e.g. available facilities, student to staff ratio)
that we plan to use to test whether sponsors indeed correlate with school quality.
Investigating further the timing of the settlement of new communities onto the former Euro-
pean settler areas could help to extend the discussion on sorting.
We will also do more to strengthen our claim of a quasi-random border. From archival docu-
ments held by the British National Archives, we digitized the plot boundaries before the land
reform was implemented.67 Using these maps, we will provide some information on the plots
that were included in the program areas and those that were not (plot size, access to a road,
date of inclusion in the program). We expect that these findings will confirm that the border of
the program areas is as good as random at the local level.
Our results may be specific to schooling and may not carry over to other types of public
goods. We therefore plan to include other georeferenced measures of public good provision
(e.g. roads, wells). To further strengthen the interpretation that a lack of group measure on
both sides of the border may explain our result, we may use information collected in the various
Afrobarometer surveys.
67 FCO141-6917 contains maps of the areas around Kitale, Kisii, Ol’Kalou, and Machakos. FCO141-6927 and FCO141-19109 cover the Nyandarua schemes. Plot boundaries are only available for farms in the European settler areas.
Another possible extension would be to use electoral results to test whether program areas vote
more for the incumbent party (constituencies are larger than program areas, but we could run
the same type of analysis as for ethnic diversity). Future versions of this paper should discuss
alternatives to the ethnolinguistic fractionalization index (polarization measures, and the
Politically Relevant Ethnic Groups (PREG) measure of Posner (2004)).
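As a minimal sketch of how such alternatives differ, the snippet below computes a standard fractionalization index (one minus the sum of squared group shares) and a commonly used polarization index of the Reynal-Querol type (four times the sum of s_i^2 (1 - s_i)). The group counts are hypothetical and the function names are ours; the PREG measure is not implemented, since it additionally requires a coding of which groups are politically relevant.

```python
def shares_from_counts(counts):
    """Convert group head counts into population shares."""
    total = sum(counts)
    return [c / total for c in counts]


def fractionalization(shares):
    """Probability that two randomly drawn individuals belong to different groups."""
    return 1.0 - sum(s * s for s in shares)


def polarization(shares):
    """Polarization index of the Reynal-Querol type; maximal with two equal groups."""
    return 4.0 * sum(s * s * (1.0 - s) for s in shares)


# Hypothetical localities (illustrative counts, not thesis data).
localities = {
    "nearly homogeneous": shares_from_counts([95, 5]),
    "two equal groups": shares_from_counts([50, 50]),
    "ten equal groups": shares_from_counts([10] * 10),
}

for name, shares in localities.items():
    print(f"{name}: ELF = {fractionalization(shares):.3f}, "
          f"polarization = {polarization(shares):.3f}")
```

The two indices rank the hypothetical localities differently: fractionalization is highest with many small groups, whereas polarization peaks with two groups of equal size.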
Cells located in program areas, within 5 km of the boundaries under study (boundaries to counterfactual areas and to former African Land Units).
a Column (3) reports the results of a t-test of differences in means. Significance levels are denoted as follows: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.
Appendix 147
Table A-3.3: Program areas vs. counterfactual – Long run outcome measures (Ethnicity boundary FE controls)
(1)   (2)   (3)   (4)   (5)   (6)   (7)
Outcome   Population   Buildings   Nb primary   Nb secondary   Nb students   Nb students   Nb polling
Mean DV   0.909   0.00549   0.489   0.212   0.202   0.0169   0.0145
s.d. DV   0.288   0.0690   0.476   0.388   0.378   0.119   0.113
Polynomial   1   1   1   1   1   1   1
Bandwidth   5   5   5   5   5   5   5
Eff. N   64289   64289   64289   64289   64289   64289   64289

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the program areas than in the counterfactual areas.
Data sources (Panels A & B): Population data and building data (2015) are extracted from CIESIN (2016). Number of schools and number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013).
Per capita measures (Panel B): Per capita estimates are computed using Voronoi polygons to compute population estimates associated with each school and polling station.
Data sources (Panel C): Data on field size and crop cultivated were extracted from FAO (2000).
a The boundary lines are divided into 10-meter-long segments. Each boundary segment is then assigned to the ethnic group that is the largest in the location in which the segment lies, using the ethnicity data from the 1989 census. The regressions include ethnicity fixed effects.
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
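The per capita construction mentioned in the table notes can be sketched as follows. A point lies in the Voronoi polygon of a school exactly when that school is its nearest school, so catchment populations can be obtained by assigning each populated cell to its nearest facility. This is a simplified illustration under stated assumptions (planar coordinates, hypothetical schools, cells and enrolment figures, and our own function names), not the procedure actually used to build the thesis variables.

```python
import math


def nearest_site(point, sites):
    """Index of the site closest to `point`, i.e. the Voronoi polygon it falls in."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))


def catchment_population(cells, sites):
    """Sum the population of all cells falling in each site's Voronoi polygon."""
    totals = [0.0] * len(sites)
    for x, y, population in cells:
        totals[nearest_site((x, y), sites)] += population
    return totals


# Hypothetical data: two schools and four populated grid cells (x, y, population).
schools = [(0.0, 0.0), (10.0, 0.0)]
students = [300, 120]
cells = [(1.0, 1.0, 500.0), (2.0, -1.0, 250.0), (9.0, 0.5, 400.0), (8.0, 2.0, 200.0)]

population = catchment_population(cells, schools)
students_per_capita = [
    enrolled / pop if pop > 0 else float("nan")
    for enrolled, pop in zip(students, population)
]
print(population, students_per_capita)
```

With real data, distances would need to be computed in a projected coordinate system, and a spatial index would be preferable to the brute-force nearest-site search used here.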
Table A-3.4: Program areas vs. counterfactual – Border segment analysis (ethnic majority)

             (1)            (2)            (3)            (4)            (5)            (6)            (7)

Kikuyu border segments
             Population     Buildings      Primary        Secondary      Students       Students       Polling
                                           schools        schools        (primary)      (secondary)    stations
             Nb in cell     Nb in cell     Nb per capita  Nb per capita  Nb per capita  Nb per capita  Nb per capita
Mean DV      3.451          0.645          0.0000132      0.00000145     0.00323        0.000290       0.00000334
s.d. DV      3.418          0.479          0.000187       0.0000265      0.0403         0.00714        0.0000693
Polynomial   1              1              1              1              1              1              1
Bandwidth    5              5              5              5              5              5              5
Eff. N       15092          15092          15092          15092          15092          15092          15092

Kikuyu border segments
             Cell in        Non-agri       Small field    Medium field   Large field    Tea            Coffee
             field          field          (<2 ha)        (2-5 ha)       (≥ 5 ha)
RD Estimate  0.0477***      -0.0187***     0.140***       -0.0377***     -0.0356***     0.00182***     0.0197***
Mean DV      0.876          0.0124         0.684          0.109          0.0703         0.00186        0.0516
s.d. DV      0.329          0.106          0.439          0.286          0.237          0.0377         0.210
Polynomial   1              1              1              1              1              1              1
Bandwidth    5              5              5              5              5              5              5
Eff. N       15092          15092          15092          15092          15092          15092          15092

Kalenjin border segments
             Population     Buildings      Primary        Secondary      Students       Students       Polling
                                           schools        schools        (primary)      (secondary)    stations
             Nb in cell     Nb in cell     Nb per capita  Nb per capita  Nb per capita  Nb per capita  Nb per capita
Mean DV      4.513          0.712          0.0000127      0.000000576    0.00301        0.0000694      0.00000325
s.d. DV      4.442          0.453          0.000136       0.0000133      0.0350         0.00194        0.0000600
Polynomial   1              1              1              1              1              1              1
Bandwidth    5              5              5              5              5              5              5
Eff. N       28199          28199          28199          28199          28199          28199          28199

Kalenjin border segments
             Cell in        Non-agri       Small field    Medium field   Large field    Tea            Coffee
             field          field          (<2 ha)        (2-5 ha)       (≥ 5 ha)
RD Estimate  -0.000430      0              0.00704        0.103***       -0.111***      0.0000507      0
Mean DV      0.993          0              0.314          0.374          0.305          0.000129       0
s.d. DV      0.0818         0              0.434          0.467          0.434          0.00985        0
Polynomial   1              1              1              1              1              1              1
Bandwidth    5              5              5              5              5              5              5
Eff. N       13366          13366          13366          13366          13366          13366          13366

Luhya border segments
             Population     Buildings      Primary        Secondary      Students       Students       Polling
                                           schools        schools        (primary)      (secondary)    stations
             Nb in cell     Nb in cell     Nb per capita  Nb per capita  Nb per capita  Nb per capita  Nb per capita
Mean DV      5.541          0.805          0.0000105      0.000000689    0.00399        0.000117       0.00000207
s.d. DV      2.910          0.396          0.000104       0.0000120      0.0494         0.00265        0.0000324
Polynomial   1              1              1              1              1              1              1
Bandwidth    5              5              5              5              5              5              5
Eff. N       13366          13366          13366          13366          13366          13366          13366

Luhya border segments
             Cell in        Non-agri       Small field    Medium field   Large field    Tea            Coffee
             field          field          (<2 ha)        (2-5 ha)       (≥ 5 ha)
RD Estimate  -0.00631***    0.0000130*     0.120***       0.116***       -0.242***      -0.0204***     0
Mean DV      0.974          0.00152        0.515          0.218          0.240          0.0387         0
s.d. DV      0.159          0.0343         0.472          0.386          0.398          0.178          0
Polynomial   1              1              1              1              1              1              1
Bandwidth    5              5              5              5              5              5              5
Eff. N       28199          28199          28199          28199          28199          28199          28199

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the program areas than in the counterfactual areas.
Data sources: Population data and building data (2015) are extracted from CIESIN (2016). Number of schools and number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013). Per capita estimates are computed using Voronoi polygons to compute population estimates associated with each school and polling station.
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Table A-3.5: Descriptive statistics: Population, schools, and students by ethnic majority

                                                  (1)                 (2)                 (3)                 (4)                 (5)                 (6)
                                                  Kikuyu              Kalenjin            Kalenjin-Kikuyu(a)  Luhya               Luhya-Kikuyu(a)     Kalenjin-Luhya(a)
                                                  program areas       program areas                           program area
Average population (by cell)                      3.8016485           4.7233672           0.9217188***        5.3424799           1.5408314***        0.6191126***
Avg nb of primary schools (per capita)            0.0000126           0.0000130           0.0000004           0.0000098           -0.0000028          -0.0000032*
Avg nb of primary school students (per capita)    0.0029468           0.0030429           0.0000961           0.0040349           0.0010881           0.0009920
Avg nb of secondary schools (per capita)          0.0000012           0.0000005           -0.0000007**        0.0000006           -0.0000006**        0.0000001
Avg nb of secondary school students (per capita)  0.0002034           0.0000734           -0.0001300**        0.0000885           -0.0001149*         0.0000151
Number of cells                                   5240                9776                15016               6463                11703               16239

All cells included in a program area and that are close to a border segment coded either as Kikuyu, Kalenjin or Luhya. (The boundary lines are divided into 10-meter-long segments. Each boundary segment is then assigned to the ethnic group that is the largest in the location in which the segment lies, using the ethnicity data from the 1989 census.)
a Columns (3), (5), and (6) report the results of a t-test of differences in means. Significance levels are denoted as follows: + p<0.15, * p<0.10, ** p<0.05, *** p<0.01.
Table A-3.6: Program areas vs. African Land Units – Altitude & pre-treatment characteristics
(1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)
Outcomes   Pop. places 1955   Facility 1955   Market 1955   Health center 1955   School 1955   Well 1955   Altitude (m)   Altitude (m)
Number (in cell)   Number (in cell)   Number (in cell)   Number (in cell)   Number (in cell)   Number (in cell)   Average (in cell)   Average (in cell)
Data-driven bandwidth

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the program areas than in the former African Land Units.
Data source: Data on altitude extracted from Regional Centre For Mapping Resource For Development (2020). Other variables were obtained from the Kenya Gazetteers (U.S. Board on Geographic Names, 2018).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Table A-3.7: Program areas vs. African Land Units – Short run outcomes

            (1)                 (2)                 (3)                 (4)                 (5)                 (6)
Outcomes    Pop. places 1964    Facility 1964       Market 1964         Health center 1964  School 1964         Well 1964
            Number (in cell)    Number (in cell)    Number (in cell)    Number (in cell)    Number (in cell)    Number (in cell)
Mean DV     0.000205            0.000863            0.000616            0                   0.00279             0
s.d. DV     0.0143              0.0294              0.0248              0                   0.0551              0
Polynomial  1                   1                   1                   1                   1                   1
Bandwidth   5                   5                   5                   5                   5                   5
Eff. N      9720                9720                9720                9720                9720                9720

Outcomes    Pop. places 1978    Facility 1978       Market 1978         Health center 1978  School 1978         Well 1978
            Number (in cell)    Number (in cell)    Number (in cell)    Number (in cell)    Number (in cell)    Number (in cell)
Mean DV     0.000164            0.000986            0.000699            0.0000411           0.00275             0
s.d. DV     0.0128              0.0314              0.0279              0.00641             0.0547              0
Polynomial  1                   1                   1                   1                   1                   1
Bandwidth   5                   5                   5                   5                   5                   5
Eff. N      9720                9720                9720                9720                9720                9720

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the program areas than in the former African Land Units.
Data source: All variables were obtained from the Kenya Gazetteers (U.S. Board on Geographic Names, 2018).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Table A-3.8: Program areas vs. African Land Units – Long run outcome measures

             (1)            (2)            (3)            (4)            (5)            (6)            (7)
Panel A      Population     Buildings      Primary        Secondary      Students       Students       Polling
                                           schools        schools        (primary)      (secondary)    stations
             Nb in cell     Nb in cell     Nb in cell     Nb in cell     Nb in cell     Nb in cell     Nb in cell
RD Estimate  -0.848***      -0.160***      0.00142        -0.000250      -2.970         -1.539         -0.00332
Mean DV      6.981          0.830          0.0265         0.00773        8.866          1.575          0.0150
s.d. DV      5.044          0.376          0.172          0.0921         71.42          25.28          0.217
Polynomial   1              1              1              1              1              1              1
Bandwidth    5              5              5              5              5              5              5
Eff. N       9720           9720           9720           9720           9720           9720           9720

Panel B   Primary schools   Secondary schools   Students (primary)   Students (secondary)   Polling stations
Nb per capita   Nb per capita   Nb per capita   Nb per capita   Nb per capita
Mean DV   0.992   0.00318   0.722   0.102   0.164   0.0779   0
s.d. DV   0.0905   0.0503   0.424   0.284   0.351   0.250   0
Polynomial   1   1   1   1   1   1   1
Bandwidth   5   5   5   5   5   5   5
Eff. N   9720   9720   9720   9720   9720   9720   9720

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the program areas than in the former African Land Units.
Data sources (Panels A & B): Population data and building data (2015) are extracted from CIESIN (2016). Number of schools and number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013).
Per capita measures (Panel B): Per capita estimates are computed using Voronoi polygons to compute population estimates associated with each school and polling station.
Data sources (Panel C): Data on field size and crop cultivated were extracted from FAO (2000).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Table A-3.9: Program areas vs. African Land Units – Border segment analysis (income potential)
Panel A1   Population   Buildings   Primary schools   Secondary schools   Students (primary)   Students (secondary)   Polling stations
Nb in cell   Nb in cell   Nb per capita   Nb per capita   Nb per capita   Nb per capita   Nb per capita
Mean DV   0.992   0.00466   0.676   0.120   0.192   0.0196   0
s.d. DV   0.0872   0.0609   0.440   0.306   0.370   0.131   0
Polynomial   1   1   1   1   1   1   1
Bandwidth   5   5   5   5   5   5   5
Eff. N   6743   6743   6743   6743   6743   6743   6743

Panel B: High potential income ≥ 100 £/year
Panel A3   Population   Buildings   Primary schools   Secondary schools   Students (primary)   Students (secondary)   Polling stations
Nb in cell   Nb in cell   Nb per capita   Nb per capita   Nb per capita   Nb per capita   Nb per capita
Mean DV   0.989   0   0.846   0.0724   0.0704   0.126   0
s.d. DV   0.105   0   0.345   0.239   0.245   0.300   0
Polynomial   1   1   1   1   1   1   1
Bandwidth   5   5   5   5   5   5   5
Eff. N   2817   2817   2817   2817   2817   2817   2817

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the program areas (treatment) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the program areas than in the former African Land Units.
Data sources: Population data and building data (2015) are extracted from CIESIN (2016). Number of schools and number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013). Per capita estimates are computed using Voronoi polygons to compute population estimates associated with each school and polling station.
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Table A-3.10: Non-program European settler areas vs. African Land Units – Altitude & pre-treatment characteristics
(1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)
Outcomes   Pop. places 1955   Facility 1955   Market 1955   Health center 1955   School 1955   Well 1955   Altitude (m)   Altitude (m)
Number (in cell)   Number (in cell)   Number (in cell)   Number (in cell)   Number (in cell)   Number (in cell)   Average (in cell)   Average (in cell)
Data-driven bandwidth

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the non-program European settler area (counterfactual) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the non-program European settler area than in the former African Land Units.
Data source: Data on altitude extracted from Regional Centre For Mapping Resource For Development (2020). Other variables were obtained from the Kenya Gazetteers (U.S. Board on Geographic Names, 2018).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Table A-3.11: Non-program European settler areas vs. African Land Units – Short run outcomes

            (1)                 (2)                 (3)                 (4)                 (5)                 (6)
Outcomes    Pop. places 1964    Facility 1964       Market 1964         Health center 1964  School 1964         Well 1964
            Number (in cell)    Number (in cell)    Number (in cell)    Number (in cell)    Number (in cell)    Number (in cell)
Mean DV     0.00130             0.000395            0.000325            0.00000692          0.00109             0.0000208
s.d. DV     0.0452              0.0205              0.0184              0.00263             0.0347              0.00456
Polynomial  1                   1                   1                   1                   1                   1
Bandwidth   5                   5                   5                   5                   5                   5
Eff. N      86013               86013               86013               86013               86013               86013

Outcomes    Pop. places 1978    Facility 1978       Market 1978         Health center 1978  School 1978         Well 1978
            Number (in cell)    Number (in cell)    Number (in cell)    Number (in cell)    Number (in cell)    Number (in cell)
Mean DV     0.00136             0.000575            0.000395            0.00000692          0.00135             0.0000208
s.d. DV     0.0460              0.0248              0.0199              0.00263             0.0398              0.00456
Polynomial  1                   1                   1                   1                   1                   1
Bandwidth   5                   5                   5                   5                   5                   5
Eff. N      86013               86013               86013               86013               86013               86013

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the non-program European settler area (counterfactual) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the non-program European settler area than in the former African Land Units.
Data source: All variables were obtained from the Kenya Gazetteers (U.S. Board on Geographic Names, 2018).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Table A-3.12: Non-program European settler area vs. African Land Units – Long run outcome measures

             (1)            (2)            (3)            (4)            (5)            (6)            (7)
Panel A      Population     Buildings      Primary        Secondary      Students       Students       Polling
                                           schools        schools        (primary)      (secondary)    stations
             Nb in cell     Nb in cell     Nb in cell     Nb in cell     Nb in cell     Nb in cell     Nb in cell
RD Estimate  -2.111***      -0.160***      -0.00513**     -0.00210*      -1.714*        -0.367         -0.00455**
Mean DV      4.969          0.667          0.0185         0.00451        5.905          0.874          0.0103
s.d. DV      5.089          0.471          0.158          0.0741         61.69          20.03          0.163
Polynomial   1              1              1              1              1              1              1
Bandwidth    5              5              5              5              5              5              5
Eff. N       86013          86013          86013          86013          86013          86013          86013

Panel B   Primary schools   Secondary schools   Students (primary)   Students (secondary)   Polling stations
Nb per capita   Nb per capita   Nb per capita   Nb per capita   Nb per capita
Mean DV   0.886   0.00249   0.658   0.115   0.110   0.118   0.0430
s.d. DV   0.318   0.0433   0.458   0.304   0.297   0.314   0.196
Polynomial   1   1   1   1   1   1   1
Bandwidth   5   5   5   5   5   5   5
Eff. N   86013   86013   86013   86013   86013   86013   86013

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). The coefficient RD Estimate captures the difference between the non-program European settler area (counterfactual) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the non-program European settler area than in the former African Land Units.
Data sources (Panels A & B): Population data and building data (2015) are extracted from CIESIN (2016). Number of schools and number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013).
Per capita measures (Panel B): Per capita estimates are computed using Voronoi polygons to compute population estimates associated with each school and polling station.
Data sources (Panel C): Data on field size and crop cultivated were extracted from FAO (2000).
Standard errors are estimated using a heteroskedasticity-robust nearest neighbor variance estimator. Significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
Table A-3.13: Robustness – Flexible bandwidth selection, correction of standard errors

             (1)            (2)            (3)            (4)            (5)            (6)            (7)
Outcome      Population     Buildings      Primary        Secondary      Students       Students       Polling
                                           schools        schools        (primary)      (secondary)    stations
             Nb in cell     Nb in cell     Nb in cell     Nb in cell     Nb in cell     Nb in cell     Nb in cell

Panel A Border schemes/other parts European settler area
Mean DV      6.981          0.830          0.0265         0.00773        8.866          1.575          0.0150
s.d. DV      5.044          0.376          0.172          0.0921         71.42          25.28          0.217
Polynomial   1              1              1              1              1              1              1
Bandwidth    3.312          3.767          2.263          4.773          1.560          2.741          2.881
Eff. N       5618           6648           3440           9164           2180           4379           4666

This table reports the results from a regression discontinuity specification. Location controls are included (third-order polynomial function of the latitude and the longitude of cell centroids). In Panel A, the coefficient RD Estimate captures the difference between the program areas (treatment) and the neighboring areas in the former European settler area (counterfactual). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the program areas than in the counterfactual areas. In Panel B, the coefficient RD Estimate captures the difference between the program areas (treatment) and the former African Land Units (control). A positive coefficient on RD Estimate indicates that, for the variable considered, the difference at the discontinuity is positive, and hence higher in the program areas than in the former African Land Units.
Data sources: Population data and building data (2015) are extracted from CIESIN (2016). Number of schools and number of students are extracted from the 2007 school census (Ministry of Education, 2009). The number and location of polling stations are extracted from Maron (2013).
CONCLUSION
What makes findings surprising? Evidence from a PhD dissertation.
In this conclusion, I test whether the main findings of this PhD are surprising. Using a novel
dataset on researchers’ priors, I compare the findings of this PhD to researchers’ priors to assess
the degree of unexpectedness of the findings. I develop a model in which researchers’ priors
can be informed by their own experiences (anecdote-based priors), by their knowledge of the
literature (evidence-based priors), or not informed at all (uninformed priors). The results suggest
that whether findings are surprising depends on the distribution of researchers’ prior types.
Priors were measured using seminar questions and research discussions of a representative
sample of the researchers that attend the same seminars and conferences as me. This
measure is likely to suffer from severe measurement bias that should be attributed, for the
most part, to recall bias. This PhD provided the ideal setting for a study of researchers’ priors,
as the topics it engages with helped make researchers’ priors observable. First, everybody
has some experience (direct or indirect) of marriage or divorce.68 During private discussions,
researchers made their priors explicit. These priors were likely to be based on their personal
experiences. Second, there seems to be a very strong association between Africa and ethnicity
among researchers outside of development economics, so the topic of ethnicity comes up rather
often during seminars and conferences. (The strength of this association could be tested using
an Implicit Association Test.) Throughout this conclusion, I use the words “surprising” and
“unexpected” interchangeably.
Chapter 1 concludes that i) interethnic marriages are far from rare and ii) interethnic marriages
have become more common. This finding runs counter to what can be derived from most of
the economic literature on ethnicity in Africa. The average economist appears surprised by the
68 It was hence easy to interview a very diverse pool of respondents as respondents were often interested in the interview topics and felt confident that they were knowledgeable about these topics. One exception was older men in polygamous marriages.
magnitude of the effect (1 out of 5 women is in an interethnic marriage), thus indicating that it
is indeed an unexpected finding.
Chapter 2 concludes that children’s likelihood of having ever attended school is not negatively
affected by their parents’ divorce. Is this a surprising finding? On the one hand, the literature
on divorce tends to find negative associations between divorce and educational outcomes. On
the other hand, the outcome we study is a basic investment in education that we would not
expect to be affected by divorce in developed settings. Ultimately, whether this finding is
considered to be surprising depends on individuals’ priors on divorce, conflict, and education.
Chapter 3 tentatively concludes that ethnic homogenization does not result in more public
good provision if the homogenization process also disrupts (co-ethnic) social networks. This
finding is a recent development of the paper, hence data on priors is too limited to conclude on
the unexpectedness of this finding. One could nevertheless argue that documenting the drastic
change in ethnic diversity after the program is an interesting contribution.
Working with surprising results: A guide
This PhD explores three research questions whose answers turned out to be rather unexpected.
This PhD has hence been an intense experience in abduction economics (Heckman and Singer,
2017).69 One of the great joys of this PhD has been to tackle uninformed priors (or prenotions
(Durkheim, 1987)). One of the worst research-related fears that I experienced during this PhD
was wondering whether the “surprising findings” reflected the state of the world or errors.70
Perhaps unsurprisingly, this combination resulted in another great joy of this PhD: engaging
with the topic of research integrity and replicability.
As a word of conclusion regarding this dissertation, I wish to share a story regarding measure-
ment error (as well as, of course, marriage): I was in Nakuru, Kenya, when the 2019 census was
conducted. I thus had to be, as per the guidelines of the Kenyan National Bureau of Statistics
(KNBS), counted. Having worked on marriages, divorces, ethnicity, and land, it was now my
turn to answer questions on these topics. Racked with guilt over the idea of making a false
statement in the census, I told the census enumerator that I was not married. That might have
been an error. I then answered the other census questions while the enumerator was trying to
convince me of his interest in getting to know me. That effort came at the expense of his work
as an enumerator: I noticed that most of my answers were not recorded accurately. Apologies
to anyone who comes across a weird outlier in central Kenya.
69 “Abduction is the process of generating and revising models, hypotheses and data analyzed in response to surprising findings.” (Heckman and Singer, 2017). The authors advocate abduction as a strategy for reacting to surprise.
70 PEBCAK errors (Problem Exists Between Chair And Keyboard) are one of the most common sources of errors. When facing a coding error, I find great comfort in Nick Eubank’s words: “It’s natural to think that the reason we find problems in the code behind published papers is carelessness or inattention on behalf of authors, and that the key to minimizing problems in our code is to be more careful. The truth, I have come to believe, is more subtle: humans are effectively incapable of writing error-free code.” (Eubank, 2016)
REFERENCES
Ahlerup, P. and Olsson, O., 2012. “The roots of ethnic diversity.” Journal of Economic Growth 17(2), 71–102.
Akresh, R., De Walque, D., and Kazianga, H., 2016. Evidence from a randomized evaluation of the household welfare impacts of conditional and unconditional cash transfers given to mothers or fathers. The World Bank.
Alesina, A., Baqir, R., and Easterly, W., 1999. “Public goods and ethnic divisions.” Quarterly Journal of Economics 114(4), 1243–1284.
Alesina, A., Devleeschauwer, A., Easterly, W., Kurlat, S., and Wacziarg, R., 2003. “Fractionalization.” Journal of Economic Growth 8(2), 155–194.
Alesina, A. and La Ferrara, E., 2000. “Participation in heterogeneous communities.” The Quarterly Journal of Economics 115(3), 847–904.
—, 2005. “Ethnic diversity and economic performance.” Journal of Economic Literature 43(3), 762–800.
Alesina, A., Michalopoulos, S., and Papaioannou, E., 2016. “Ethnic Inequality.” Journal of Political Economy 124(2), 428–488.
Amato, P. R., 2000. “The Consequences of Divorce for Adults and Children.” Journal of Marriage and Family 62(4), 1269–1287.
Ambler, C. H., 1988. Kenyan communities in the age of imperialism: the central region in the late nineteenth century. Yale historical publications; v.136, New Haven; London: Yale University Press, x, 181 pages.
Ambwere, S., 2003. Policy Implications of Land Subdivision in Settlement Areas: A Case Study of Lumakanda Settlement Scheme. Thesis.
Anderson, S. and Bidner, C., 2021. “Family Institutions.” Prepared for the Handbook of Family Economics.
Andre, P. and Demonsant, J.-L., 2014. “Substitution between Formal and Qur’Anic Schools in Senegal.” The Review of Faith and International Affairs 12(2), 61–65.
Baland, J.-M., Bonjean, I., Guirkinger, C., and Ziparo, R., 2016. “The economic consequences of mutual help in extended families.” Journal of Development Economics 123, 38–56.
Baldwin, K. and Huber, J. D., 2010. “Economic versus cultural differences: Forms of ethnic diversity and public goods provision.” American Political Science Review 104(4), 644–662.
Bazzi, S., Gaduh, A., Rothenberg, A. D., and Wong, M., 2019. “Unity in Diversity? How Intergroup Contact Can Foster Nation Building.” American Economic Review 109(11), 3978–4025.
Beck, S., Vreyer, P. D., Lambert, S., Marazyan, K., and Safir, A., 2015. “Child fostering in Senegal.” Journal of Comparative Family Studies 46(1), 57–73.
Becker, G. S., 1973. “A theory of marriage: Part I.” Journal of Political Economy 81(4), 813–846.
Becker, S. O., Grosfeld, I., Grosjean, P., Voigtlander, N., and Zhuravskaya, E., 2020. “Forced migration and human capital: Evidence from post-WWII population transfers.” American Economic Review 110(5), 1430–63.
Beegle, K., De Weerdt, J., and Dercon, S., 2006. “Orphanhood and the Long-Run Impact on Children.” American Journal of Agricultural Economics 88(5), 1266–1272.
—, 2010. “Orphanhood and human capital destruction: Is there persistence into adulthood?” Demography 47(1), 163–180.
Berge, L. I. O., Bjorvatn, K., Galle, S., Miguel, E., Posner, D. N., Tungodden, B., and Zhang, K., 2018. “Ethnically Biased? Experimental Evidence from Kenya.” Journal of the European Economic Association.
Bertrand-Dansereau, A. and Clark, S., 2016. “Pragmatic tradition or romantic aspiration? The causes of impulsive marriage and early divorce among women in rural Malawi.” Demographic Research 35, 47–80.
Bisin, A. and Verdier, T., 2000. “‘Beyond the melting pot’: cultural transmission, marriage, and the evolution of ethnic and religious traits.” The Quarterly Journal of Economics 115(3), 955–988.
Bjorklund, A. and Sundstrom, M., 2006. “Parental separation and children’s educational attainment: A siblings analysis on Swedish register data.” Economica 73(292), 605–624.
Boesen, J., 2019. Tanzania: from ujamaa to villagization. University of Toronto Press.
Boone, C., Lukalo, F., and Joireman, S. F., 2021. “Promised Land: Settlement Schemes in Kenya, 1962 to 2016.” Political Geography 89, 102393.
Boubacar, N. and Francois, R., 2007. Senegal, Country case study. Country Profile commissioned for the EFA Global Monitoring Report 2007, Strong foundations: early childhood care and education.
Bratberg, E., Rieck, K. M. E., and Vaage, K., 2014. “Intergenerational earnings mobility and divorce.” Journal of Population Economics 27(4), 1107–1126.
Burgess, R., Jedwab, R., Miguel, E., Morjaria, A., and Padro i Miquel, G., 2015. “The value of democracy: evidence from road building in Kenya.” American Economic Review 105(6), 1817–51.
Calonico, S., Cattaneo, M. D., Farrell, M. H., and Titiunik, R., 2017. “Rdrobust: Software for Regression-discontinuity Designs.” The Stata Journal 17(2), 372–404.
Canut, C., 2002. “Langues et filiation en Afrique.” Les Temps Modernes (4), 410–440.
Case, A. and Ardington, C., 2006. “The impact of parental death on school outcomes: Longitudinal evidence from South Africa.” Demography 43(3), 401–420.
Chae, S., 2016. “Parental divorce and children’s schooling in rural Malawi.” Demography 53(6), 1743–1770.
Chandra, K., 2006. “What is ethnic identity and does it matter?” Annu. Rev. Polit. Sci. 9, 397–424.
Charnysh, V. and Peisakhin, L., 2021. “The Role of Communities in the Transmission of Political Values: Evidence from Forced Population Transfers.” British Journal of Political Science, 1–21.
Chehami, J., 2016. “Les familles et le daara au Senegal.” Afrique contemporaine (1), 77–89.
Churchill, S. A. and Smyth, R., 2017. “Ethnic diversity and poverty.” World Development 95, 285–302.
Cisse, F., Daffe, G., and Diagne, A., 2004. “Les inegalites dans l’acces a l’education au Senegal.” Revue d’economie du developpement 12(2), 107–122.
Clark, S. and Brauner-Otto, S., 2015. “Divorce in sub-Saharan Africa: Are Unions Becoming Less Stable?” Population and Development Review 41(4), 583–605.
Clark, S. and Hamplova, D., 2013. “Single motherhood and child mortality in sub-Saharan Africa: A life course perspective.” Demography 50(5), 1521–1549.
Clark, S., Kabiru, C., and Mathur, R., 2010. “Relationship transitions among youth in urban Kenya.” Journal of Marriage and Family 72(1), 73–88.
Conversi, D., 2010. “Cultural homogenization, ethnic cleansing, and genocide.” In “Oxford Research Encyclopedia of International Studies.”
Crespin-Boucaud, J., 2020. “Interethnic and interfaith marriages in sub-Saharan Africa.” World Development 125, 104668.
Crespin-Boucaud, J. and Hotte, R., 2021. “Parental divorces and children’s educational outcomes in Senegal.” World Development 145, 105483.
de la Cuesta, B. and Wantchekon, L., 2016. “Is Language Destiny? The Origins and Consequences of Ethnolinguistic Diversity in Sub-Saharan Africa.” In “The Palgrave Handbook of Economics and Language,” Springer, pages 513–537.
De Vreyer, P., Lambert, S., Safir, A., and Sylla, M., 2008. “Pauvrete et structure familiale, pourquoi une nouvelle enquete.” Stateco (102), 261–275.
Department of Settlement, 1974. Annual Report 1974. Department of Settlement.
—, various. Annual Reports. Department of Settlement.
Desmet, K., Ortuño-Ortín, I., and Wacziarg, R., 2016. “Linguistic cleavages and economic development.” In “The Palgrave Handbook of Economics and Language,” Springer, pages 425–446.
Di Matteo, F., 2019. Decolonising Property in Kenya?: Tracing Policy Processes of Kenyan Contemporary Land Reform (1990s-2016). A Study of the Politicization of Decision-Making in Historical Perspective. Ph.D. thesis, Paris, EHESS.
Dial, F. B., 2008. Mariage et divorce a Dakar: itineraires feminins. KARTHALA Editions.
Djuikom, M. A. and van de Walle, D. P., 2018. “Marital Shocks and Women’s Welfare in Africa.” World Bank Policy Research Working Paper (8306).
Doss, C., 2013. “Intrahousehold bargaining and resource allocation in developing countries.” The World Bank Research Observer 28(1), 52–78.
Dulani, B., Harris, A. S., Horowitz, J., and Kayuni, H., 2021. “Electoral preferences among multiethnic voters in Africa.” Comparative Political Studies 54(2), 280–311.
Dumas, C. and Lambert, S., 2011. “Educational Achievement and Socio-economic Background: Causality and Mechanisms in Senegal.” Journal of African Economies 20(1), 1–26.
Dupuy, A. and Galichon, A., 2014. “Personality traits and the marriage market.” Journal of Political Economy 122(6), 1271–1319.
Durkheim, E., 1987. “Les regles de la methode sociologique (1895).” Paris, PUF.
Easterly, W. and Levine, R., 1997a. “Africa’s growth tragedy: policies and ethnic divisions.” The Quarterly Journal of Economics 112(4), 1203–1250.
—, 1997b. “Africa’s growth tragedy: policies and ethnic divisions.” Quarterly Journal of Economics 112(4), 1203–1250.
Eifert, B., Miguel, E., and Posner, D. N., 2010. “Political competition and ethnic identification in Africa.” American Journal of Political Science 54(2), 494–510.
Ermisch, J. F. and Francesconi, M., 2001. “Family structure and children’s achievements.” Journal of Population Economics 14(2), 249–270.
Eshiwani, G. S., 1990. Implementing Educational Policies in Kenya. World Bank Discussion Papers No. 85, Washington, D.C.: World Bank.
Eubank, N., 2016. “Embrace Your Fallibility: Thoughts on Code Integrity.” https://www.nickeubank.com/wp-content/uploads/2016/06/Eubank_EmbraceYourFallibility.pdf. Accessed: 2021-07-14.
Eubank, N. et al., 2019. “Social networks and the political salience of ethnicity.” Quarterly Journal of Political Science 14(1), 1–39.
Eynde vanden, O., Kuhn, P. M., and Moradi, A., 2018. “Trickle-Down Ethnic Politics: Drunk and Absent in the Kenya Police Force (1957-1970).” American Economic Journal: Economic Policy 10(3), 388–417.
Fafchamps, M. and Quisumbing, A. R., 2007. “Household formation and marriage markets in rural areas.” Handbook of Development Economics 4, 3187–3247.
FAO, 2000. “Agricultural Fields: Kenya, 2000.”
FAO/IIASA, 2011. “Global Agro-ecological Zones (GAEZ v3.0).” FAO Rome, Italy and IIASA, Laxenburg, Austria.
Fearon, J. D., 2003. “Ethnic and cultural diversity by country.” Journal of Economic Growth 8(2), 195–222.
Fearon, J. D. and Laitin, D. D., 1996. “Explaining interethnic cooperation.” American Political Science Review 90(4), 715–735.
Foundation, O., 2020. “Global Agro-ecological Zones (GAEZ v3.0).”
Francesconi, M., Jenkins, S. P., and Siedler, T., 2010. “Childhood family structure and schooling outcomes: evidence for Germany.” Journal of Population Economics 23(3), 1073–1103.
Francois, P., Rainer, I., and Trebbi, F., 2015. “How is power shared in Africa?” Econometrica 83(2), 465–503.
Fryer Jr, R. G., 2007. “Guess who’s been coming to dinner? Trends in interracial marriage over the 20th century.” Journal of Economic Perspectives 21(2), 71–90.
Furtado, D. and Theodoropoulos, N., 2011. “Interethnic marriage: a choice between ethnic and educational similarities.” Journal of Population Economics 24(4), 1257–1279.
Gelman, A. and Imbens, G., 2019. “Why high-order polynomials should not be used in regression discontinuity designs.” Journal of Business & Economic Statistics 37(3), 447–456.
Gershman, B. and Rivera, D., 2018. “Subnational diversity in Sub-Saharan Africa: Insights from a new dataset.” Journal of Development Economics 133, 231–263.
Gibson, J., Olivia, S., and Boe-Gibson, G., 2020. “Night lights in economics: Sources and uses.” Etudes et Documents, n°1, CERDI.
Gisselquist, R. M., Leiderer, S., and Nino-Zarazua, M., 2016. “Ethnic heterogeneity and public goods provision in Zambia: Evidence of a subnational ‘diversity dividend’.” World Development 78, 308–323.
Glewwe, P. and Kremer, M., 2006. “Schools, teachers, and education outcomes in developing countries.” Handbook of the Economics of Education 2, 945–1017.
Gnoumou Thiombiano, B., LeGrand, T. K., and Kobiane, J.-F., 2013. “Effects of Parental Union Dissolution on Child Mortality and Child Schooling in Burkina Faso.” Demographic Research 29, 797–816.
Goren, E., 2014. “How ethnic diversity affects economic growth.” World Development 59, 275–297.
Greenberg, J. H., 1956. “The measurement of linguistic diversity.” Language 32(1), 109–115.
Habyarimana, J., Humphreys, M., Posner, D. N., and Weinstein, J. M., 2007. “Why does ethnic diversity undermine public goods provision?” American Political Science Review 101(4), 709–725.
Hanlon, J., 1990. Mozambique: the revolution under fire. London; Atlantic Highlands, N.J.: Zed Books.
Harris, J. A. and Posner, D. N., 2019. “(Under what conditions) Do politicians reward their supporters? Evidence from Kenya’s constituencies development fund.” American Political Science Review 113(1), 123–139.
Hassan, M., 2017. “The Strategic Shuffle: Ethnic Geography, the Internal Security Apparatus, and Elections in Kenya.” American Journal of Political Science 61(2), 382–395.
Heckman, J. J. and Singer, B., 2017. “Abducting economics.” American Economic Review 107(5), 298–302.
Hjort, J., 2014. “Ethnic Divisions and Production in Firms.” The Quarterly Journal of Economics 129(4), 1899–1946.
Horowitz, J., 2019. “Ethnicity and the Swing Vote in Africa’s Emerging Democracies: Evidence from Kenya.” British Journal of Political Science 49(3), 901–921.
Hotte, R. and Marazyan, K., 2020. “Demand for insurance and within-kin-group marriages: Evidence from a West-African country.” Journal of Development Economics 146, 102489.
Jedwab, R., Kerby, E., and Moradi, A., 2017. “History, Path Dependence and Development: Evidence from Colonial Railroads, Settlers and Cities in Kenya.” Economic Journal, 1467–1494.
Jones, S., Schipper, Y., Ruto, S., and Rajani, R., 2014. “Can your child read and count? Measuring learning outcomes in East Africa.” Journal of African Economies 23(5), 643–672.
Kalmijn, M., 1998. “Intermarriage and homogamy: Causes, patterns, trends.” Annual Review of Sociology 24(1), 395–421.
Kalmijn, M. and Van Tubergen, F., 2006. “Ethnic intermarriage in the Netherlands: Confirmations and refutations of accepted insights.” European Journal of Population/Revue europeenne de demographie 22(4), 371–397.
Kanbur, R., Rajaram, P. K., and Varshney, A., 2011. “Ethnic diversity and ethnic strife. An interdisciplinary perspective.” World Development 39(2), 147–158.
Kenya, 1971. An Economic Appraisal of the Settlement Schemes 1964/65 - 1967/68. Statistics Division and Ministry of Finance and Economic Planning.
—, 1980. Educational Trends 1973-77. Nairobi: Central Bureau of Statistics Ministry of Economic Planning and Community Affairs.
Kenya National Bureau of Statistics, 1926. “Colony & Protectorate of Kenya: Plans Showing Administrative Boundaries.” Report, Nairobi.
—, 2015. “Kenya Demographic and Health Survey 2014.” Report.
Kenya National Bureau of Statistics, 2015/2016. “Kenya Integrated Household Budget Survey.”
Kramon, E. and Posner, D. N., 2016. “Ethnic Favoritism in Education in Kenya.” Quarterly Journal of Political Science 11(1), 1–58.
Lagoutte, S., Bengaly, A., Youra, B., Fall, P. T., and Danish Institute for Human Rights, 2014. Rupture du lien matrimonial, pluralisme juridique et droits des femmes en Afrique de l’Ouest francophone. Copenhagen: Danish Institute for Human Rights. OCLC: 900293711.
Lambert, S., van de Walle, D., and Villar, P., 2019. Towards Gender Equity in Development, chap. Marital trajectories, women’s autonomy and women’s wellbeing in Senegal. Oxford: Oxford University Press.
Le Forner, H., 2020. “Age At Parents’ Separation and Children Achievement: Evidence From France Using a Sibling Approach.” Annals of Economics and Statistics (forthcoming).
Leo, C., 1984. Land and class in Kenya. Political economy of world poverty, Toronto: Univ. of Toronto P., 244 s. pages.
Leys, C., 1974. “Interpreting African Underdevelopment: Reflections on the ILO Report on Employment, Incomes and Equality in Kenya.” Manpower and Unemployment Research in Africa, 19–28.
—, 1975. Underdevelopment in Kenya: the political economy of neo-colonialism, 1964-1971. Berkeley: University of California Press.
Locoh, T. and Thiriat, M.-P., 1995. “Divorce et remariage des femmes en Afrique de l’Ouest. Le cas du Togo.” Population, 61–93.
Lorgen, C. C., 2000. “Villagisation in Ethiopia, Mozambique, and Tanzania.” Social Dynamics 26(2), 171–198.
Lowes, S., Nunn, N., Robinson, J. A., and Weigel, J., 2015. “Understanding Ethnic Identity in Africa: Evidence from the Implicit Association Test (IAT).” American Economic Review 105(5), 340–45.
Lukalo, Boone, Browne, and Joireman, 2019. “Kenya Settlement Schemes Data Project.” London, Nairobi, and Richmond: NCL, LSE, and UoR.
Lukalo, F. and Odari, S., 2016. “Exploring the Status of Settlement Schemes in Kenya.”
Luke, N. and Munshi, K., 2006. “New roles for marriage in urban Africa: Kinship networks and the labor market in Kenya.” The Review of Economics and Statistics 88(2), 264–282.
Lynch, G., 2011. I Say to You: ethnic politics and the Kalenjin in Kenya. University of Chicago Press.
Mack, R., 1970. “The great African cattle plague epidemic of the 1890’s.” Tropical Animal Health and Production 2(4), 210–219.
Marazyan, K., 2015. “Resource Allocation in Extended Sibships: An Empirical Investigation for Senegal.” Journal of African Economies 24(3), 416–452.
Maron, M., 2013. “Kenya Election data.”
Mayrargue, C., 2004. “Trajectoires et enjeux contemporains du pentecotisme en Afrique de l’Ouest.” Critique internationale (1), 95–109.
McCauley, J. F., 2014. “The political mobilization of ethnic and religious identities in Africa.” American Political Science Review 108(4), 801–816.
Menon, N., Van Der Meulen Rodgers, Y., and Nguyen, H., 2014. “Women’s land rights and children’s human capital in Vietnam.” World Development 54, 18–31.
Meyer, B., 2004. “Christianity in Africa: From African independent to Pentecostal-charismatic churches.” Annu. Rev. Anthropol. 33, 447–474.
Michalopoulos, S., 2012. “The origins of ethnolinguistic diversity.” American Economic Review 102(4), 1508–39.
Miguel, E., 2004. “Tribe or nation? Nation building and public goods in Kenya versus Tanzania.” World Politics 56(3), 327–362.
Miguel, E. and Gugerty, M. K., 2005. “Ethnic diversity, social sanctions, and public goods in Kenya.” Journal of Public Economics 89(11–12), 2325–2368.
Miho, A., Jarotschkin, A., and Zhuravskaya, E., 2019. “Diffusion of Gender Norms: Evidence from Stalin’s Ethnic Deportations.” Available at SSRN 3417682.
Milanovic, B., 2003. Is inequality in Africa really different? Washington, D.C.: World Bank Development Research Group Poverty Team, 43 pages.
Miles, W. F. and Rochefort, D. A., 1991. “Nationalism versus ethnic identity in sub-Saharan Africa.” American Political Science Review 85(2), 393–403.
Ministry of Education, 2009. “2007 School Census.” Online Database.
Monden, C. W. and Smits, J., 2005. “Ethnic intermarriage in times of social change: The case of Latvia.” Demography 42(2), 323–345.
Montalvo, J. G. and Reynal-Querol, M., 2005. “Ethnic polarization, potential conflict, and civil wars.” American Economic Review 95(3), 796–816.
Morgan, W. and Shaffer, N. M., 1966. Population of Kenya: Density and Distribution. Nairobi: Oxford University Press.
Morgan, W. T. W., 1963. “The ‘White Highlands’ of Kenya.” The Geographical Journal 129(2), 140–155.
Mozaffar, S., Scarritt, J. R., and Galaich, G., 2003. “Electoral institutions, ethnopolitical cleavages, and party systems in Africa’s emerging democracies.” American Political Science Review 97(3), 379–390.
Mwiria, K., 1991. “Education for subordination: African education in colonial Kenya.” History of Education 20(3), 261–273.
Medard, C. and Golaz, V., 2011. “Les frontieres interieures du Kenya : une contrainte pour l’acces a la terre.” CERISCOPE Frontieres.
Ndaruhutse, S., Branelly, L., Latham, M., and Penson, J., 2008. Grade repetition in primary schools in Sub-Saharan Africa: an evidence base for change. CfBT Education Trust Reading, UK.
Ngau, P. M., 1987. “Tensions in Empowerment: The Experience of the ‘Harambee’ (Self-Help) Movement in Kenya.” Economic Development and Cultural Change 35(3), 523–538.
Nottidge, C. P. R. and Goldsack, J. R., 1966. The Million-Acre Settlement Scheme 1962-1966. Department of Settlement.
Parsons, T., 2012. “Being Kikuyu in Meru: Challenging the Tribal Geography of Colonial Kenya.” The Journal of African History 53(1), 65–86.
Polian, P. M., 2004. Against their will: the history and geography of forced migrations in the USSR. Budapest; New York: Central European University Press.
Posner, D. N., 2004. “Measuring ethnic fractionalization in Africa.” American Journal of Political Science 48(4), 849–863.
—, 2005. Institutions and ethnic politics in Africa. Cambridge University Press.
Qian, Z. and Lichter, D. T., 2007. “Social boundaries and marital assimilation: Interpreting trends in racial and ethnic intermarriage.” American Sociological Review 72(1), 68–94.
—, 2011. “Changing patterns of interracial marriage in a multiracial society.” Journal of Marriage and Family 73(5), 1065–1084.
Regional Centre For Mapping Resource For Development, 2020. “Kenya SRTM DEM 30meters.” http://opendata.rcmrd.org/datasets/kenya-srtm-dem-30meters.
Republic of Kenya, 1999. Totally Integrated Quality Education and Training TIQET - Report of the Commission of Inquiry into the Education System of Kenya. Nairobi: Republic of Kenya.
Rhode, P. W. and Strumpf, K. S., 2003. “Assessing the importance of Tiebout sorting: Local heterogeneity from 1850 to 1990.” American Economic Review 93(5), 1648–1677.
Ruthenberg, H. and fur Wirtschaftsforschung), A.-S. I.-I., 1966. African agricultural production development policy in Kenya, 1952-1965. Berlin: Springer-Verlag.
Simons, G. F. and Fennig, C. D., 2017. “Ethnologue: Languages of the world.” SIL International 20.
Simson, R., 2018. “Ethnic (in)equality in the public services of Kenya and Uganda.” African Affairs 118(470), 75–100.
Simson, R. and Green, E., 2020. “Ethnic favouritism in Kenyan education reconsidered: when a picture is worth more than a thousand regressions.” The Journal of Modern African Studies 58(3), 425–460.
Smith-Greenaway, E., 2020. “Does Parents’ Union Instability Disrupt Intergenerational Advantage? An Analysis of Sub-Saharan Africa.” Demography, 1–29.
Sporlein, C., Schlueter, E., and van Tubergen, F., 2014. “Ethnic intermarriage in longitudinal perspective: Testing structural and cultural explanations in the United States, 1880–2011.” Social Science Research 43, 1–15.
Survey of Kenya, 1959. Atlas of Kenya: a comprehensive series of new and authentic maps prepared from the national survey and other governmental sources; with gazetteer and notes on pronunciation & spelling. 1st ed. Nairobi: Printed by the Survey of Kenya, ix l., 44 l. of maps (part col.) pages.
Tiebout, C. M., 1956. “A pure theory of local expenditures.” Journal of Political Economy 64(5), 416–424.
Troup, L., 1953. Inquiry into the General Economy of Farming in the Highlands having regard to Capital and Long- and Short-term Financial Commitments, whether Secured or Unsecured, excluding Farming Enterprises solely concerned with the Production of Sisal, Wattle, Tea and Coffee. Nairobi: Government Printer.
U.S. Board on Geographic Names, 2018. “Gazetteer Kenya.” http://geonames.nga.mil/gns/html/namefiles.html.
van de Walle, D., 2013. “Lasting Welfare Effects of Widowhood in Mali.” World Development 51, 1–19.
Van der Gaag, J. and Adams, A., 2010. “Where Is the Learning? Measuring Schooling Efforts in Developing Countries. Policy Brief 2010-04.” Brookings Institution.
World Bank, 1973. “Agricultural Sector Survey - Kenya.” Report 254a-KE, Eastern Africa Projects Department.
World Resources Institute, 2007. Nature’s Benefits in Kenya, An Atlas of Ecosystems and Human Well-Being. World Resources Institute.
Youe, C. P., 1988. “Settler Capital and the Assault on the Squatter Peasantry in Kenya’s Uasin Gishu District, 1942-63.” African Affairs 87(348), 393–418.