From the Institute of Animal Breeding and Husbandry
of the Faculty of Agricultural and Nutritional Sciences
of the Christian-Albrechts-Universität zu Kiel
GENETIC VARIABILITY OF EQUINE MILK
PROTEIN GENES
Dissertation
submitted for the doctoral degree
of the Faculty of Agricultural and Nutritional Sciences
of the Christian-Albrechts-Universität zu Kiel
Submitted by
M. Sc. agr.
JULIA ELENA MARGOT ELISABETH BRINKMANN
from Kiel
Kiel, 2015
Dean: Prof. Dr. Eberhard Hartung
First referee: Prof. Dr. Georg Thaller
Second referee: Prof. Dr. Siegfried Wolffram
Date of oral examination: 4 November 2015
This dissertation was prepared with the gratefully acknowledged financial support of the BMBF
within the competence network Food Chain Plus (FoCus).
To my family
TABLE OF CONTENTS
GENERAL INTRODUCTION
CHAPTER I: PRODUCTION, COMPOSITION AND UTILIZATION OF MARE MILK
1. Production
DNA-BASED ANALYSIS OF PROTEIN VARIANTS REVEALS DIFFERENT
GENETIC VARIABILITY OF THE PARALOGOUS EQUINE β-LACTOGLOBULIN
GENES LGB1 AND LGB2
J. Brinkmann 1, V. Jagannathan 2,3, C. Drögemüller 2, S. Rieder 3,4, T. Leeb 2, G. Thaller 1 and
J. Tetens 1
1 Institute of Animal Breeding and Husbandry, Christian-Albrechts-University Kiel, Kiel,
Germany
2 Institute of Genetics, University of Bern, Bern, Switzerland
3 Swiss Competence Center of Animal Breeding and Genetics, University of Bern, Bern
University of Applied Sciences HAFL and Agroscope, Bern, Switzerland
4 Agroscope, Swiss National Stud Farm, Avenches, Switzerland
Submitted for Publication in Livestock Science
Abstract
The genetic variability of milk protein genes may influence the nutritive value or processing
and functional properties of the milk. While numerous protein variants are known in ruminants,
knowledge about milk protein variability in horses is still limited. Mare’s milk is, however,
produced for human consumption in many countries. Beta-lactoglobulin, a member of the
lipocalin protein family, whose members are common food and airborne allergens, is a
major whey protein. It is absent from human milk and is thus a key agent in provoking cow's milk
protein allergy. Mare's milk, however, is better tolerated by most affected individuals.
Several functions of β-lactoglobulin have been discussed, but its ultimate physiological role
remains unclear. In the current study, the open reading frames of the two equine β-lactoglobulin
paralogues LGB1 and LGB2 were resequenced in 249 horses belonging to 14 different breeds
in order to predict the existence of protein variants at the DNA level. Only a single
signal peptide variant of LGB1, but 10 different putative protein variants of LGB2, were
identified. As both genes are expressed in horses, this is a striking, previously unknown
difference in genetic variability between the two genes. We assume that LGB1 is the
ancestral paralogue and has an essential function that imposes a high selection pressure. As mare's milk
has a very low fat content, this unknown function might well be related to vitamin uptake.
Further studies are needed to elucidate the properties of the different gene products.
Keywords: horse, whey proteins, milk protein variants, β-lactoglobulin
Implications
Scientific interest in mare's milk arose after positive effects on human health were observed.
Furthermore, mare's milk is discussed as a possible substitute for cow's milk in cases of cow's
milk protein allergy. Improved knowledge of the protein fraction of mare's milk, and
especially of the genetic structure and variability of the milk protein genes, may lead to a better
understanding of these effects. The results of this study provide a helpful basis for further research on
the allergenicity of mare's milk as well as its effects on human health.
Introduction
Whey proteins account for approximately 40% of total equine milk protein, which is
intermediate between human and bovine milk with shares of about 50% and 20%, respectively.
In most species, α-lactalbumin and β-lactoglobulin represent the major whey proteins, whereas
β-lactoglobulin is absent from the milk of humans, camels, lagomorphs and rodents. In horses, each
of the two proteins accounts for ~30% of the whey protein fraction (Uniacke-Lowe et al., 2010).
Beta-lactoglobulin belongs to the ligand-binding protein family of lipocalins, which are known
to be major food and airborne allergens (Mäntyjärvi et al., 2000). Beta-lactoglobulin generally
binds retinol and, in many species, also fatty acids, but not in horses or pigs (Pérez and Calvo,
1995). Furthermore, roles as a signalling molecule or activity modulator have been discussed
(Kontopidis et al., 2004). However, no definite physiological function of the protein has been
determined to date.
Two fractions of equine β-lactoglobulin, β-lactoglobulin I and β-lactoglobulin II, have been
identified; they arise from the presence of two paralogous genes. This is also the case in other
species such as the donkey and the dog, whereas cats even possess three paralogues (Halliday
et al., 1993; Godovac-Zimmermann et al., 1990).
Beta-lactoglobulin is known to be a major allergen provoking cow's milk protein allergy (CMA),
an IgE-mediated allergic reaction causing a broad range of symptoms, such as atopic
dermatitis, constipation and infantile colic. This condition affects approximately 2% of infants
fed with milk replacers based on cow's milk. In these cases, mare's milk can be
regarded as a possible substitute, which is better tolerated by most of the affected children
(Businco et al., 2000; Curadi et al., 2001). Moreover, positive effects of mare's milk
consumption on diseases like atopic dermatitis (Foekel et al., 2009), Crohn's disease (Schubert
et al., 2009) or cardiovascular diseases (Chen et al., 2010) have been reported.
There are many studies on whey protein variability in cattle and other species (Caroli et al.,
2009; Selvaggi et al., 2014), and different genetic variants of whey proteins have also been
reported in the donkey (Herrouin et al., 2000; Cunsolo et al., 2007; Chianese et al., 2013), but
knowledge about the genetic variability of equine whey proteins is limited. The presence of
different genetic variants might alter the allergenicity, but also other properties such as the
nutritive value of the milk. Furthermore, milk protein variants are valuable tools for breed
characterization, biodiversity investigations and evolutionary studies (Caroli et al., 2009). The
major aim of the current study was therefore to identify genetic β-lactoglobulin variants in the
domestic horse. In addition, a hypothesis regarding the evolution of the different variants was
established.
Material and methods
Animals and samples
Genomic DNA was extracted from whole blood and hair samples of 198 horses from 8 different
breeds that are currently used for mare's milk production in Germany, using a modified
protocol according to Miller et al. (1988). The animals were selected to be as unrelated as
possible. Additionally, individual whole genome sequence variant calling data of a total of 51
horses of 10 different breeds, available from other studies, were incorporated in the analyses (for
details on individual coverage see Supplemental Table 1). Bioinformatic details were reported
previously (Drögemüller et al., 2014; Frischknecht et al., 2014). In total, 249 horses belonging to
14 different breeds or populations were analyzed (Table 1).
Table 1 Animals used in the sequencing of equine LGB1 and LGB2 (n=249)
Breed                       Acronym   Sanger¹ [N]   WGS² [N]   Total [N]
Akhal-Teke                  AK        -             1          1
Dairy Crossbreed³           CB        21            -          21
Argentine Criollo Horse     CR        27            -          27
Fjord Horse                 FJ        3             -          3
Franches-Montagnes          FM        -             26         26
Haflinger                   HF        39            1          40
Icelandic Horse             IC        25            1          26
Dutch Warmblood (KWPN)      WBNL      -             1          1
Quarter Horse               QH        22            3          25
Russian Heavy Draft         RU        24            -          24
Shetlandpony                SP        -             2          2
Swiss Warmblood             WBCH      -             2          2
UK Warmblood                WBUK      -             2          2
German Warmblood            WBD       37            12         49
Total                                 198           51         249
¹ Sequencing data from individual Sanger resequencing of the open reading frames
² Sequencing data from Illumina HiSeq whole genome sequencing (WGS)
³ Crossbreed mainly composed of German Riding Pony, Haflinger Horse, Connemara Pony and New Forest Pony that has been intuitively bred for higher milk yield
DNA sequencing
A total of 12 primer pairs (Supplemental Table 2) were designed to amplify all exons
contributing to the open reading frames of the genes and adjacent intronic regions using the
Primer3 software (Rozen and Skaletsky, 2000), based on the genomic reference sequences of
both lactoglobulin genes (Acc. No. NC_009168.2).
PCR amplification and DNA sequencing were done as described by Gallinat et al. (2013). The
obtained sequences were analyzed and compared with the genomic reference sequence (Acc.
No. NC_009168.2) using the software Sequencher 4.9 (Gene Codes Corp., Ann Arbor, MI).
Results
The open reading frames of equine LGB1 and LGB2 were successfully sequenced in 223 horses
each (Table 2). The analysis revealed a previously unknown signal peptide variant of the LGB1
gene as well as 10 putative protein variants of the LGB2 gene, 8 of which were considered
novel. A preliminary nomenclature was established for these variants. Allele frequencies of the
LGB1 and LGB2 variants, obtained by direct counting in breeds with more than 15 sampled
animals, are given in Table 2.
Table 2 Number of successfully sequenced animals per breed and counted allele frequencies for LGB1 and LGB2 variants
¹ For an explanation of breed acronyms see Table 1.
² A plus sign indicates that the corresponding variant is present in that breed, but the number of animals is too low to determine allele frequencies (N3).
³ A minus sign indicates that the corresponding variant is not present in that breed.
⁴ Breeds belonging to the European Warmblood population (Dutch, Swiss, UK and German Warmblood) were also analysed jointly as WBTOTAL.
LGB1
In most of the analyzed animals, no differences from the genomic reference sequence
(NC_009168.2:38250776-38255515) were found. Only one Quarter Horse and the single Akhal-Teke
were heterozygous for a previously unknown A>G transition at position 37
of the open reading frame, leading to a predicted amino acid exchange from methionine to valine
at position 13 of the signal peptide.
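The pairing of nucleotide and amino acid positions used in this section (position 37 of the open reading frame and residue 13 of the signal peptide, as well as the c./p. pairs listed in Table 3) follows from counting codons from the start codon. The short Python sketch below only illustrates that arithmetic; `codon_of` is a hypothetical helper and not part of the software used in the study.

```python
from typing import Tuple


def codon_of(c_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position (c.) to the codon number (p.)
    and to the nucleotide's position within that codon (1, 2 or 3)."""
    if c_pos < 1:
        raise ValueError("coding positions start at c.1 (the A of the start codon)")
    codon = (c_pos - 1) // 3 + 1
    position_in_codon = (c_pos - 1) % 3 + 1
    return codon, position_in_codon


if __name__ == "__main__":
    # Coding positions reported for LGB1 (c.37) and LGB2 (Table 3)
    for c_pos in (37, 70, 157, 164, 230, 247, 394, 515, 520):
        codon, pos = codon_of(c_pos)
        print(f"c.{c_pos}: codon {codon}, nucleotide {pos} of the codon")
```

Because the sequence is read in non-overlapping triplets from c.1, integer division by three gives the codon and the remainder gives the position within it.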
LGB2
Eight nonsynonymous mutations were identified within the LGB2 gene, 5 of which were
previously undescribed. Based on the observed genotypes, the presence of 10 distinct protein
variants was predicted. Preliminary designations (LGB2*A – LGB2*G) were assigned to these
variants, which will be used throughout the following sections; for details see Table 3 and
Figure 1.
The most common and probably ancestral (see below) LGB2 variant, which was therefore designated
LGB2*A, occurred in all breeds except Fjord Horses and the single Akhal-Teke. It differs
from the genomic GenBank reference sequence (NC_009168.2:38266531-38271345) at a
single position, corresponding to the 164th nucleotide of the open reading frame, leading to the
presence of alanine at position 55 of the protein (Table 3 and Figure 1). At this position, the
reference sequence NC_009168.2 codes for valine; this variant was termed LGB2*B1. It was
present in most of the examined breed samples except for the crossbred ponies, Fjord Horses,
Icelandic Horses, Russian Heavy Drafts, Shetlandponies and UK Warmbloods. An
additional transversion (c.394G>T; p.Ala132Ser), found in the Akhal-Teke, Criollo
Horses, Haflinger Horses, Icelandic Horses and Quarter Horses, leads to variant LGB2*B2. In
most breeds except Fjord Horses and Dutch Warmblood, a variant denoted as LGB2*C1 was
identified that differs from variant A by an additional nonsynonymous transition at position 230
of the coding sequence (p.Arg77His). A further nucleotide exchange, leading to a predicted
replacement of alanine by threonine at position 83 of the protein and found only in the
Icelandic Horse, distinguishes LGB2*C2 from that variant. Variant LGB2*D1 was found only in
Criollos, Franches-Montagnes and Icelandic Horses; it differs from variant A
by an additional mutation at position 157 of the open reading frame (c.157G>A;
p.Glu53Lys), which is also present in the mRNA reference sequence NM_001082494. A further
nonsynonymous exchange (c.515 G>C; p.Pro172Arg) leads to variant LGB2*D2, which
completely corresponds to the mRNA reference sequence (NM_001082494). This variant was
found at low frequencies in most of the breeds except for Akhal-Teke, Fjord Horses, Dutch
Warmblood, Shetlandponies, Swiss Warmblood and UK Warmblood. Variant LGB2*E was found
particularly in the crossbreed, but also in Haflinger, Icelandic Horses and Quarter Horses.
This variant carries a nonsynonymous transversion at position 520 of the coding
sequence (c.520G>C; p.Gly174Arg) compared with variant A. The same mutation, but in
conjunction with the mutation defining LGB2*B1, characterizes LGB2*G, which most likely
arose by recombination and was only detected in the crossbred ponies. Variant LGB2*F was
very rare and only found in the German Warmblood Horse; it differs from variant A by a
transversion from A to T at position 70 of the open reading frame, leading to a predicted
exchange of threonine for serine in codon 24.
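Each predicted amino acid exchange described above follows from applying the nucleotide substitution to the reference codon and translating both codons. The Python sketch below illustrates this step under the assumption that the standard genetic code (NCBI translation table 1) applies; `predict_exchange` is a hypothetical helper, not the software used in the study, and the example codons are the LGB2*A codons listed in Table 3.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def predict_exchange(ref_codon: str, position_in_codon: int, alt_base: str) -> Tuple[str, str]:
    """Return (reference, alternative) amino acids in three-letter code for a
    single-nucleotide substitution at position 1, 2 or 3 of a codon."""
    ref_codon = ref_codon.upper()
    idx = position_in_codon - 1
    alt_codon = ref_codon[:idx] + alt_base.upper() + ref_codon[idx + 1:]
    return THREE_LETTER[CODON_TABLE[ref_codon]], THREE_LETTER[CODON_TABLE[alt_codon]]


if __name__ == "__main__":
    # Reference codons of LGB2*A as listed in Table 3
    print(predict_exchange("GCC", 2, "T"))  # c.164, defines variant B1
    print(predict_exchange("GGG", 1, "C"))  # c.520, defines variant E
    print(predict_exchange("ACG", 1, "T"))  # c.70, defines variant F
```

The same function reproduces the LGB1 signal peptide exchange (ATG with A>G at the first codon position gives Met to Val).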
Table 3 Sequence variation and resulting amino acid substitutions for LGB2 variants
Position¹     LGB2 variant
c.     p.     A         B1²       B2        C1        C2        D1        D2³       E         F         G⁴
70     24     ACG Thr   -         -         -         -         -         -         -         TCG Ser   -
157    53     GAG Glu   -         -         -         -         AAG Lys   AAG Lys   -         -         -
164    55     GCC Ala   GTC Val   GTC Val   -         -         -         -         -         -         GTC Val
230    77     CGC Arg   -         -         CAC His   CAC His   -         -         -         -         -
247    83     GCA Ala   -         -         -         ACA Thr   -         -         -         -         -
394    132    GCT Ala   -         TCT Ser   -         -         -         -         -         -         -
515    172    CCG Pro   -         -         -         -         -         CTG Leu   -         -         -
520    174    GGG Gly   -         -         -         -         -         -         CGG Arg   -         CGG Arg
¹ c. denotes the position within the coding sequence and p. the position within the protein; a dash indicates the same codon as in variant A
² Variant B1 corresponds to the genomic reference sequence (NC_009168.2:38266531-38271345)
³ Variant D2 corresponds to mRNA sequence NM_001082494
⁴ Variant G is probably a recombinant between variants B1 and E
Discussion
Methodology
In the current study, we resequenced the open reading frames of the equine LGB1 and LGB2
genes to identify putative protein variants at the DNA level. This approach is advantageous over protein
analyses because DNA sources such as hair samples are easier to obtain than milk
samples. Furthermore, DNA sequencing directly identifies the mutations underlying the protein
variants and can also detect variation that causes only minor changes in protein
properties, which may go undetected in standard protein analyses. However, no conclusions can be drawn
about the actual expression of the variants or about mutations that affect splicing (Gallinat et al.,
2013). Thus, the designations assigned to the variants identified at the DNA level in this
study have to be regarded as preliminary and are subject to confirmation.
Although focussed on breeds that are currently kept for milk production, this study covers a
comparatively wide range of partly distantly related breeds, which increases the amount of
variation captured. The sample size per breed is, however, small, so the counted allele frequencies
have to be interpreted with caution.
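To illustrate why small per-breed samples call for caution, the Python sketch below computes a 95% Wilson score interval for an allele frequency obtained by direct counting. It treats the 2n alleles of n diploid animals as independent draws; both this assumption and the choice of the Wilson interval are ours for illustration (relatedness or departures from Hardy-Weinberg equilibrium would widen the true uncertainty). The counts in the example are hypothetical and are not taken from Table 2.

```python
import math
from typing import Tuple


def wilson_interval(allele_count: int, n_animals: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval (default 95%) for an allele frequency counted in
    n diploid animals, assuming the 2n alleles are independent draws."""
    n = 2 * n_animals  # number of sampled alleles
    if n == 0:
        raise ValueError("at least one animal is required")
    p = allele_count / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical example: the same observed frequency (0.25) in a small and a large sample
    for count, animals in ((10, 20), (100, 200)):
        low, high = wilson_interval(count, animals)
        print(f"{count}/{2 * animals} alleles: 95% CI {low:.2f}-{high:.2f}")
```

The Wilson interval is used here rather than the simple normal approximation because it stays within [0, 1] and behaves sensibly for rare alleles and small samples.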
Breed specific variation
Little variability was found in LGB1, whereas 10 different variants were found in LGB2. The
highest degree of variability was seen in the Icelandic Horses, with 7 variants, one of which was
private to the breed (LGB2*C2). This is notable, as the breed originates from a small founder
population brought to Iceland approximately 1100 years ago and has remained isolated since then
(Adalsteinsson, 1981). However, it has to be taken into account that the samples were not collected
in Iceland; Hreidarsdóttir et al. (2014) reported a higher diversity, in terms of effective
founders, for the Icelandic Horse population abroad than for the population in Iceland.
In each of the Criollo, Haflinger and Quarter Horse breeds, 6 different variants of LGB2 were
detected. Notably, variant LGB2*D1 occurs not only in Criollos and Icelandic Horses, but also in
Franches-Montagnes at a considerable frequency (Table 2). The lowest variability
was found in the Russian Heavy Drafts, which almost uniformly carried variant LGB2*B with
an allele frequency of 0.93. This breed was founded in the 1860s by grading up native
horses with Ardennes horses. The First World War, followed by the Russian Civil War, nearly wiped out the
breed, and the purebred stock was not reconstituted and isolated as an independent breed until
1937 (Dmitriev and Ernst, 1989). Thus, the breed went through a serious bottleneck, which might
explain the low variability.
The Warmblood samples from different countries (Germany, Switzerland, United Kingdom and
the Netherlands) can in principle be considered to belong to the same breed, or at
least to be very similar. Thus, these samples were also analyzed jointly, revealing that the major
variants in this population are LGB2*A and B1. This differs from the Franches-Montagnes,
which can also be regarded as a heavy Warmblood breed: in this breed, variant B1 is rare, while
C1 is rather common.
Evolution of LGB2 variants
Owing to the small sample size, conclusions about the evolution of the identified gene variants of
LGB2 are difficult, especially for rare variants. A variant can only be determined unequivocally
in animals with no more than one heterozygous position, because the phase of two or more
heterozygous sites cannot be inferred from the genotype alone. On the
basis of the available information, we derived a simple model for the evolution of variants under
the constraint of as few mutations as possible (Figure 1).
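The phase problem behind this restriction can be made concrete. The Python sketch below enumerates all unordered pairs of haplotypes compatible with an unphased genotype; with k heterozygous sites there are 2^(k-1) such pairs. As an illustration we assume a hypothetical animal heterozygous at c.164 (C/T) and c.520 (G/C), the sites that define LGB2*B1 and LGB2*E in Table 3: such an animal could carry either A/G or B1/E. The function name `phase_configurations` is our own.

```python
from itertools import product
from typing import List, Set, Tuple

Genotype = List[Tuple[str, str]]  # one unordered (allele, allele) pair per site


def phase_configurations(genotype: Genotype) -> Set[Tuple[Tuple[str, ...], ...]]:
    """Return every unordered pair of haplotypes compatible with an unphased genotype."""
    # A homozygous site has one orientation, a heterozygous site two
    options = [[(a, b)] if a == b else [(a, b), (b, a)] for a, b in genotype]
    configurations = set()
    for choice in product(*options):
        hap_1 = tuple(site[0] for site in choice)
        hap_2 = tuple(site[1] for site in choice)
        # The order of the two haplotypes within an animal is irrelevant
        configurations.add(tuple(sorted((hap_1, hap_2))))
    return configurations


if __name__ == "__main__":
    # Hypothetical animal; sites c.164 and c.520:
    # (C, G) = A, (T, C) = G, (T, G) = B1, (C, C) = E
    animal = [("C", "T"), ("G", "C")]
    for hap_1, hap_2 in sorted(phase_configurations(animal)):
        print(hap_1, "/", hap_2)
```

Mirror-image assignments collapse into the same unordered pair, which is why only half of the 2^k orientations remain.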
BLAST analysis showed that variant LGB2*A is also present in the LGB2 sequence of the donkey
(Equus asinus, lactoglobulin II variants B and C, GenBank accession numbers HM012799.1 and
HM012800.1). The domestic donkey represents a sister lineage of modern horses and shares a
most recent common ancestor with horses 4.0–4.5 million years ago (Orlando et al., 2013), indicating
that LGB2*A is an ancestral variant.
Variant LGB2*B1 is also common in most of the examined breeds and is assumed to have
evolved from variant LGB2*A through a single nucleotide exchange (c.164C>T). From
this variant, LGB2*B2 evolved through an additional nonsynonymous mutation
(c.394G>T; p.Ala132Ser). The variants LGB2*C1, LGB2*D1, LGB2*E and LGB2*F
each differ from variant A at a single amino acid position (Table 3 and Figure 1).
Subsequent mutations of variants C1 and D1 could have given rise to the variants LGB2*C2
and LGB2*D2. Finally, we observed variant LGB2*G, which is only present in the crossbred
animals and carries both the mutations defining variant B1 and the mutation defining variant E.
Thus, it probably represents a recombinant haplotype.
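The minimal-mutation reasoning of this paragraph can be sketched as a small parsimony check. In the hypothetical Python sketch below, each variant is encoded by its amino acid differences from LGB2*A as listed in Table 3, LGB2*A is taken as the root, and for every other variant we list the variants that are one mutation closer to the root and differ from it at exactly one position. This is our own simplified illustration, not the procedure used to draw Figure 1; a variant with two such candidate parents is consistent with the recombinant origin proposed for LGB2*G.

```python
from typing import Dict, List, Tuple

# Variable protein positions of Table 3 and the amino acids of LGB2*A (taken as root)
POSITIONS = (24, 53, 55, 77, 83, 132, 172, 174)
ROOT_STATES = {24: "Thr", 53: "Glu", 55: "Ala", 77: "Arg",
               83: "Ala", 132: "Ala", 172: "Pro", 174: "Gly"}

# Differences of each variant from LGB2*A, transcribed from Table 3
DIFFERENCES: Dict[str, Dict[int, str]] = {
    "A": {},
    "B1": {55: "Val"},
    "B2": {55: "Val", 132: "Ser"},
    "C1": {77: "His"},
    "C2": {77: "His", 83: "Thr"},
    "D1": {53: "Lys"},
    "D2": {53: "Lys", 172: "Leu"},
    "E": {174: "Arg"},
    "F": {24: "Ser"},
    "G": {55: "Val", 174: "Arg"},
}


def haplotype(variant: str) -> Tuple[str, ...]:
    """Amino acid states of a variant at all variable positions."""
    states = dict(ROOT_STATES)
    states.update(DIFFERENCES[variant])
    return tuple(states[pos] for pos in POSITIONS)


def distance(v1: str, v2: str) -> int:
    """Number of positions at which two variants differ (one mutation each)."""
    return sum(a != b for a, b in zip(haplotype(v1), haplotype(v2)))


def candidate_parents(root: str = "A") -> Dict[str, List[str]]:
    """For each non-root variant, list the variants that are one mutation
    closer to the root and differ from it at exactly one position."""
    parents = {}
    for variant in DIFFERENCES:
        if variant == root:
            continue
        steps = distance(variant, root)
        parents[variant] = [
            other for other in DIFFERENCES
            if distance(other, root) == steps - 1 and distance(variant, other) == 1
        ]
    return parents


if __name__ == "__main__":
    for variant, parents in candidate_parents().items():
        flag = "  (two parents: possible recombinant)" if len(parents) > 1 else ""
        print(f"LGB2*{variant} <- {', '.join('LGB2*' + p for p in parents)}{flag}")
```

Only LGB2*G ends up with two equally parsimonious parents (B1 and E); every other variant has a single one-step ancestor.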
Figure 1 Most likely evolution of equine LGB2 gene variants
Variability and function
The two paralogous lactoglobulin genes LGB1 (Gene ID 100034193) and LGB2 (Gene ID
100034194) lie adjacent to each other on equine chromosome 25, about 10 kbp apart.
Equine β-lactoglobulin II comprises 163 amino acids, one more than equine β-lactoglobulin I
(Halliday et al., 1991). Sequence identity between the two proteins is about 70%, and the amino acid
sequences differ at 52 positions (Conti et al., 1984; Godovac-Zimmermann et al., 1985). In
contrast to ruminants, which have been shown to possess LGB pseudogenes (Passey and
Mackinlay, 1995; Folch et al., 1996), both genes are expressed in horses. In the current study, LGB1 was
found to be strongly conserved across breeds, while LGB2 was highly variable. This indicates
a higher selective pressure on LGB1 and suggests that it is the ancestral paralogue, which has a
crucial function that has to be maintained. The actual function of β-lactoglobulin has, however,
not been determined to date. It seems likely that it acts as a transporter, as is the case for
many lipocalins (Kontopidis et al., 2004). Beta-lactoglobulin has been shown to bind small
hydrophobic compounds such as retinol and other lipophilic vitamins (Kontopidis et al., 2004;
Mensi et al., 2013), isothiocyanates (Keppler et al., 2014) and various polyphenols (Riihimäki
et al., 2008; Wu et al., 2013), as well as fatty acids; the latter, however, is not the case for equine
β-lactoglobulin (Pérez and Calvo, 1995). In particular, its role as a transporter for retinol and
carotenoids has been discussed, but it has been argued that retinol is highly soluble in the fat
phase of milk and will probably be transported from mother to offspring by that route
(Kontopidis et al., 2004). Equine milk, however, has a low fat content, so this function may be
more essential in horses than in other species with a higher milk fat content, such as
cattle or humans. In fact, bovine β-lactoglobulin is much more variable than equine
β-lactoglobulin (Caroli et al., 2009), and the protein is completely absent from human milk
(Uniacke-Lowe et al., 2010). It seems possible that the definite function of β-lactoglobulin
varies from species to species (Kontopidis et al., 2004).
Allergenic potential
Lipocalins are common food and airborne allergens (Mäntyjärvi et al., 2000). In cow's
milk protein allergy (CMA), β-lactoglobulin appears to be the main allergen, particularly because
it is absent from human milk and resistant to acid digestion, especially in cattle and goats (Heine
et al., 2002). In contrast, Inglingstad et al. (2010) showed that equine β-lactoglobulin is
extensively degraded by gastrointestinal enzymes. Furthermore, in vitro and in vivo studies have
shown that mare's milk is tolerated by 96% of children with CMA. The absence of relevant
IgE-binding epitopes in equine milk proteins, probably caused by differences in the amino acid
sequence, has been discussed as a possible explanation (Businco et al., 2000; Curadi et al., 2001).
Donkey's milk is also better tolerated by CMA patients (Iacono et al., 1992; Monti et al., 2012),
which has been linked to quantitative LGB2 polymorphisms leading to a very low
β-lactoglobulin content (Chianese et al., 2013).
In cattle, genetic milk protein variants have been shown to modify
the relevant epitopes and thus to change the allergenic potential of milk (Lisson et al., 2013).
It would thus be worth investigating how the high degree of variability at the equine LGB2
locus affects the allergenicity of mare's milk.
Acknowledgements
This project was funded by the German Federal Ministry of Education and Research (Bonn,
Germany) within the competence network "Food Chain Plus" (FoCus, grant no. 0315539A).
The authors would like to thank all the mare's milk producers for providing samples, Julia
Tetens for her help with sample collection and Gabriele Ottzen-Schirakow for expert technical
assistance.
References
Adalsteinsson, S. 1981. Origin and conservation of farm animal populations in Iceland.
Zeitschrift für Tierzüchtung und Züchtungsbiologie 98(1-4):258–264.
Businco, L., Giampietro, P. G., Lucenti, P., Lucaroni, F., Pini, C., Di Felice, G., Iacovacci, P.,
Curadi, C., Orlandi, M. 2000. Allergenicity of mare’s milk in children with cow’s milk
allergy. J. Allergy Clin. Immunol. 105(5):1031–1034.
Caroli, A. M., Chessa, S., Erhardt, G. J. 2009. Invited review: milk protein polymorphisms in
cattle: effect on animal breeding and human nutrition. J. Dairy Sci. 92(11):5335–5352.
Chen, Y., Wang, Z., Chen, X., Liu, Y., Zhang, H., Sun, T. 2010. Identification of angiotensin
I-converting enzyme inhibitory peptides from koumiss, a traditional fermented mare's milk. J.
Dairy Sci. 93(3):884–892.
Chianese, L., Simone, C. de, Ferranti, P., Mauriello, R., Costanzo, A., Quarto, M., Garro, G.,
Picariello, G., Mamone, G., Ramunno, L. 2013. Occurrence of qualitative and quantitative
polymorphism at donkey beta-Lactoglobulin II locus. Food Res. Int. 54(1):1273–1279.
Conti, A., Godovac-Zimmermann, J., Liberatori, J., Braunitzer, G., Minori, D. 1984. The
Primary Structure of Monomeric β-Lactoglobulin I from Horse Colostrum (Equus caballus,
κ-casein) (Egito et al., 2002; Girardet et al., 2006; Lenasi et al., 2003; Lenasi et al., 2005;
Martin et al., 2009; Miclo et al., 2007; Milenkovic et al., 2002; Miranda et al., 2004; Selvaggi
et al., 2010). Current knowledge about the individual caseins and their genetic variability is
summarized in Table 1.
The aim of this study was to provide extended knowledge about the genetic variability of equine
casein genes and to identify putative protein variants at the DNA level.
Table 1. Current knowledge about equine casein genes and their genetic variability.
Casein   Gene symbol   Location on chromosome 3 (EquCab2.0, NC_009146.2)   Remarks
αs1      CSN1S1        64,954,285…64,970,471   Full-length cDNA sequence: Lenasi et al. (2003). Two variants due to exon skipping (Miranda et al., 2004). Genomic (NC_009146.2) and mRNA (NM_001081883.1) reference sequences differ (c.406 C>A).
β        CSN2          64,938,110…64,946,489   Full-length cDNA sequence: Lenasi et al. (2003) and Girardet et al. (2006). Two smaller variants reported (Miranda et al., 2004).
αs2      CSN1S2        64,795,317…64,811,812   First described in 2000 (Egito et al., 2001; Egito et al., 2002; Miranda et al., 2004; Ochirkhuyag et al., 2000). Two major variants (CSN1S2*A, CSN1S2*B) due to a genomic 1.3 kb deletion covering two coding exons (Brinkmann et al., 2015).
κ        CSN3          64,683,856…64,694,148   First described in 2001 (Iametti et al., 2001; Miranda et al., 2004). Full-length cDNA sequence: Lenasi et al. (2003). Two putative variants described at the DNA level by Hobor et al. (2006; 2008): Ile383Lys and Thr173Ala.
Material and Methods
Animals and Samples
Genomic DNA was extracted from hair samples of 198 horses from eight different breeds
currently used for mare's milk production in Germany, using a modified protocol according to
Miller et al. (1988). The animals were selected to be as unrelated as possible. Additionally,
individual whole genome sequence variant calling data of a total of 55 horses from 10 different
breeds, available from other studies, were incorporated in the analyses. These animals were
sequenced to a mean coverage of 15.8X; bioinformatic details were reported previously
(Drögemüller et al., 2014; Frischknecht et al., 2014). In total, 253 horses belonging to 14
different breeds or populations were analyzed (Table 2).
Table 2. Animals used in the sequence analysis of the equine casein genes (n=253).
Breed                       Acronym   SEQ¹   WGS²   TOTAL
Akhal-Teke                  AK        -      1      1
Dairy Crossbreed³           CB        21     -      21
Argentine Criollo Horse     CR        27     -      27
Fjord Horse                 FJ        3      -      3
Franches-Montagnes          FM        -      29     29
Haflinger                   HF        39     1      40
Icelandic Horse             IC        25     1      26
Dutch Warmblood (KWPN)      WBNL      -      1      1
Quarter Horse               QH        22     3      25
Russian Heavy Draft         RU        24     -      24
Shetlandpony                SP        -      2      2
Swiss Warmblood             WBCH      -      3      3
UK Warmblood                WBUK      -      2      2
German Warmblood            WBD       37     12     49
TOTAL                                 198    55     253
¹ Data from DNA Sanger sequencing.
² Data from whole genome sequencing.
³ Breeds that are crossed include German Riding Pony, Haflinger Horse, Connemara Pony, New Forest Pony and further pony breeds to achieve a high milk yield.
DNA Sequencing
A total of 37 primer pairs (Supplemental Table 1) were designed to amplify all exons
contributing to the open reading frames of the genes and adjacent intronic regions. The Primer3
software (Rozen and Skaletsky, 2000) was used, based on the genomic reference sequences
of the casein genes CSN1S1, CSN2, CSN1S2 and CSN3 (Acc. No. NC_009146.2). The genomic
sequence of CSN1S1 was found to contain a gap spanning a coding exon; a flanking primer
pair was designed to close the gap by Sanger sequencing.
PCR amplification and DNA sequencing were done as described by Gallinat et al. (2013). The
obtained sequences were analyzed and compared with the genomic GenBank sequence
NC_009146.2 using the software Sequencher 4.9 (Gene Codes Corp., Ann Arbor, MI). The
known CSN1S2 variants A and B were discriminated by fragment length analysis as
described by Brinkmann et al. (2015). Allele frequencies for all observed variants of CSN1S1,
CSN2, CSN1S2 and CSN3 were calculated by direct counting for the examined breeds.
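Direct counting divides the number of copies of an allele by the total number of alleles sampled in a breed, which is two per diploid animal. The Python sketch below shows this for genotypes given as allele pairs; the function name `allele_frequencies` and the example genotypes are hypothetical and do not reproduce data from Table 3.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def allele_frequencies(genotypes: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Allele frequencies by direct counting from diploid genotypes."""
    counts: Counter = Counter()
    for allele_1, allele_2 in genotypes:
        counts[allele_1] += 1
        counts[allele_2] += 1
    total = sum(counts.values())  # equals 2 x number of animals
    if total == 0:
        return {}
    return {allele: count / total for allele, count in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical CSN1S1 genotypes of five animals of one breed
    sample = [("A", "A"), ("A", "A*"), ("A*", "A*"), ("A", "B"), ("A", "A")]
    for allele, freq in allele_frequencies(sample).items():
        print(f"CSN1S1*{allele}: {freq:.2f}")
```

Homozygous animals contribute two copies of the same allele, heterozygous animals one copy of each.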
Results
The open reading frames of the four casein genes were successfully sequenced in 244 horses
belonging to 8 breeds (numbers differ between genes; details are given in Table 3). Six, four,
eight and 13 variants were identified in CSN1S1, CSN2, CSN1S2 and CSN3, respectively. This
makes a total of 31 casein variants identified at the DNA level, two of which represent signal
peptide variants and 26 of which can be regarded as novel. Moreover, 11 synonymous
nucleotide exchanges were identified. The counted allele frequencies of all variants are
summarized in Table 3; allele frequencies were only determined for breeds with at least 10
samples available.
Provisional names were assigned to the variants (CSN1S1*A - CSN1S1*E; CSN2*A - CSN2*C;
CSN1S2*A - CSN1S2*F; CSN3*A - CSN3*M). Nucleotide positions refer to the coding sequence
of the full-length proteins, including the signal peptide. For αs1-casein,
this results in a protein of 220 amino acids, derived from the mRNA sequence NM_001081883.1
plus exon 7 (24 bp), which is missing from this sequence. The full-length β-casein is
241 amino acids long, derived from the coding sequence NM_001081852.1 plus exon 5
(24 bp), which is missing from this sequence. Based on the reference sequence KP658381.1,
the full-length αs2-casein is 231 amino acids long. The reference sequence
NM_001081884.1 codes for κ-casein, which is 185 amino acids long.
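Because both missing exons are 24 bp long, a multiple of three, their absence from the reference sequences shortens the encoded protein by eight residues without shifting the downstream reading frame. The small Python sketch below, with a hypothetical function name, makes this frame check explicit for any coding exon; it assumes that no premature stop codon arises at the new exon junction.

```python
from typing import Optional, Tuple


def exon_skipping_effect(exon_length_bp: int) -> Tuple[str, Optional[int]]:
    """Classify the effect of skipping (or lacking) a coding exon.

    Returns ("in-frame", residues_removed) if the exon length is a multiple
    of three, otherwise ("frameshift", None). Assumes no premature stop
    codon is created at the new exon junction.
    """
    if exon_length_bp % 3 == 0:
        return "in-frame", exon_length_bp // 3
    return "frameshift", None


if __name__ == "__main__":
    # Exon 7 of CSN1S1 and exon 5 of CSN2 are both 24 bp long (see text)
    print(exon_skipping_effect(24))
```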
Table 3. Number of examined animals per breed and counted allele frequencies at the four casein-encoding genes CSN1S1, CSN2, CSN1S2 and CSN3.
Columns: Breed¹; for each gene, the number of examined animals (n) followed by the allele frequencies of its variants:
CSN1S1: n, A, A*², B, C, D, E
CSN2: n, A, A*², B, C
CSN1S2: n, A, B, C, D1, D2, E1, E2, F
CSN3: n, A, B, C, D, E, F, G, H, I, J, K, L, M
¹ For an explanation of breed acronyms see Table 2.
² Asterisks indicate that the allele contains a signal peptide variant.
³ Crosses indicate that the corresponding variant is present in that breed, but the number of animals is too low to determine allele frequencies.
⁴ Dashes indicate that the corresponding variant is not present in that breed.
⁵ Breeds belonging to the European Warmblood population (Dutch, Swiss, UK and German Warmblood) were also analyzed jointly as WBTOTAL.
CSN1S1
The genomic reference sequence NC_009146.2 (EquCab2.0) was found to contain a gap with an
estimated size of 674 bp spanning exon 16, which is present in the mRNA reference sequence
NM_001081883.1. The gap was closed by Sanger sequencing of a PCR product, revealing an exact
gap size of 731 bp (see Supplemental Figure 1).
Subsequently, sequence data of 197 animals were compared in order to find putative variants.
Compared with the genomic reference sequence (Acc. No. NC_009146.2), five previously unknown
non-synonymous nucleotide exchanges were identified within the open reading frame of CSN1S1,
defining five putative protein isoforms (Table 4). One of the variants is predicted to affect the signal
peptide sequence; the corresponding allele was termed CSN1S1*A*.
No sequence corresponding to the mRNA reference sequence NM_001081883.1 was found in the
analyzed samples.
The allele CSN1S1*A, corresponding to the genomic reference sequence, was the most common one
among the examined animals and was found in all breeds (Table 3). A signal peptide variant
(c.25C>A / p.Leu9Ile) defines the allele CSN1S1*A*, which was common in all breeds except
Akhal-Teke, Dutch Warmblood and Shetland Pony. A nucleotide exchange c.88G>A (p.Glu30Lys)
characterizes allele CSN1S1*B, which was identified in Criollo Horses, Fjord Horses and
Shetland Ponies. Allele CSN1S1*C carries an additional transition (c.470T>C /
p.Val157Ala) on the background of allele B (Table 4). This haplotype was found exclusively in Icelandic Horses.
CSN1S1*D differs from allele A by a T>G transversion at position 428 of the ORF
(c.428T>G / p.Leu143Arg) and was detected in Haflinger and Icelandic Horses. Allele CSN1S1*E
is caused by a single nucleotide exchange from C to A at position 329 (c.329C>A / p.Pro110Gln)
and was found in German Warmblood Horses as well as in Franches-Montagnes. Additionally, a
synonymous nucleotide exchange was found (c.633G>A), which was in perfect linkage disequilibrium
with the sequence polymorphism defining allele CSN1S1*D.
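The codons listed in Table 4 determine the variant descriptions used in the text. The following Python sketch is an illustration only, and the name `annotate_snv` is hypothetical. It translates a reference codon and a variant codon with the standard genetic code and returns a description such as c.329C>A / p.Pro110Gln. It assumes a single-base substitution and the standard nuclear code.

```python
# Standard genetic code; codons enumerated with the bases ordered T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def annotate_snv(c_pos: int, ref_codon: str, alt_codon: str) -> str:
    """Describe a single-base codon change as 'c.<pos><ref>><alt> / p.<aa><n><aa>'."""
    offset = (c_pos - 1) % 3  # 0-based position of the changed base in its codon
    diffs = [i for i in range(3) if ref_codon[i] != alt_codon[i]]
    if diffs != [offset]:
        raise ValueError("codons must differ exactly at the base given by c_pos")
    codon_no = (c_pos - 1) // 3 + 1
    ref_aa = THREE_LETTER[CODON_TABLE[ref_codon]]
    alt_aa = THREE_LETTER[CODON_TABLE[alt_codon]]
    nucleotide = f"c.{c_pos}{ref_codon[offset]}>{alt_codon[offset]}"
    if ref_aa == alt_aa:  # synonymous exchange
        return f"{nucleotide} / p.{ref_aa}{codon_no}="
    return f"{nucleotide} / p.{ref_aa}{codon_no}{alt_aa}"


print(annotate_snv(329, "CCA", "CAA"))  # c.329C>A / p.Pro110Gln
```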
Table 4. Sequence variants and putative protein alleles of the equine CSN1S1 gene. Each row gives the position in the ORF and protein, the reference codon and amino acid (Ref. Seq.¹), and the αs1-casein protein alleles (A, A*, B, C, D, E) that differ from the reference; alleles not listed carry the reference codon.
c.25² / p.9: Ref. Seq. CTT Leu; A* ATT Ile
c.88 / p.30: Ref. Seq. GAA Glu; B AAA Lys; C AAA Lys
c.329 / p.110: Ref. Seq. CCA Pro; E CAA Gln
c.428 / p.143: Ref. Seq. CTT Leu; D CGT Arg
c.470 / p.157: Ref. Seq. GTA Val; C GCA Ala
¹ Variant CSN1S1*A corresponds to the genomic reference NC_009146.2. ² c.25 is located in the sequence coding for the signal peptide and leads to the signal peptide variant CSN1S1*A*.
CSN2
The open reading frame of the CSN2 gene was successfully examined in 243 horses, and four
previously unknown non-synonymous nucleotide exchanges were detected, which define three putative
protein isoforms in addition to the reference (Table 5). For this gene, too, one of the variants affects the signal peptide.
The genomic reference sequence (NC_009146.2) was designated CSN2*A and represented the
most common allele at this locus (Table 3). The allele with the signal peptide variant, CSN2*A*, is
characterized by a single nucleotide exchange c.16C>T, leading to a predicted amino acid exchange
from leucine to phenylalanine (p.Leu6Phe) in the signal peptide; this variant was detected at
allele frequencies of up to 0.2 in the crossbred horses for dairy production, Criollo Horses, Franches-Montagnes,
Haflinger Horses, Icelandic Horses, Russian Heavy Draft and German Warmblood. A
single transition (c.277G>A / p.Val93Ile) defines allele CSN2*B, which was found to be rare and
was only detected in Icelandic Horses. A haplotype of two nucleotide exchanges at positions 91
and 479 of the open reading frame (c.91C>T / p.Leu31Phe and c.479T>C / p.Leu160Pro) gives rise to allele CSN2*C (Table 5). Furthermore, six
synonymous nucleotide exchanges were found in the equine CSN2 gene (c.36C>T; c.102C>T;
c.123G>A; c.162G>A; c.417C>T; c.465C>T).
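A transition replaces a purine with a purine (A and G) or a pyrimidine with a pyrimidine (C and T). All other substitutions are transversions. The following Python sketch parses simple coding SNVs and classifies them. The bases for c.91 and c.479 are read from the codons in Table 5, and the helper names are illustrative.

```python
import re

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
# Matches coding SNVs such as "c.277G>A" (a space after "c." is tolerated)
SNV_PATTERN = re.compile(r"^c\.\s*(\d+)([ACGT])>([ACGT])$")


def parse_snv(text: str) -> tuple:
    """Split 'c.<pos><ref>><alt>' into (position, ref, alt)."""
    match = SNV_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple coding SNV: {text!r}")
    return int(match.group(1)), match.group(2), match.group(3)


def substitution_type(ref: str, alt: str) -> str:
    """'transition' if both bases are purines or both pyrimidines, else 'transversion'."""
    if ref == alt:
        raise ValueError("reference and alternative base are identical")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


for variant in ["c.16C>T", "c.91C>T", "c.277G>A", "c.479T>C", "c.428T>G"]:
    pos, ref, alt = parse_snv(variant)
    print(variant, substitution_type(ref, alt))
```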
Table 5. Sequence variants and putative protein alleles of the equine CSN2 gene. Each row gives the position in the ORF and protein, the reference codon and amino acid (Ref. Seq.¹), and the β-casein protein alleles (A, A*, B, C) that differ from the reference; alleles not listed carry the reference codon.
c.16² / p.6: Ref. Seq. CTT Leu; A* TTT Phe
c.91 / p.31: Ref. Seq. CTT Leu; C TTT Phe
c.277 / p.93: Ref. Seq. GTT Val; B ATT Ile
c.479 / p.160: Ref. Seq. CTG Leu; C CCG Pro
¹ Variant CSN2*A corresponds to the genomic reference NC_009146.2. ² c.16 is located in the sequence coding for the signal peptide and leads to the signal peptide variant CSN2*A*.
CSN1S2
Sequencing of the CSN1S2 gene was successfully completed in 205 animals. A total of six
non-synonymous single nucleotide variants and one large deletion, which together give rise to eight distinct putative
protein isoforms, were identified; six of these isoforms were considered novel (Table 6).
Allele CSN1S2*A (Acc. No. KT368778), corresponding to the genomic reference sequence
(NC_009146.2), was the most frequent allele across all analyzed breeds. Allele CSN1S2*B (Acc.
No. KT368779), which has already been described by Brinkmann et al. (2015), differs from allele
A by a 1,339 bp deletion covering two coding exons. It was found in Crossbred
Horses, Fjord Horses, Haflinger Horses, Icelandic Horses, Quarter Horses, Russian Heavy Draft Horses and German
Warmbloods, with allele frequencies ranging from 0.01 in Warmblood Horses to 0.35 in Haflinger
Horses. The putative allele CSN1S2*C, on the other hand, is defined by a single nucleotide exchange
c.398C>T leading to a predicted amino acid exchange from threonine to isoleucine at position 133
of the protein (Table 6); it was found to be rare, occurring at low frequencies in Fjord, Haflinger and
Quarter Horses as well as German Warmbloods. A non-synonymous transition at position 218 of
the ORF (c.218C>T / p.Thr73Ile) defines allele D. This exchange was found to occur on the long
allele CSN1S2*A as well as on the short allele B, and the resulting alleles were thus designated
CSN1S2*D1 and CSN1S2*D2, respectively. Allele D1 was detected at low frequencies in Icelandic
and Quarter Horses as well as Russian Heavy Drafts and German Warmbloods, while CSN1S2*D2
was found in German Warmblood Horses only (Table 3). Similarly, allele E, which is defined
by two nucleotide exchanges in codons 129 and 217 (see Table 6 for details), was found in
conjunction with both the long and the short variant, and the alleles arising from these
haplotypes were termed CSN1S2*E1 and CSN1S2*E2, respectively. The former was detected in Fjord,
Haflinger and Quarter Horses, while the latter was only found in Haflinger and Quarter Horses.
Finally, two transitions at positions 182 and 640 of the ORF (Table 6) characterize the rare allele
CSN1S2*F, which was exclusively detected in Icelandic Horses. Additionally, four synonymous
nucleotide exchanges were detected in equine CSN1S2 (c.21C>T; c.24C>T; c.225A>G;
c.402A>G). The synonymous nucleotide exchange c.225A>G was always observed in
combination with allele CSN1S2*F in Icelandic Horses.
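Perfect linkage disequilibrium, as reported for c.633G>A and the variant defining CSN1S1*D, means that the alternative alleles at the two sites never occur apart. The co-occurrence of c.225A>G with CSN1S2*F can be quantified in the same way. The following Python sketch computes the usual two-locus statistics D, D' and r². It assumes that phased haplotype counts are available. The counts in the example are hypothetical and are not data from this study.

```python
def ld_statistics(n11: int, n10: int, n01: int, n00: int) -> tuple:
    """Return (D, D', r^2) from phased two-locus haplotype counts.

    n11: haplotypes with the alternative allele at both loci
    n10: alternative allele at locus 1 only
    n01: alternative allele at locus 2 only
    n00: reference allele at both loci
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("no haplotypes")
    p = (n11 + n10) / total  # alternative-allele frequency at locus 1
    q = (n11 + n01) / total  # alternative-allele frequency at locus 2
    d = n11 / total - p * q
    # D' scales D by its maximum possible magnitude given p and q (sign kept)
    if d >= 0:
        d_max = min(p * (1 - q), (1 - p) * q)
    else:
        d_max = min(p * q, (1 - p) * (1 - q))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p * (1 - p) * q * (1 - q)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


# Hypothetical counts: both variants always occur together, on 3 of 48 haplotypes
print(ld_statistics(3, 0, 0, 45))
```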
Table 6. Sequence variants and putative protein alleles of the equine CSN1S2 gene. Each row gives the position in the ORF and protein, the reference codon and amino acid (Ref. Seq.¹), and the αs2-casein protein alleles (A, B, C, D1, D2, E1, E2, F) that differ from the reference; alleles not listed carry the reference codon.
c.182 / p.61: Ref. Seq. AGG Arg; F AAG Lys
c.218 / p.73: Ref. Seq. ACA Thr; D1 ATA Ile; D2 ATA Ile
c.199–249² / p.67–83: Ref. Seq. ins.; B del.; D2 del.; E2 del.
c.386 / p.129: Ref. Seq. CGG Arg; E1 CAG Gln; E2 CAG Gln
c.398 / p.133: Ref. Seq. ACC Thr; C ATC Ile
c.640 / p.214: Ref. Seq. CGG Arg; F TGG Trp
c.651 / p.217: Ref. Seq. AGA Arg; E1 AGT Ser; E2 AGT Ser
¹ Variant CSN1S2*A corresponds to the genomic reference NC_009146.2. ² This variant is represented by a large deletion, which has been described by Brinkmann et al. (2015).
CSN3
The ORF of the CSN3 gene was successfully resequenced in 244 animals, revealing six
non-synonymous nucleotide exchanges, four of which had not been described before. A total of 13
putative protein isoforms were predicted from these nucleotide exchanges (Table 7).
The allele represented by the genomic reference sequence (Acc. No. NC_009146.2) was denoted
CSN3*A. It was the most common allele in all examined breeds, with frequencies of up
to 0.88. Four alleles, namely CSN3*B, CSN3*F, CSN3*H and CSN3*J, differ from
the genomic reference by only one nucleotide exchange each (Table 7). Allele B, which is
characterized by an amino acid exchange from threonine to alanine in codon 29, occurred at a rather
high frequency of 0.16 (Table 3) in Quarter Horses, while allele F, which is defined by an
asparagine to lysine exchange in codon 24, was seen at a frequency of 0.2 in Russian Heavy Drafts.
Allele H, which is also defined by a threonine to alanine exchange, but in codon 173, has been
described before (Hobor et al., 2006; Hobor et al., 2008) and was found in Haflinger
Horses and German Warmbloods in our study. The putative protein isoform CSN3*J is
characterized by a premature stop codon, leading to a truncated protein lacking
amino acid positions 183 to 185. This allele was found in several breeds at low frequencies. The
variants defining alleles B and J were also found together, defining
the putative allele CSN3*C (Table 7), which occurred at frequencies of up to almost 0.3 in
Haflinger and Icelandic Horses (Table 3). Likewise, the nucleotide exchanges defining alleles H
and J were found together in a haplotype giving rise to the rare allele CSN3*L. The variants
causing alleles F and H were found to jointly define CSN3*G, which was only found
in Haflingers and German Warmbloods. A further non-synonymous nucleotide exchange was
identified in codon 22 (Table 7). This variant did not occur independently but was only seen together
with the exchange in codon 24, jointly defining the haplotype of the rare allele CSN3*I (Tables
3 and 7). Finally, a previously described (Hobor et al., 2006; Hobor et al., 2008) non-synonymous
nucleotide exchange was found in codon 128 (c.383T>A), which was also not detected
independently. In combination with the variants defining allele L, for example, it defines CSN3*K, which was the second
most frequent variant in Franches-Montagnes (Tables 3 and 7). Furthermore, the nucleotide
exchanges were found in additional combinations, defining alleles CSN3*D,
CSN3*E and CSN3*M (Table 7). Allele D showed a high frequency in Icelandic
Horses, while variants E and M were only found in Quarter Horses, with E differing from
the reference sequence at five positions.
Table 7. Sequence variants and putative protein alleles of the equine CSN3 gene.
κ-casein protein alleles
Position in
ORF and
protein
Ref. Seq.¹
A
B
C
D
E
F
G
H
I
J
K
L
M
c.65
p.22
GTG
Val
GCG
Ala
GCG
Ala
c.72
p.24
AAC
Asn
AAG
Lys
AAG
Lys
AAG
Lys
AAG
Lys
AAG
Lys
c.85
p.29
ACA
Thr
GCA
Ala
GCA
Ala
GCA
Ala
c.383²
p.128
ATA
Ile
AAA
Lys
AAA
Lys
c.517²
p.173
ACC
Thr
GCC
Ala
GCC
Ala
GCC
Ala
GCC
Ala
GCC
Ala
GCC
Ala
GCC
Ala
c.547
p.183
CAA
Gln
TAA
STOP
TAA
STOP
TAA
STOP
TAA
STOP
TAA
STOP
TAA
STOP
¹ Variant CSN3*A corresponds to the genomic reference NC_009146.2. ² Hobor et al., 2006; Hobor et al., 2008.
Discussion
Methodology
Within the current study, DNA sequencing of the open reading frames was used to identify putative
protein variants of the equine caseins. In this way, 26 new casein variants, including two signal
peptide variants, were detected. As discussed by Gallinat et al. (2013), the main advantages of this
methodology are its easy applicability and the better availability of DNA samples compared to
protein samples, especially when breeds from different countries are considered. The main
disadvantage, however, is that neither the actual expression of variants nor posttranslational
modifications can be evaluated. Furthermore, variants arising from differential alternative splicing
cannot readily be detected. For αs1-casein and β-casein, for example, shorter variants have been identified
(Girardet et al., 2006; Lenasi et al., 2003; Miranda et al., 2004; Table 1), which can only be
characterized at the transcript or protein level.
To check the novelty of the identified variants, a BLAST search
(http://blast.ncbi.nlm.nih.gov/Blast.cgi) of the generated sequence data against public databases
was conducted. None of the variants denoted as novel in the current study was found in these databases,
which supports their novelty. Interestingly, one polymorphism in CSN1S2 (c.386G>A / p.Arg129Gln)
was not found in the CSN1S2 sequences of Equus caballus but was identified in the predicted
mRNA sequence of CSN1S2 in Equus przewalskii (Acc. No. XM_008510698.1).
Allele frequencies within breeds were determined by counting. These frequencies should be interpreted
with caution, because the number of animals per breed was rather small. Nevertheless, these figures
give an overview of the breed distribution of the identified variants. The provisional nomenclature
for the alleles established here will be subject to confirmation at the protein level. Furthermore, in
some instances two or more variants within one gene were only found in a heterozygous state, which hampered
the unequivocal definition of haplotypes. This was the case for CSN3*I. Due to the large number
of identified variants, it was, however, necessary to establish a nomenclature to simplify
referencing and discussion.
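The following Python sketch shows why point estimates from small breed samples call for caution. It computes a 95% Wilson score interval for an allele frequency. The Wilson interval was chosen for this illustration and is not a method used in the study. The sketch assumes that alleles are sampled independently. The example counts are hypothetical: 2 copies among the 20 alleles of 10 animals.

```python
import math


def wilson_interval(allele_count: int, n_alleles: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if n_alleles <= 0 or not 0 <= allele_count <= n_alleles:
        raise ValueError("need n_alleles > 0 and 0 <= allele_count <= n_alleles")
    p = allele_count / n_alleles
    denom = 1 + z ** 2 / n_alleles
    centre = (p + z ** 2 / (2 * n_alleles)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / n_alleles + z ** 2 / (4 * n_alleles ** 2)
    ) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical breed sample: 10 animals (20 alleles), 2 copies of a variant
low, high = wilson_interval(2, 20)
print(f"point estimate 0.10, 95% interval {low:.3f} to {high:.3f}")  # 0.028 to 0.301
```

With only ten animals, the interval around a frequency of 0.10 spans roughly 0.03 to 0.30.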
The selection of animals for the study was initially limited to breeds that are currently used for mare
milk production in Germany. The Haflinger Horse in particular is a favored breed for dairy production,
not only in Germany, and is used by many mare milk farmers. However, other breeds such as
the German Warmblood and the Russian Heavy Draft, or even special breeds such as Criollo,
Quarter Horse and Icelandic Horse, are also used by German mare milk farmers, potentially yielding a
higher selling price, especially for the male offspring. Thus, a comparatively wide spectrum of
breeds has been covered. Notably, one of the sampled populations represents a crossbreed
including German Riding Pony, Haflinger Horse, Connemara Pony and New Forest Pony as well
as further pony breeds. According to the owner, this breed has been produced especially for dairy
farming and selected for milkability and milk yield. It remains, however, unclear how phenotypes
have been recorded and how selection has been conducted.
Breed specific variation patterns
The Icelandic Horse exhibited the highest degree of variability within this study. A total of 19
different casein gene alleles were found in this breed, four of which (CSN1S1*C, CSN2*B,
CSN2*C, CSN1S2*F) were found exclusively in this breed. At first glance, this seems unexpected,
because the breed originates from a small founder population. These animals were
brought to Iceland approximately 1,100 years ago, and the population has been closed since then
(Adalsteinsson, 1981). However, the samples were not taken in Iceland, and Hreidarsdóttir et al.
(2014) reported a higher diversity, in terms of effective founders, for the population outside Iceland than for the
Icelandic population.
German Warmblood Horses showed the second-highest number of alleles (18),
followed by Quarter Horses with 16 alleles, three of which were found exclusively in this breed
(CSN3*B, CSN3*E, CSN3*M). While alleles E and M were found to be rare, CSN3*B had an allele
frequency of 0.15 and might thus be regarded as a characteristic allele of Quarter Horses. A
comparatively low variability was found in the crossbreed for dairy production, with only 8 casein
gene variants. This is noteworthy, as several different breeds have been crossed here. It might,
however, be possible that selection for milk yield has reduced casein variability, if some
variants have a strong effect on the target trait.
Evolution of the casein variants
Based on the current data, it is in many instances difficult to draw conclusions about the exact
evolution of the putative casein alleles. In most cases, however, the ancestral allele and possible
routes of variant evolution can be inferred.
In the case of CSN1S1, allele A seems to represent the ancestral haplotype because of the high allele
frequencies distributed over all examined breeds (Table 3). The alleles CSN1S1*A*, CSN1S1*B,
CSN1S1*D, and CSN1S1*E differ from CSN1S1*A by only one nucleotide exchange each and
might have directly evolved from the ancestral allele. Allele CSN1S1*C can be derived from allele
B by one additional non-synonymous nucleotide exchange in position 470 of the open reading
frame.
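The argument that alleles differing by a single exchange may derive from one another can be made explicit. The following Python sketch lists all CSN1S1 allele pairs whose sets of non-synonymous exchanges differ at exactly one position. The allele definitions are taken from the Results. The sketch is a simple parsimony-style illustration that ignores recombination and recurrent mutation. It is not a phylogenetic analysis.

```python
from itertools import combinations

# Non-synonymous exchanges (c. positions) carried by each CSN1S1 allele, as
# described in the Results; allele A corresponds to the genomic reference
CSN1S1_ALLELES = {
    "A": frozenset(),
    "A*": frozenset({25}),
    "B": frozenset({88}),
    "C": frozenset({88, 470}),
    "D": frozenset({428}),
    "E": frozenset({329}),
}


def one_step_pairs(alleles: dict) -> list:
    """Allele pairs whose variant sets differ by exactly one exchange."""
    return [
        (name1, name2)
        for (name1, set1), (name2, set2) in combinations(alleles.items(), 2)
        if len(set1 ^ set2) == 1  # symmetric difference of the variant sets
    ]


for a, b in one_step_pairs(CSN1S1_ALLELES):
    print(f"{a} -- {b}")
```

The resulting pairs (A with A*, B, D and E, and B with C) correspond to the routes proposed above.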
Likewise, allele CSN2*A was found in all breeds at high frequencies, which indicates an ancestral
status of this allele. The other alleles were found to be rare (Table 3) and differ from allele A by
one (CSN2*A*, CSN2*B) or two (CSN2*C) nucleotide exchanges (Table 5).
The situation for CSN1S2 seems to be more complicated. In a previous study, we identified
two major alleles (CSN1S2*A and CSN1S2*B) differing in length by 17 amino acids due to a large
genomic deletion spanning two coding exons, and concluded that the deletion probably
occurred before the ancestor of present-day asses and zebras diverged from the horse lineage
(Brinkmann et al., 2015). Alleles CSN1S2*C and CSN1S2*F differ from the long reference allele
A by one and two mutations, respectively, and might have evolved from this allele (Table 6). Two
other alleles, CSN1S2*D and CSN1S2*E, however, occur in conjunction with both the long and
the short allele; the resulting alleles were termed D1/D2 and E1/E2, respectively (Table 6). Overall,
these alleles are rare, but allele D1 occurs more frequently than D2 (Table 3). Thus, it is possible that
CSN1S2*D1 evolved from the long allele CSN1S2*A and that CSN1S2*D2 represents
a recombinant haplotype. The alleles E1 and E2 are similarly rare, and it is possible that the
underlying polymorphisms have been segregating together for a long period. Notably, one of the
polymorphisms defining these alleles (c.386G>A) was also found in the CSN1S2 sequence of
Equus przewalskii (Acc. No. XM_008510698.1) by BLAST analysis, which supports this
hypothesis.
Several alleles of CSN3 were detected in the current study. It is likely that either CSN3*A or
CSN3*F is an ancestral allele. From CSN3*A, allele CSN3*B could have evolved by one
sequence variant (c.85A>G), and a further mutation (c.547C>T) might have led to CSN3*C.
CSN3*F seems to be the basis for the development of the alleles CSN3*G and CSN3*I, each caused
by an additional exchange (Table 7). From CSN3*G, allele CSN3*D might have evolved, which
subsequently could have led to the development of CSN3*E. However, due to the large number of
alleles arising from the various possible haplotypes, the evolution of CSN3 cannot be further
elucidated based on the current data.
Equine casein isoforms - consequences for production and human consumption
In the dairy sector, mare's milk is a high-priced niche product, which is marketed under several
health claims. Production is costly, as mares can only be milked with the foal at foot. Thus,
selection for milk yield or milk contents has not taken place so far. Attempts to select for these traits are
furthermore hampered by the lack of routine milk recording schemes, which would be difficult to
implement for several reasons; in particular, the foal's milk consumption cannot readily be
determined (Doreau and Martuzzi, 2006). A main criterion in the selection of dairy mares would,
however, be good milking ease (Doreau and Boulot, 1989; Doreau and Martuzzi, 2006), but with
the exception of the aforementioned crossbreed for dairy production, no such selection takes place,
and this potential remains unexploited.
In other dairy species, effects of casein isoforms on performance traits are known (Boettcher et
al., 2004; Heck et al., 2009; Martin et al., 2002), but for the above reasons such effects cannot be analyzed
in dairy mares. However, several of the health benefits empirically ascribed to horse milk might be
attributable to the protein fraction and would thus depend on the pattern of protein variants. These benefits
include putative positive effects on gastrointestinal ulcers, digestive and cardiovascular diseases,
diarrhea and gastritis. Other diseases such as tuberculosis, anaemia, chronic hepatitis and nephritis
were also traditionally treated with horse milk or koumiss, an alcoholic fermented mare's milk drink,
especially in Russian sanatoria. Several reasons for this effectiveness have been suggested, such as the
fatty acid pattern or the high content of lysozyme and lactoferrin. Peptides arising from the
hydrolysis of β-casein may also be responsible for health effects. Mare milk and koumiss contain peptides
with hypotensive activity, but specific research on bioactive peptides from mare milk is scarce
(Doreau and Martin-Rosset, 2002). Some recent studies have provided first scientific evidence of
health effects of mare milk, for example beneficial effects on atopic dermatitis (Foekel
et al., 2009), chronic inflammatory bowel diseases (Schubert et al., 2009) and cardiovascular
diseases (Chen et al., 2010). Extended knowledge of the equine milk protein genes might
provide a basis for further studies on the effect of mare milk on human health, especially with regard
to the release of bioactive peptides.
Mare's milk is also considered a hypoallergenic foodstuff. It has been shown in vitro and in vivo
that this milk is tolerated by 96% of children affected by cow milk allergy (CMA)
(Businco et al., 2000; Curadi et al., 2001). CMA is an IgE-mediated allergic reaction causing a
broad range of symptoms such as atopic dermatitis, constipation and infantile colic. This condition
affects approximately 2% of infants fed milk replacements based on cow milk
(Heine et al., 2002). Among the caseins, αs1-casein has been identified as the protein with the
highest allergenic potential, and many individuals affected by CMA show a high titer of IgE specific
for this protein (Gaudin et al., 2008; Lisson, 2014; Ruiter et al., 2006; Schulmeister et al., 2009;
Shek et al., 2005). Several reasons for the low allergenicity of mare's milk are discussed, for
example the absence of the epitopes relevant for IgE binding. In the current study, a signal peptide
variant leading to the allele CSN1S1*A* was detected, which might in principle cause a reduced
content or absence of αs1-casein. This variant was found to be very common, with allele frequencies
of up to 0.5. We were, however, not able to assess whether the altered signal peptide affects protein
expression. This should be the subject of further studies.
Conclusions
Within the current study, the genetic diversity of equine casein genes was assessed at the DNA
level in 253 horses belonging to 14 different breeds or populations. In total, 32 different putative
casein isoforms were identified, 26 of which can be considered novel. This study provides, for the first
time, a comprehensive overview of genetic variability at the casein loci in horses, including
noteworthy findings such as the high degree of variability in Icelandic Horses. The results provide
a foundation for further research into the properties of the equine milk protein fraction.
Acknowledgements
This project was funded by the German Federal Ministry of Education and Research (Bonn,
Germany) within the competence network “Food Chain Plus” (FoCus, grant no. 0315539A). The
authors would like to thank all the mare’s milk producers for providing samples, Julia Tetens for
her help with sample collection and Gabriele Ottzen-Schirakow for expert technical assistance.
References
Adalsteinsson, S. 1981. Origin and conservation of farm animal populations in Iceland.
Zeitschrift für Tierzüchtung und Züchtungsbiologie 98(1-4):258–264.
Brinkmann, J., Koudelka, T., Keppler, J. K., Tholey, A., Schwarz, K., Thaller, G., Tetens, J.
2015. Characterization of an Equine α-S2-casein Variant due to a 1.3 kb Deletion Spanning Two
Coding Exons. DOI: 10.1371/journal.pone.0139700
Businco, L., Giampietro, P. G., Lucenti, P., Lucaroni, F., Pini, C., Di Felice, G., Iacovacci, P.,
Curadi, C., Orlandi, M. 2000. Allergenicity of mare’s milk in children with cow’s milk allergy.
Journal of Allergy and Clinical Immunology 105(5):1031–1034.
Caroli, A. M., Chessa, S., Erhardt, G. J. 2009. Invited review: milk protein polymorphisms in
cattle: effect on animal breeding and human nutrition. J. Dairy Sci. 92(11):5335–5352.
Chen, Y., Wang, Z., Chen, X., Liu, Y., Zhang, H., Sun, T. 2010. Identification of angiotensin I-
converting enzyme inhibitory peptides from koumiss, a traditional fermented mare's milk. J.
Dairy Sci. 93(3):884–892.
Curadi, M. C., Giampietro, P. G., Lucenti, P., Orlandi, M. 2001. Use of mare milk in pediatric
allergology. Firenze, 12.-15.06.2001.
Doreau, M., Boulot, S. 1989. Recent knowledge on mare milk production: A review. Livestock
Production Science 22(3-4):213–235.
Doreau, M., Martin-Rosset, W. 2002. Dairy animals: horse. p. 630–637. In H. Roginski, J. A.
Fuquay, and P. F. Fox (eds.). Encyclopedia of dairy sciences. Academic Press, London, UK.
Doreau, M., Martuzzi, F. 2006. Milk yield of nursing and dairy mares. p. 77–87. In N. Miraglia,
and W. Martin-Rosset (eds.). Nutrition and feeding of the broodmare. Academic Publishers,
Wageningen.
Drögemüller, M., Jagannathan, V., Welle, M. M., Graubner, C., Straub, R., Gerber, V., Burger,
D., Signer-Hasler, H., Poncet, P.-A., Klopfenstein, S., Niederhäusern, R. von, Tetens, J., Thaller,
G., Rieder, S., Drögemüller, C., Leeb, T. 2014. Congenital hepatic fibrosis in the Franches-
Montagnes horse is associated with the polycystic kidney and hepatic disease 1 (PKHD1) gene.
PloS one 9(10):e110125.
Egito, A. S., Girardet, J.-M., Miclo, L., Mollé, D., Humbert, G., Gaillard, J.-L. 2001.
Susceptibility of equine κ- and β-caseins to hydrolysis by chymosin. International Dairy Journal
11(11-12):885–893.
Egito, A. S., Miclo, L., López, C., Adam, A., Girardet, J.-M., Gaillard, J.-L. 2002. Separation and
characterization of mares' milk alpha(s1)-, beta-, kappa-caseins, gamma-casein-like, and proteose
peptone component 5-like peptides. J. Dairy Sci. 85(4):697–706.
Foekel, C., Schubert, R., Kaatz, M., Schmidt, I., Bauer, A., Hipler, U.-C., Vogelsang, H., Rabe,
K., Jahreis, G. 2009. Dietetic effects of oral intervention with mare's milk on the Severity Scoring
of Atopic Dermatitis, on faecal microbiota and on immunological parameters in patients with
atopic dermatitis. Int J Food Sci Nutr 60 Suppl 7:41–52.
Frischknecht, M., Neuditschko, M., Jagannathan, V., Drögemüller, C., Tetens, J., Thaller, G.,
Leeb, T., Rieder, S. 2014. Imputation of sequence level genotypes in the Franches-Montagnes
GCTGGATAATTGCTCAACACTCA-3’) was designed to specifically amplify the entire region
of CSN1S2 encompassing the deletion. PCR amplification and DNA sequencing were done as
previously described (Gallinat et al., 2013). The obtained sequences were analyzed and compared
with the genomic reference sequence (Acc. No. NC_009146.2) using the software Sequencher 4.9
(Gene Codes Corp., Ann Arbor, MI).
RNA isolation from milk samples and cDNA synthesis
Individual milk samples were obtained from four mares with known deletion genotype. An aliquot
of 40 ml was centrifuged at 6,000 × g for 10 minutes. The supernatant including the milk fat layer
was discarded, and remaining milk fat was thoroughly removed with alcohol wipes. The cell pellet
was washed three times with 1x phosphate-buffered saline. Cells were homogenized using
QIAshredder columns (Qiagen, Hilden, Germany), and total RNA was isolated using the Qiagen
RNeasy Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. The
isolated RNA was transcribed into cDNA using the SuperScript® III First-Strand Synthesis SuperMix
kit (Invitrogen) with oligo-dT primers. PCR amplification and sequencing were done with primers
located in the untranslated regions (Forward: 5’-TGCCTGCACTTTCTTGTCTTCCA-3’, Reverse:
5’-TGCACAGTCTTCATTTGGCTTGA-3’).
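The centrifugation condition is given as a relative centrifugal force. Converting 6,000 × g into a rotor speed requires the rotor radius, which is not stated. The following Python sketch uses the standard relation RCF = 1.118 × 10⁻⁵ × r × N², where r is the radius in cm and N the speed in rpm. The 10 cm radius in the example is hypothetical.

```python
import math

RCF_CONSTANT = 1.118e-5  # RCF = 1.118e-5 * radius_cm * rpm**2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("RCF and radius must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical rotor with a 10 cm radius
print(round(rpm_from_rcf(6000, 10.0)))  # 7326
```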
Protein and peptide analysis
Individual milk samples of two mares with known genotype were used for protein analysis. These
samples were dialyzed to remove lactose, subsequently freeze-dried and stored at -18 °C for 4
months. Lyophilized milk powder was dissolved in Laemmli buffer (1x) at a concentration of 2
mg/mL, and 10 and 20 µg of crude protein were loaded onto a 12% SDS-PAGE gel (150 V for 85
min). Gel bands were destained, reduced and alkylated, and subsequently digested in-gel
overnight with trypsin (60 ng) using standard protocols. Peptides were extracted from the gel, dried
down using vacuum centrifugation and resuspended in 3% acetonitrile (ACN) and 0.1%
trifluoroacetic acid (TFA) before being analyzed by LC-MS.
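A concentration of 2 mg/mL equals 2 µg/µL, so the loaded amounts correspond to small volumes. The following Python sketch assumes that the 2 mg/mL concentration refers to the material counted as crude protein. If the protein content of the milk powder were below 100%, correspondingly larger volumes would be needed.

```python
def load_volume_ul(amount_ug: float, conc_mg_per_ml: float) -> float:
    """Volume in µL containing amount_ug at conc_mg_per_ml (1 mg/mL = 1 µg/µL)."""
    if conc_mg_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return amount_ug / conc_mg_per_ml


for amount in (10, 20):
    print(f"{amount} µg -> {load_volume_ul(amount, 2.0):.1f} µL")  # 5.0 and 10.0 µL
```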
Nano-UHPLC-MS was performed on an UltiMate 3000 RSL Nano/Cap System (Thermo Fisher
Scientific, Bremen, Germany) coupled online to an Orbitrap Q Exactive (Thermo Fisher Scientific).
Samples were desalted for 4 minutes (Acclaim PepMap100 C-18, 300 µm I.D. x 5 mm, 5 µm, 100
Å, Thermo Fisher Scientific) at a flow rate of 30 µL/min using 3% ACN, 0.1% TFA. An Acclaim
PepMap100 C-18 column (75 µm I.D. x 500 mm, 2 µm, 100 Å, Thermo Fisher Scientific) was
used for analytical separation at a flow rate of 300 nL/min using binary gradients of buffers A
(0.05% formic acid, FA) and B (80% ACN and 0.04% FA). The elution used gradient steps of 5-50% B (4-30
min) and 50-90% B (30-35 min), followed by an isocratic wash (90% B, 35-45 min) and column
re-equilibration (5% B, 45-60 min).
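The elution program can be written as a piecewise linear function of time. The following Python sketch encodes the stated gradient steps. The 5% B level during the first 4 minutes (desalting) is an assumption, because the program is only specified from 4 minutes onward.

```python
# (time in min, % buffer B) breakpoints of the elution program; the 0-4 min
# segment at 5% B is an assumption, and the repeated 45 min entry encodes the
# step from the 90% B wash back to 5% B for re-equilibration
GRADIENT = [(0.0, 5.0), (4.0, 5.0), (30.0, 50.0), (35.0, 90.0),
            (45.0, 90.0), (45.0, 5.0), (60.0, 5.0)]


def percent_b(t_min: float, program=GRADIENT) -> float:
    """Linearly interpolated % buffer B at time t_min."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t1 > t0 and t0 <= t_min <= t1:  # zero-length steps are skipped
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("no segment covers this time")


print(percent_b(17.0), percent_b(32.5))  # 27.5 70.0
```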
MS scans were acquired in the mass range of 300 to 2,000 m/z at a resolution of 70,000. The ten
most intense signals were subjected to HCD fragmentation using a dynamic exclusion of 15 s.
MS/MS parameters were: minimum signal intensity 1,000; isolation width 3.0 Da; charge state ≥2;
HCD resolution 15,000; normalized collision energy 25. A lock mass (m/z 445.120025) was used for
data acquired in MS mode.
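Mass tolerances in ppm scale with m/z. The following Python sketch computes a ppm error and a symmetric tolerance window. It uses the lock mass quoted above and the ±10 ppm precursor tolerance of the database search as examples. The function names are illustrative.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error of an observed m/z relative to a theoretical m/z, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_window(mz: float, tolerance_ppm: float) -> tuple:
    """(lower, upper) m/z bounds of a symmetric +/- tolerance_ppm window."""
    delta = mz * tolerance_ppm / 1e6
    return mz - delta, mz + delta


LOCK_MASS = 445.120025  # lock mass value quoted in the methods
low, high = ppm_window(LOCK_MASS, 10.0)
# +/-10 ppm corresponds to roughly +/-0.0045 m/z at this mass
print(f"+/-10 ppm around {LOCK_MASS}: {low:.5f} to {high:.5f}")
```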
HCD spectra were searched using Proteome Discoverer 1.4 (1.4.0.288, Thermo Fisher Scientific)
with the Sequest-HT search algorithm against the complete reviewed and unreviewed Equus
caballus database (28,188 sequences, downloaded 2015.07.16) with common contaminants
(ftp://ftp.thegpm.org/fasta/cRAP/) appended. The following database search settings were used:
MS tolerance ±10 ppm; MS2 tolerance 0.02 Da; enzyme specificity trypsin, with up to three
missed cleavages allowed. Carbamidomethylation of cysteine residues was set as a fixed
modification, while oxidation of methionine and phosphorylation of serine and threonine residues
were set as variable modifications. Only peptides identified with medium confidence
(FDR <5%) were included.
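The enzyme setting "trypsin with up to three missed cleavages" determines which candidate peptides are considered. The following Python sketch generates such peptides using the common simplified rule that trypsin cleaves C-terminal to K or R unless the next residue is P. Search engines such as Sequest HT apply their own enzyme definitions. The example sequence is hypothetical.

```python
import re


def tryptic_peptides(sequence: str, max_missed: int = 3) -> set:
    """In-silico trypsin digest with up to max_missed missed cleavages.

    Simplified rule: cleave after K or R unless the next residue is P.
    """
    # zero-width split points after K/R not followed by P (needs Python 3.7+)
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = set()
    for i in range(len(fragments)):
        # join up to max_missed + 1 consecutive fragments
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            peptides.add("".join(fragments[i:j + 1]))
    return peptides


# Hypothetical sequence: no cleavage inside "KP"
print(sorted(tryptic_peptides("AKPRGKLR", max_missed=1)))
# ['AKPR', 'AKPRGK', 'GK', 'GKLR', 'LR']
```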
Results and Discussion
DNA sequencing and mutation screening
The current annotation of the equine CSN1S2 gene (GeneID 100327035) is based on the mRNA
reference sequence NM_001170767.2, which contains 15 coding exons with an open reading frame of
645 bp. When attempting to resequence the open reading frame using exon-flanking primer pairs, we
noticed that the PCR reactions for exons 8 and 9 consistently failed in particular horses. To
unravel the possible cause of this phenomenon, we amplified a 2.6 kb fragment spanning the
entire region. While the expected product was obtained from samples that had been successfully
amplified before, the product obtained from initially unsuccessful samples was
approximately 1.3 kb shorter (Figure 1). Subsequent Sanger sequencing of the products revealed
the presence of a 1,339 bp deletion in the short variant (Figure 2A), while the long product
corresponded completely to the genomic reference sequence (NC_009146.2). Analysis of
this sequence revealed the presence of a 309 bp duplication of the region encompassing exon 8 of
the gene (Figure 2A). Because this duplication is located exactly at the boundary of the deletion,
the exact position of the deletion cannot be determined, i.e. it cannot be decided whether the upstream or
the downstream duplicate is involved in the deletion.
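Whether the reading frame is preserved depends only on the coding bases removed by a genomic deletion, because intronic sequence is spliced out in any case. The following Python sketch checks this for a deletion given as a coding range, here the positions listed for the deletion in Table 6 (c.199 to c.249, codons 67 to 83). The function name is hypothetical.

```python
def coding_deletion_effect(c_start: int, c_end: int) -> dict:
    """Summarize the effect of deleting coding positions c_start..c_end (inclusive)."""
    if c_start < 1 or c_end < c_start:
        raise ValueError("expected 1 <= c_start <= c_end")
    length = c_end - c_start + 1
    in_frame = length % 3 == 0
    codon_aligned = (c_start - 1) % 3 == 0  # deletion starts at a codon's first base
    return {
        "length_bp": length,
        "in_frame": in_frame,
        # exactly whole codons are removed only if in frame AND codon-aligned
        "codons_removed": length // 3 if in_frame and codon_aligned else None,
        "first_codon": (c_start - 1) // 3 + 1,
        "last_codon": (c_end - 1) // 3 + 1,
    }


print(coding_deletion_effect(199, 249))  # 51 bp, in frame, 17 codons (67 to 83)
```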
A total of 193 horses belonging to 8 breeds (Table 1) were tested for the presence of the deletion
by PCR and subsequent agarose gel electrophoresis (Figure 1). The deletion was present
in all analyzed breeds; the highest frequencies of 0.36 and 0.25 were observed in Haflinger
and Icelandic Horses, respectively. Notably, these breeds are common in mare's milk production,
and the Haflinger breed in particular is widely used. This might indicate an effect of the mutation,
or of a certain casein haplotype, on milk yield, as is the case in cattle, for example (McLean et al., 1984;
Ikonen et al., 2001).
Table 1. Frequencies of the 1.3 kb deletion in different horse breeds.

Breed                  n     ins/insᵃ  ins/delᵃ  del/delᵃ  Frequency of deletion
Crossbredᵇ             21    14        6         1         0.19
Criollo                27    24        3         0         0.06
Fjord Horse            3     2         1         0         -ᶜ
Haflinger Horse        39    17        16        6         0.36
Icelandic Horse        24    13        10        1         0.25
Quarter Horse          20    16        3         1         0.13
Russian Heavy Draft    24    20        3         1         0.10
German Warmblood       35    33        2         0         0.03
TOTAL                  193   139       44        10        0.17

ᵃ ins = long variant corresponding to the genomic reference NC_009146.2; del = short variant
carrying the 1,339 bp deletion spanning two coding exons.
ᵇ A synthetic cross involving German Riding Pony, Haflinger Horse, Connemara Pony and New Forest
Pony, bred for milk yield.
ᶜ The frequency in Fjord Horses is not reported because of the small sample size, but the breed is
included in the total values.
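The deletion frequencies in Table 1 are consistent with simple gene counting: each heterozygote contributes one deletion allele and each del/del homozygote two, out of 2n alleles per breed, i.e. f(del) = (n(ins/del) + 2 n(del/del)) / 2n. The Python sketch below recomputes the column from the genotype counts in the table. Rounding half-up to two decimals (e.g. Quarter Horse: 5/40 = 0.125, reported as 0.13) reproduces the listed values; this rounding rule is an assumption inferred from the table, not stated in the text, and the names `TABLE_1` and `deletion_frequency` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Genotype counts (ins/ins, ins/del, del/del) copied from Table 1.
TABLE_1 = {
    "Crossbred": (14, 6, 1),
    "Criollo": (24, 3, 0),
    "Fjord Horse": (2, 1, 0),  # not reported in the table (n = 3)
    "Haflinger Horse": (17, 16, 6),
    "Icelandic Horse": (13, 10, 1),
    "Quarter Horse": (16, 3, 1),
    "Russian Heavy Draft": (20, 3, 1),
    "German Warmblood": (33, 2, 0),
}


def deletion_frequency(ins_ins: int, ins_del: int, del_del: int) -> Decimal:
    """Gene counting: deletion alleles / all alleles (2n), rounded half-up to 2 decimals."""
    n = ins_ins + ins_del + del_del
    freq = Decimal(ins_del + 2 * del_del) / Decimal(2 * n)
    return freq.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for breed, genotypes in TABLE_1.items():
    print(f"{breed:<20} n={sum(genotypes):>3}  del={deletion_frequency(*genotypes)}")

# Column-wise sums over all breeds give the TOTAL row.
totals = tuple(sum(column) for column in zip(*TABLE_1.values()))
print(f"{'TOTAL':<20} n={sum(totals):>3}  del={deletion_frequency(*totals)}")
```

Exact decimal arithmetic is used instead of binary floats so that ties such as 0.125 round predictably.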
Figure 1. Agarose gel electrophoresis of PCR products spanning the 1.3 kb deletion. The upper band
corresponds to the long variant (denoted as +), the lower one to the short variant carrying the
deletion (denoted as −). In heterozygotes only, a third band of approximately 2.1 kb is visible,
which possibly arises from asymmetric hybridization of the two alleles due to the duplication. The
breeds of the corresponding samples are given above the lanes (RHD = Russian Heavy Draft,
GWb = German Warmblood, IC = Icelandic Horse, HF = Haflinger).
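Scoring a gel like the one in Figure 1 amounts to checking which of the two expected products is present in each lane. The Python sketch below assumes that band sizes have already been estimated in kb against a size ladder; the function `call_genotype` and the ±0.15 kb tolerance are hypothetical choices for illustration only. The short product size is taken as the 2.6 kb amplicon minus the 1,339 bp deletion, and the additional ~2.1 kb band seen in heterozygotes is ignored because the two main bands already identify them.

```python
from typing import Optional, Sequence

# Approximate amplicon sizes in kb, derived from the text.
LONG_KB = 2.6                      # product from the long (ins) allele
DELETION_KB = 1.339                # length of the deletion
SHORT_KB = LONG_KB - DELETION_KB   # about 1.26 kb, product from the del allele
TOLERANCE_KB = 0.15                # hypothetical sizing tolerance for gel estimates


def near(size_kb: float, target_kb: float) -> bool:
    """True if an estimated band size lies within the tolerance of a target size."""
    return abs(size_kb - target_kb) <= TOLERANCE_KB


def call_genotype(bands_kb: Sequence[float]) -> Optional[str]:
    """Return 'ins/ins', 'ins/del' or 'del/del' for one lane, or None if no call."""
    has_long = any(near(b, LONG_KB) for b in bands_kb)
    has_short = any(near(b, SHORT_KB) for b in bands_kb)
    # A ~2.1 kb band (heterozygotes only) matches neither target and is ignored.
    if has_long and has_short:
        return "ins/del"
    if has_long:
        return "ins/ins"
    if has_short:
        return "del/del"
    return None  # e.g. failed PCR


print(call_genotype([2.6]))            # ins/ins
print(call_genotype([2.6, 2.1, 1.3]))  # ins/del
print(call_genotype([1.25]))           # del/del
```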
Figure 3. Agarose gel electrophoresis of equine CSN1S2 cDNA. The RNA was isolated from the milk of
one mare homozygous for the deletion (−/−) and three mares homozygous for the long variant (+/+).
Figure 2. Structure of the long and short equine αs2-casein variants. A. Genomic organization of the
respective gene segment. Grey shading indicates the equid-specific 309 bp duplication comprising
coding exons 8 and 10, respectively. The 1.3 kb in-frame deletion is indicated above the figure.
B. Structures of the resulting transcript variants. C. Protein alignment of available ungulate
αs2-casein protein sequences.
Analysis of transcripts
The duplicated region within the 1.3 kb deletion contains a 24 bp coding exon. The two copies were
found to be completely identical, including intact splice sites. However, only one of these
identical exons is present in the current RefSeq transcript NM_001170767.2. It was therefore unclear
which exons are transcribed and whether both variants are transcribed at all. To address this, we
purified total RNA from the skimmed milk of four mares, three homozygous for the long variant and
one homozygous for the short variant. After reverse transcription, the CSN1S2 transcripts were
amplified using primers located in the untranslated regions. Agarose gel electrophoresis of the PCR
products revealed a size difference of approximately 50 bp between the alternative homozygotes
(Figure 3), showing that both a long and a short transcript are expressed. Subsequently, the open
reading frames of both transcripts were sequenced. The difference was due to a 51 bp in-frame
insertion/deletion after exon 8, comprising the duplicated exon as well as a previously unannotated
exon that aligns perfectly within the 1.3 kb deletion (Figure 2A/B). Exon numbering was adapted
accordingly, counting the newly annotated exon as exon 9 and the duplicate of exon 8 as exon 10.
Although the genome assembly (EquCab2.0) contains the long variant, the RefSeq transcript
NM_001170767.2 used for annotation represents the short variant. A BLAST search showed that both
transcripts had been reported before (GenBank KP658381.1 and KP658382.1), and both have recently
been added to the unreviewed section of the UniProt database (Acc. No. A0A0C5DH76 and D2KAS0), but
no further information or publication is available. The transcript sequences from the current study
are available under the accession numbers KT368778 and KT368779.
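The transcript lengths can be cross-checked with simple arithmetic. If the 51 bp difference consists of the 24 bp duplicated exon (exon 10) plus the newly annotated exon 9, the latter must be 51 − 24 = 27 bp long; this exon 9 length is inferred here, not reported in the text. All three lengths are multiples of three, which is why the insertion/deletion preserves the downstream reading frame and changes the protein by a net 17 amino acids; the approximately 50 bp difference seen on the gel agrees with this. A minimal Python sketch of the check (the names are hypothetical):

```python
# Lengths in bp from the text; EXON9_BP is inferred as the remainder.
INDEL_BP = 51                     # length difference between the two transcripts
EXON10_BP = 24                    # duplicated copy of exon 8
EXON9_BP = INDEL_BP - EXON10_BP   # newly annotated exon: 27 bp (inferred)


def is_in_frame(length_bp: int) -> bool:
    """An indel keeps the downstream reading frame if its length is a multiple of 3."""
    return length_bp % 3 == 0


for name, length in [("exon 9", EXON9_BP), ("exon 10", EXON10_BP), ("51 bp indel", INDEL_BP)]:
    print(f"{name:<12} {length:>3} bp  in frame: {is_in_frame(length)}  codons: {length // 3}")
```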
Comparative analysis
Translations of the long and short transcripts were aligned to the available protein sequences of
the domestic donkey (Acc. No. CAV00691.1; Chianese et al., 2010), cattle (Acc. No.