Université de Montréal
“Evolution of C2H2-Zinc finger genes in mammalian genomes”
by
Hamsa Dhwani Tadepally
Département de Biochimie
Faculté de Médecine
Thesis presented to the Faculté des études supérieures
in view of obtaining the degree of Master's (Maîtrise)
in Biochemistry
July 2007
© Hamsa Dhwani Tadepally, 2007
Université de Montréal
Direction des bibliothèques
NOTICE
The author of this thesis or dissertation has granted a nonexclusive license allowing Université de Montréal to reproduce and publish the document, in part or in whole, and in any format, solely for noncommercial educational and research purposes.
The author and co-authors if applicable retain copyright ownership and moral rights in this document. Neither the whole thesis or dissertation, nor substantial extracts from it, may be printed or otherwise reproduced without the author’s permission.
In compliance with the Canadian Privacy Act some supporting forms, contact information or signatures may have been removed from the document. While this may affect the document page count, it does not represent any loss of content from the document.
Université de Montréal
Faculté des études supérieures
This thesis entitled
“Evolution of C2H2-Zinc finger genes in mammalian genomes”
Presented by:
Hamsa Dhwani Tadepally
was evaluated by a jury composed of the following persons:
Martine Raymond
President-rapporteur
Muriel Aubry
Research supervisor
Gertraud Burger
Co-supervisor
Nicolas Lartillot
Jury member
Résumé
The C2H2/Kruppel zinc finger genes (C2H2-ZNF) encode the largest class of transcription
factors in humans. These genes constitute one of the largest gene families in mammals and
are often found as clusters of genes juxtaposed on the chromosomes. Through an extensive
sequence-similarity search aimed at identifying all the C2H2-ZNF genes of the human
genome, we assembled a complete repertoire of 718 human C2H2-ZNF genes. The C2H2-ZNF
genes were classified into subfamilies according to the N-terminal effector domains with
which they are associated. We found that the subfamily encoding a KRAB domain comprises
45% of all C2H2-ZNF genes and is therefore the largest subfamily of zinc finger genes. In
addition, we identified 81 C2H2-ZNF gene clusters, which account for 70% of all C2H2-ZNF
genes. Almost 90% of the C2H2-ZNF genes belonging to the KRAB and SCAN subfamilies
are found in clusters. To better understand the evolution of the C2H2-ZNF genes, we then
assembled a complete repertoire of all the human C2H2-ZNF gene clusters and of their
counterparts in the syntenic regions of the chimpanzee, mouse, rat and dog genomes. A
systematic analysis of this repertoire in these mammals revealed variation in the number
of clusters, and of genes within these clusters, among primates, rodents and canines. This
variation suggests that these genes have evolved differentially in mammals. Phylogenetic
studies of several selected C2H2-ZNF gene clusters indicate that, besides differential
duplication, gene loss in certain species has led to different repertoires of C2H2-ZNF
genes in mammals. In addition to species-specific variation in the number of genes, we
also demonstrated variation among orthologs in the number of zinc finger motifs and in the
presence of effector domains, the latter often being lost by degeneration. In conclusion,
based mainly on these results and on the study of the exon-intron structure of the
C2H2-ZNF genes, we propose a new model for the evolution of their subfamilies, according
to which the most ancient subfamilies would be, in order, SCAN > SCAN-KRAB > KRAB.
Abstract
The C2H2/Kruppel zinc finger genes (C2H2-ZNF) encode the largest class of transcription
factors in humans. These genes constitute one of the largest gene families in mammals and
are often found in clusters. Using an extensive similarity search on the human genome to
identify all C2H2-ZNF genes, we assembled a comprehensive repertoire of 718 human
C2H2-ZNF genes. The genes were grouped into subfamilies based on the N-terminal
effector domains they were associated with. We found that the KRAB-domain encoding
subfamily constitutes 45% of the total C2H2-ZNF genes and hence is the largest
subfamily of zinc finger genes. In addition to this, we also identified 81 C2H2-ZNF clusters
which constitute 70% of the total genes. Almost 90% of the C2H2-ZNF belonging to the
KRAB and SCAN subfamilies were found in clusters. We then assembled a comprehensive
repertoire of all the human C2H2-ZNF clusters and their syntenic counterparts in
chimpanzee, mouse, rat and dog genomes. A systematic analysis of all the syntenic clusters
revealed a variation in the numbers of clusters and the genes within clusters among
primates, rodents and canines, indicating differential patterns of evolution in mammals.
Evolutionary analysis of a few selected C2H2-ZNF syntenic clusters in the five mammals
studied suggested that not only differential duplication, but also gene loss has led to
different repertoires in mammalian genomes. In addition to lineage- and species-specific
variation in the number of genes, we also find a variation among orthologs in the number of
zinc finger motifs and in the presence of the effector domains, the latter being often lost by
sequence degeneration. Finally, based on the above results and on the analysis of the
exon-intron structure of the various C2H2-ZNF genes, we propose a model for the evolution of
their subfamilies suggesting that the more ancient subfamilies are in sequential order
SCAN > SCAN-KRAB > KRAB.
Keywords: C2H2/Kruppel, zinc finger, gene family, tandem repeats, gene duplication,
gene loss, evolution.
List of abbreviations
DNA: Deoxyribonucleic Acid
RNA: Ribonucleic Acid
BTB: Broad-Complex, Tramtrack and Bric-a-brac
POZ: Pox virus and Zinc finger
KRAB: Kruppel Associated Box
SCAN: SRE-ZBP, CTfin51, AW-1 and Number18 cDNA
KRI motif: KRAB Interior motif
Ig: Immunoglobulin
ZNF45: Zinc finger 45 (protein or gene)
ZNF91: Zinc finger 91 (protein or gene)
BLAST: Basic Local Alignment Search Tool
MUSCLE: Multiple Sequence Comparison by Log-Expectation
OR: Olfactory Receptor
VH and VL domains: Heavy & Light chains of the Variable domain of Immunoglobulin
molecule
KRAB C2H2-ZNF: C2H2-Zinc finger proteins associated with a KRAB domain
SCAN C2H2-ZNF: C2H2-Zinc finger proteins associated with a SCAN domain
BTB C2H2-ZNF: C2H2-Zinc finger proteins associated with a BTB domain
KAP-1: KRAB associated protein 1
TIF1β: Transcription Intermediary Factor 1β
List of definitions
Homology: This is a concept that signifies common ancestry.
Orthologs: Genes in different species that are similar to each other and originated from
a common ancestor through a speciation event, regardless of their functions.
Paralogs: Genes that are derived from a duplication event, in the same species or different
species. They may or may not have the same function.
Gene duplication: Duplication of a region of DNA that contains a gene; it may occur as
an error in homologous recombination, a retrotransposition event, or duplication of an
entire chromosome.
Phylogenetic tree: This is also called an evolutionary tree, and shows the evolutionary
interrelationships among various species or other entities that are believed to have a
common ancestor.
Synteny: This describes a common order of genes, especially between related species.
Acknowledgements
Questions and Answers are what life at the university seems to be about. While trying to answer the questions about zinc fingers during my thesis, I also seem to have learnt a lot about myself. These three years at UdeM have been a wonderful learning experience both academically and personally.
First and foremost, I would like to thank my thesis supervisor, Muriel Aubry, for accepting and offering me the chance to be her student and work on zinc fingers and for all the guidance and encouragement. For teaching me that what you learn during the whole process of research is as important as the end result. For supporting me when I was struggling with my courses in French. For all the days and nights of constant guidance she gave me for the thesis and for all the weekends at North Hatley. For giving me the opportunity to go to the SMBE 07, which by far has been the most exciting experience of my life. Dr. Muriel, thank you very much for everything. This experience has made me the confident person I am today.
Gertraud Burger, my co-supervisor, for her valuable guidance and suggestions. For giving me the opportunity to interact with everyone from the Bioinformatics group and supporting me to let me continue in the Masters program.
Franz B. Lang, Nicolas Lartillot, Hervé Philippe, Henner Brinkmann and Amy Hauth for all the helpful guidance, discussions and constructive comments. Allan Sun for the assistance with the hardware and software problems I had.
My labmates, Patricia, Delphine, Xavier, Imene, Phuong and Hadrian for all the help, for being so nice and always making me feel welcome in the lab.
I would like to thank my friends Uma, Reena and Ekta for supporting me during the difficult times I had. Karthik for being my computer guru. My girls Lakshmi, Gayatri, Sujata, Shivani and Ramaa for taking care of me and putting up with me during the difficult times of my thesis. Siva for helping me out at the university every time I had a problem. Nagu who always let me take my frustrations and bad moods on him and for always being there to talk. Preethi and Kavitha for just being my friends.
Last but not the least; this entire experience would be at most an unfulfilled dream were it not for my loving family. I would like to express my gratitude to my parents, Dr. Nagender Swamy and Vijaya Lakshmi for supporting my dreams and aspirations, for letting me take my own decisions, make mistakes, learn and grow. My sister Vamsee Priya, my brother Charan for always being there for me and always taking care of me no matter what and my brother-in-law Sanjay.
Table of contents
Identification of the Jury
Résumé
Abstract
Table of contents
List of figures
List of Supplementary Figures
List of Tables
List of Supplementary Tables
List of abbreviations
Acknowledgements
Chapter 1. INTRODUCTION
1.1 Transcription factors
1.2 The C2H2 zinc finger gene family
1.2.2 The tandemly organized C2H2 zinc finger motif
1.2.3 The N-terminal regulatory domain of C2H2 zinc finger proteins
1.3 Gene families and Gene duplication
1.3.1 Gene Families
1.3.2 Gene Duplication and Gene Loss: Two important evolutionary mechanisms
guiding the evolution of gene families in mammals
1.4 Inferring gene duplication and gene loss
1.5 Previous Studies addressing zinc finger gene evolution
1.6 Hypothesis and Objective
Chapter 2. ARTICLE
Evolution of C2H2-zinc finger genes in mammals: Species-specific duplication and loss at
the level of clusters, genes and their functional domains
Chapter 3. DISCUSSION
3.1 The C2H2-ZNF genes in the human genome
3.2 Variation in the numbers of C2H2-ZNF genes in mammalian clusters
3.3 Evolution of C2H2-ZNF genes in mammals through differential expansion and loss
3.4 Evolution of the C2H2-ZNF genes through duplication or loss of zinc finger and
N-terminal effector motifs
3.5 Birth and Death model of evolution
3.7 A few concerns to the study
3.8 Merits of the study
3.9 Perspectives
REFERENCES
List of Figures
INTRODUCTION and DISCUSSION
Figure 1: The basic structural unit of a C2H2 zinc finger protein
Figure 2: The Regulatory domains associated with C2H2 Zinc finger proteins
Figure 3: Darwin’s evolutionary tree
Figure 4: Schematic representation of speciation and duplication
Figure 5: Schematic representation of different evolutionary processes shaping the gene
families in different species
Figure 6: Inferring gene duplication and loss events from a gene tree in comparison with
the species tree
Figure 7: Birth-and-death model of evolution
Figure 8: Plot of the amino acid sequence lengths of all the C2H2-ZNF in the human
genome
ARTICLE
Figure 1: Flowchart of the analysis procedure of C2H2-ZNF genes and clusters
Figure 2: Distribution of all the singletons and clustered genes from the various human
C2H2-ZNF sub-families and gene composition of the C2H2-ZNF clusters
Figure 3: Differential expansion and loss of C2H2-ZNF clusters in five mammalian
genomes
Figure 4: Evolutionary scenarios in the phylogenetic tree
Figure 5: Phylogenetic analysis of C2H2-ZNF genes in cluster 19.12 of human and its
syntenic counterparts in other mammals
Figure 6: Physical maps showing the organization of the human C2H2-ZNF from cluster
19.12 localized on 19q13.4 and its syntenically homologous counterparts in other mammals
Figure 7: Variation in the numbers of zinc finger motifs in mammals and in the presence of
conserved N-terminal domains in orthologs
Figure 8: Model for the evolution of the SCAN, SCAN-KRAB and KRAB C2H2-ZNF
subfamilies
List of Supplementary Figures
ARTICLE
Supplementary Figure 1: Distribution of intergenic distances between 718 C2H2-ZNF in
the human genome
Supplementary Figure 2: Comparison of the number of C2H2-ZNF genes in the 40 human
clusters containing at least 3 C2H2-ZNF and their syntenic counterparts in four other
mammals
List of Tables
INTRODUCTION
Table 1: Different types of DNA binding domains
List of Supplementary Tables
ARTICLE
Supplementary Table S1: Comprehensive catalogue of the 718 C2H2-ZNF genes in the
human genome
Supplementary Table S2: Comprehensive summary of the organization of all C2H2-ZNF
found as singletons or in clusters on each human chromosome and classified with respect
to the various C2H2-ZNF sub-families
Supplementary Table S3: Gene organization of the 81 human C2H2-ZNF clusters
Supplementary Table S4: Comprehensive catalogue of the C2H2-ZNF genes from the 81
human clusters and their syntenic counterparts from other mammalian genomes
(chimpanzee, mouse, rat and dog)
Chapter 1. INTRODUCTION
1.1 Transcription Factors
A very important problem in biology is trying to understand the mechanisms by
which particular genes are expressed in a temporal or a tissue-specific manner. The process
through which a DNA sequence is copied enzymatically by an RNA polymerase to produce
complementary RNA is called transcription.
The transcription process in prokaryotes and eukaryotes differs in that an
RNA polymerase alone can initiate transcription in prokaryotes. In contrast, eukaryotes
have a much more complex transcriptional regulatory mechanism. In addition to the RNA
polymerase, eukaryotic genes need an initial assembly of transcription factors at the
promoter (Pabo and Sauer 1992).
Transcription factors are proteins involved in the regulation of gene expression by
binding to the promoter elements upstream of genes. They are composed mainly of two
functional regions: 1) a DNA-binding domain and 2) an effector domain.
The DNA-binding domain consists of amino acids that recognize specific DNA
bases generally near the start of transcription. Based on its structure, the DNA-binding
domain is classified into different types as detailed in Table 1.
1. Zinc finger
2. Helix-turn-helix
3. Leucine zipper domain
4. Winged helix
5. ETS domain
6. Helix-loop-helix
7. Immunoglobulin fold
In addition to a DNA-binding domain, transcription factors also contain an effector domain.
This domain often interacts with proteins to either inhibit or activate transcription.
Transcription factors can thus act as transcriptional activators or repressors that control
gene expression by acting directly on the RNA-polymerase-containing complex bound in
proximity to the transcription initiation sites and/or on proteins involved in the assembly of
chromatin, the complex of DNA and proteins that make up chromosomes (Roberts 2000).
Transcription factors bring about these changes either by themselves or indirectly by
recruiting co-factors that are called co-repressors or co-activators (Roberts 2000) depending
on their effect on transcription. Co-repressors or co-activators do not bind DNA directly,
but are recruited to the gene by the effector domain of transcription factors.
Table 1: Different types of DNA binding domains
Zinc finger: A zinc finger has two antiparallel β-strands and an α-helix. Two cysteines and two histidines interact with a zinc ion to form a finger-like structure.
Example: The two cysteines on the beta-strand in green can be seen interacting with the two histidines in orange on the alpha-helix. The interacting zinc ion is shown in red in the center.
Helix-Turn-Helix: This is a major structural unit capable of binding DNA. It has two alpha-helices which are joined by a short stretch of amino acids (turn).
Example: Helix-turn-helix (green and yellow) of bacteriophage lambda, which binds to DNA (blue and cyan).
Leucine zipper domain: It consists of a short alpha helix with a leucine residue at every seventh position.
Example: The AP-1 dimer formed by Fos and Jun homologous proteins. The leucine zipper motif has two α-helices which look like a zipper with the leucine residues (in green) lining the zipper.
Winged helix: This motif has 110 amino acids. Each domain has four alpha-helices and two beta-sheet strands.
Example: Alpha helices are in purple and beta strands are in yellow.
ETS domain: This domain is 85-90 amino acids long. It was discovered in the ETS oncogene. Three alpha-helices and a 4-strand beta sheet fold into a domain. The third helix is the recognition helix.
Example: The Elk1-E74 DNA complex, where Elk-1 is a member of a large group of eukaryotic transcription factors with ETS domain. Alpha helices are in blue and beta-strands in yellow.
Helix-Loop-Helix: This motif has two alpha helices connected by a loop. Generally transcription factors with this loop are dimeric. A smaller helix allows dimerization while the other larger helix facilitates DNA binding.
Example: Two alpha helices (in red) connected by a loop (in green) to form a domain.
Immunoglobulin fold: This is also called an all-β protein fold, which has a 2-layer sandwich of 7 antiparallel β-strands arranged in two β-sheets.
Example: Human Tenascin with its immunoglobulin fold, fibronectin type III, coloured from blue (N-terminus) to red (C-terminus).
1.2 The C2H2 zinc finger gene family
Of the many large families encoding transcription factors that have been identified,
zinc finger genes of the C2H2 type constitute the largest one (Schuh, Aicher et al. 1986;
Bellefroid, Lecocq et al. 1989). The C2H2 motif encoded in these genes typically includes
two cysteines and two histidines coordinating a zinc ion. This motif was first identified in
the TFIIIA of Xenopus laevis and later in the Krüppel Drosophila segmentation gene
(Miller, McLachlan et al. 1985; Schuh, Aicher et al. 1986). Thus, the C2H2 zinc finger
genes are often referred to as TFIIIA/Kruppel type of zinc finger genes.
Known to constitute one of the ten largest gene families [Pfam database], these
zinc finger genes are found not only in eukaryotes but also in prokaryotes. Members of
the C2H2 zinc finger family have now been identified in all kingdoms of life, i.e. eubacteria,
archaebacteria, protists, fungi, animals and plants (Bouhouche, Syvanen et al. 2000;
Moreira and Rodriguez-Valera 2000). Throughout evolution, there has been a massive
expansion in the numbers of the C2H2 zinc finger genes (Lander, Linton et al. 2001;
Venter, Adams et al. 2001). Notably, human beings are predicted to have more than 700
zinc finger genes, often found in a clustered organization (Bellefroid, Lecocq et al. 1989;
Looman, Abrink et al. 2002).
While most of the C2H2 zinc finger genes characterized have been described as genes
encoding transcription factors which bind to DNA, some are also known to encode
RNA-binding proteins that may thus participate in RNA metabolism or maturation (Theunissen,
Rudt et al. 1992; Grondin, Bazinet et al. 1996).
1.2.1 Structure of the proteins encoded by C2H2 ZNF genes
The C2H2 zinc finger transcription factors generally consist of two essential regions:
1) the C2H2 zinc finger region containing in most instances several zinc finger motifs
organized in tandem and 2) the N-terminal regulatory domain.
1.2.2 The tandemly organized C2H2 zinc finger motif
The C2H2 zinc finger proteins are composed of zinc finger motifs which form the
zinc finger region of the protein. Each motif is a highly conserved sequence of 28 amino
acids (CX2-4CX3FX5LX2HX3-4HTGEKPYX, where X is any amino acid). Each motif is
separated from the following one by a highly conserved linker region (TGEKPYX, where
X is any amino acid) (Miller, McLachlan et al. 1985; Wolfe, Nekludova et al. 2000;
Looman, Abrink et al. 2002). The basic conserved C2H2 zinc finger structural unit includes
two cysteines and two histidines which interact with a zinc ion and are essential for the
proper folding of the motif into a finger-like structure (see Figure 1) (Looman, Abrink et al.
2002). C2H2 zinc finger proteins are composed of one or more tandemly organized zinc
finger motifs. The number of zinc finger motifs in the protein varies from one to more than
30 in a few cases (Ruiz i Altaba, Perry-O’Keefe et al. 1987).
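The consensus quoted above can be turned into a simple pattern search. The following is a minimal Python sketch, assuming that the spacing in the consensus reads X2-4 between the two cysteines and X3-4 between the two histidines. The function name `find_c2h2_motifs` and the test sequence are illustrative only; this is not the similarity search used in this work.

```python
import re

# Core of the consensus: C-X(2-4)-C-X3-F-X5-L-X2-H-X(3-4)-H.
# The X2-4 and X3-4 spacings are an assumed reading of the consensus in the text.
C2H2_CORE = re.compile(r"C[A-Z]{2,4}C[A-Z]{3}F[A-Z]{5}L[A-Z]{2}H[A-Z]{3,4}H")
LINKER = "TGEKPY"  # conserved linker between consecutive fingers


def find_c2h2_motifs(protein):
    """Return (start, end, followed_by_linker) for each non-overlapping core match."""
    seq = protein.upper()
    hits = []
    for match in C2H2_CORE.finditer(seq):
        # Check whether the conserved linker starts right after the second histidine.
        followed = seq.startswith(LINKER, match.end())
        hits.append((match.start(), match.end(), followed))
    return hits


# Synthetic two-finger sequence, built for illustration only.
finger = "YKCPECGKSFSQKSNLQKHQRTHTGEKPY"
for start, end, linked in find_c2h2_motifs(finger * 2):
    print(start, end, linked)
```

A strict pattern like this would miss degenerate fingers in which a conserved F or L has been substituted, so it is better suited to counting canonical motifs than to exhaustive gene discovery.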
Figure 1: The basic structural unit of a C2H2 zinc finger protein.
(A) The C2H2 zinc finger motif is present in tandem in the protein. Three zinc finger motifs are
connected by a conserved linker region (TGEKPY). The two cysteines and two histidines which
interact with a zinc ion, including the other conserved residues, are shown with their single letter
codes. The residues involved in DNA binding are shown in grey.
(B) The three-dimensional structure of a zinc finger binding domain. Two anti-parallel β-strands
and one α-helix interact with a zinc ion as shown in the figure.
Nuclear magnetic resonance spectroscopy (NMR) was used to determine the
three-dimensional structure of the C2H2 zinc finger motif (Lee, Gippert et al. 1989; Omichinski,
Clore et al. 1990). Two beta strands and one alpha helix form an independently folded
domain with a compact globular structure (see Figure 1). The zinc ion, which is tetrahedrally
coordinated between two invariant pairs of cysteines and histidines, connects the β-sheet
and the α-helix. Four amino acids on the surface of the α-helix in the zinc finger motif
make base-specific contacts with three to four bases in the major groove of the DNA helix
(Frankel, Berg et al. 1987; Parraga, Horvath et al. 1988; Omichinski, Clore et al. 1992;
Krishna, Majumdar et al. 2003). Although the zinc finger domain has been described as a
nucleic acid binding domain, not all the zinc finger motifs are involved in DNA or RNA
binding. For example, in the ZBRK1 zinc finger protein, only the first few fingers are involved
in DNA binding and all the others in protein-protein interactions (Zheng, Pan et al. 2000).
1.2.3 The N-terminal regulatory domain of C2H2 zinc finger proteins
In addition to the zinc finger region, C2H2 zinc finger proteins are also associated
with an N-terminal regulatory domain (Figure 2), which regulates subcellular localization
and gene expression by acting as either a repressor or an activator, by itself or by
associating with other factors (Collins, Stone et al. 2001).
The regulatory domains associated with C2H2 Zinc finger proteins are
i. BTB/POZ domain
ii. KRAB domain
iii. SCAN domain
i. The BTB domain
The BTB domain (Broad-Complex, Tramtrack and Bric-a-brac), also known as
the POZ domain (Pox virus and Zinc finger), is a 120 amino acid conserved domain found
to be associated with both DNA and actin-binding proteins. The BTB domain is involved in
protein-protein interactions (Collins, Stone et al. 2001). As a part of DNA binding proteins,
the BTB/POZ domain is known to be a dimerization domain which, in a few cases, also
recruits co-repressors (such as N-CoR, SIN3A or SMRT) and acts as a repression domain.
When found in association with C2H2 zinc finger transcription factors, the BTB domain is
generally located N-terminal to the zinc finger region. Thus, by mediating oligomerization
and in some instances interaction with co-factors, the BTB domain can lead to chromatin
remodeling and change in gene expression (Melnick, Carlile et al. 2002).
ii. The Kruppel-Associated Box (KRAB) domain
Another well known example of an N-terminal regulatory domain associated
with C2H2 zinc finger proteins is the Kruppel-Associated Box or the KRAB domain
(Bellefroid, Poncelet et al. 1991; Rosati, Marino et al. 1991). KRAB domains are almost
always associated with C2H2 zinc finger proteins. An exception to this scenario is the
SSX family of proteins. These proteins are associated with a “SSX KRAB domain” which
is distantly related to the KRAB domain from C2H2 zinc finger proteins (49% similar)
but are not associated with zinc fingers (Collins, Stone et al. 2001; Urrutia 2003).
Unlike the C2H2 zinc finger proteins which are present in organisms ranging
from bacteria to humans, the KRAB domain as seen in C2H2 zinc finger proteins is present
only in vertebrates, more specifically in tetrapods (Looman, Abrink et al. 2002). However,
a recent study identified a sea urchin homolog to the mammalian Meisetz protein which
includes a tandem array of C2H2 zinc finger motifs, a SET domain and a sequence with
some similarity to the “SSX-KRAB domain” (Birtle and Ponting 2006). This suggests the
presence of the KRAB domain in the common ancestor of echinoderms and vertebrates. A
further study of these proteins in fungi and plants identified a 26 amino acid motif called
the KRI motif which was found to be similar to the alpha-helical regions of KRAB and
present in all eukaryotes. This indicated that the KRI motif was present in the last common
ancestor of animals, plants and fungi and is the progenitor of the KRAB domain.
The KRAB domain is most abundant in mammals (Lander, Linton et al. 2001;
Venter, Adams et al. 2001; Waterston, Lindblad-Toh et al. 2002). For example, about one
third of the mouse C2H2-ZNF are associated with KRAB (Benn, Antoine et al. 1991;
Waterston, Lindblad-Toh et al. 2002). The KRAB domain is mostly associated with more
than 5 C2H2 zinc finger motifs in a protein, justifying the name “Multifingered protein”
(Bellefroid, Poncelet et al. 1991). Many genes encoding the KRAB containing proteins are
found in a clustered organization as opposed to the ones found as singletons (Bellefroid,
Marine et al. 1993; Shannon, Kim et al. 1998; Chung, Schafer et al. 2002; Rousseau-Merck,
Koczan et al. 2002; Hamilton, Huntley et al. 2003).
The KRAB domain is 75 amino acids long and is divided into two boxes, Box A
(~38 amino acids) and Box B (32 amino acids) (Looman, Abrink et al. 2002; Urrutia
2003). A variant of the B box, called the b box, also exists. Some C2H2 zinc finger proteins
have another box following the A box, called the C box (21 amino acids). Each of these
boxes is encoded by a different exon and separated by introns of varying lengths (Looman,
Hellman et al. 2004). The KRAB domain functions as a potent repressor of transcription
(Margolin, Friedman et al. 1994). The KRAB A box plays an important role in repression
by binding to co-repressors, while the KRAB B box does not have transcriptional activity
but is known to enhance the repression activity of the A box (Witzgall, O’Leary et al.
1994). The process of transcription repression is mediated by KAP-1, also called
transcription intermediary factor 1β (TIF1β), which is a co-repressor that interacts with
KRAB A (Friedman, Fredericks et al. 1996; Germain-Desprez, Bazinet et al. 2003). The
KRAB domain of C2H2 zinc finger proteins recruits the KAP-1 co-repressor to DNA,
which results in the formation of a heterochromatin-like complex and leads to gene
silencing (Pengue, Calabro et al. 1994; Kim, Chen et al. 1996; Moosmann, Georgiev et al.
1996; Pengue and Lania 1996).
iii. The SCAN domain
The SCAN domain, like the KRAB domain, is another vertebrate-specific domain
only associated with C2H2 zinc finger proteins (Williams, Khachigian et al. 1995; Looman,
Abrink et al. 2002). The SCAN domain was estimated to be associated with 10% of the
total C2H2-ZNF present in the human genome (Collins, Stone et al. 2001; Edelstein and
Collins 2005). Also known as the LeR domain because of its leucine-rich primary
structure, the SCAN domain is named after the four proteins in which it was initially identified
(SRE-ZBP, CTfin51, AW-1 and Number18 cDNA) (Urrutia 2003). In addition to being
associated with the C2H2 zinc finger proteins, the SCAN domain containing proteins are
sometimes associated with a KRAB domain, having the organization SCAN-KRAB-(C2H2)
or in very few cases KRAB-SCAN-KRAB-(C2H2) (Edelstein and Collins 2005;
Huntley, Baggott et al. 2006).
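The domain combinations described in this section can be represented as an ordered list of domain names, read from the N-terminus to the C-terminus. The following is a minimal Python sketch, assuming hypothetical labels ("BTB", "SCAN", "KRAB", "C2H2") and a hypothetical helper `classify_subfamily`. It only illustrates how a protein could be assigned to a subfamily from its effector domains and is not the procedure used in this study.

```python
# Fixed reporting order for effector domains (an illustrative convention).
EFFECTOR_ORDER = ("BTB", "SCAN", "KRAB")


def classify_subfamily(domains):
    """Name the subfamily of a protein given its domains listed from N- to C-terminus."""
    if "C2H2" not in domains:
        raise ValueError("not a C2H2 zinc finger protein")
    # Repeated domains (e.g. KRAB-SCAN-KRAB) are reported only once.
    present = [name for name in EFFECTOR_ORDER if name in domains]
    return "-".join(present) if present else "no effector domain"


print(classify_subfamily(["SCAN", "KRAB", "C2H2", "C2H2"]))
print(classify_subfamily(["KRAB", "SCAN", "KRAB", "C2H2"]))
print(classify_subfamily(["C2H2", "C2H2", "C2H2"]))
```

Under this convention both SCAN-KRAB-(C2H2) and KRAB-SCAN-KRAB-(C2H2) proteins fall into the same "SCAN-KRAB" group, because the label records which effector domains are present, not their order or copy number.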
Structural studies on the SCAN domain indicate that it has 84 amino acids and is
found to have three to five α-helices which are delineated by one or more proline residues.
Proline residues are also present before and after the SCAN domain (Stone, Maki et al.
2002). The SCAN domain is a homo- and hetero-dimerization domain mediating
protein-protein interactions by self association and formation of heterodimers between SCAN
family members (Sander, Haas et al. 2000; Schumacher, Wang et al. 2000). The importance
of the dimerization for the transcriptional activity of SCAN-C2H2 zinc finger proteins has
not been clearly established.
Figure 2: The Regulatory domains associated with C2H2 Zinc finger proteins.
(A) The different combinations of domains associated with zinc finger proteins are shown.
Zinc finger proteins generally have three main regions: the Regulatory domain, the Spacer
and the Zinc finger region. (B) The consensus sequence of the domains KRAB (A, B, b and
C boxes), SCAN and BTB. The residues essential for binding KAP-1 and thus for repression
are shown in KRAB A underlined in red.
1.3 Gene familles and Gene duplication
1.3.1 Gene Families
A gene family corresponds to a set of genes that are grouped based on their shared
homology, biological or biochemical activity, sequence motifs or similarities in structure.
Because they consist of a large number of genes, gene families are the most informative
systems to study evolutionary dynamics of genes. Nuclear genomes have many multigene
families and their study provides clues to the evolutionary forces that have shaped these
genomes (Ohta 2000; Thornton and DeSalle 2000). Mammalian genomes in particular have
large numbers of genes organized in gene families (Demuth, Bie et al. 2006). Some gene
families have uniform copy numbers of genes in all species (Thornton and DeSalle 2000),
while there are gene families like the Immunoglobulin gene family, the Olfactory receptor
gene family and the C2H2 zinc finger gene family which have a large variation in the
number of genes across different species. The variation in the gene numbers of these
families and their diversity in function suggest that gene duplication and/or gene loss have
played an important role in shaping different mammalian genomes.
i. The Olfactory Receptor gene family
Olfactory receptor (OR) genes form the largest known multigene family in
mammalian genomes (Glusman, Bahar et al. 2000) and code for membrane receptors that
are responsible for olfaction, the sense of smell. OR genes are present in various vertebrates
ranging from lampreys to humans. The OR proteins belong to the G-protein coupled
receptor family, whose members have seven transmembrane domains. OR genes are divided into 2
classes based on their protein sequence similarity (Glusman, Bahar et al. 2000; Fuchs,
Glusman et al. 2001). Of the two classes, Class I genes, first identified in fish but also found
in mammals, are specialized for water-soluble odorants, and the Class II genes, specialized for
airborne odorants, are specific to tetrapods.
The number of OR genes varies considerably between genomes. Rodents have
nearly twice as many OR genes as human, chimpanzee or dog (Niimura and
Nei 2005). In the human genome, more than half of the ~900 OR genes are pseudogenes.
In contrast, the mouse genome has ~1300 OR genes of which only one-fourth are
pseudogenes. Studies on the human, chimpanzee and mouse OR gene repertoires indicate
that there are species-specific expansion and pseudogenization, signifying different
selection pressures in humans, chimpanzees and mice owing to their different sensory
requirements (Sharon, Glusman et al. 1999; Glusman, Yanai et al. 2001; Lapidot, Pilpel et
al. 2001; Gilad, Man et al. 2005; Niimura and Nei 2005). Evolutionary analysis of the
human, mouse and chimpanzee datasets indicates the presence of a clustered organization
which is generally well conserved in these genomes. Analyses of the clusters indicate that
there are tandem arrays of the OR genes which appear to have arisen by tandem duplication
and several chromosomal rearrangements. The difference in the numbers of OR genes in
human and mouse has been attributed to gene duplication and loss events (Sharon,
Glusman et al. 1999; Niimura and Nei 2005).
ii. The Immunoglobulin gene family
The immunoglobulin gene family represents an example where its two subfamilies,
the immunoglobulin heavy chain variable region subfamily and the immunoglobulin light chain
variable region subfamily, have co-evolved by varying in gene number and extent of
diversity in different species (Sitnikova and Su 1998). An immunoglobulin molecule is a
tetramer with two identical heavy chains and two identical light chains which form a Y-shaped
structure. Each of these chains has a variable (V) and constant (C) domain. The VH
and VL domains have the complementarity determining regions, called the CDRs, which
form the sites of interaction with antigens. Analyses of these two subfamilies of genes
from various species of amniotes identified that these gene families have diversified
throughout the course of evolution (Sitnikova and Su 1998). Different coordinated loss and
duplication events have led to different species-specific gene repertoires.
iii. The C2H2 zinc finger gene family
In addition to the above mentioned gene families, the C2H2 zinc finger gene family
is another example of a large multigene family with varying number of genes in different
species. Over the course of evolution, this gene family has expanded drastically in
mammalian genomes (e.g. ~400 in mouse and ~700 in human) (Venter, Adams et al.
2001; Waterston, Lindblad-Toh et al. 2002). Several studies involving these genes in the
human genome have indicated that tandem duplication events are responsible for the
clustered organization of this family (Shannon, Kim et al. 1998; Elemento and Gascuel
2002; Elemento, Gascuel et al. 2002; Tang, Waterman et al. 2002; Bertrand and Gascuel
2005; Huntley, Baggott et al. 2006). A few evolutionary studies of these genes,
within the human genome and among a few mammalian genomes, document cases of
species-specific duplication (Dehal, Predki et al. 2001; Shannon, Hamilton et al. 2003;
Huntley, Baggott et al. 2006).
All these examples of gene families show variation in gene number among
different species involving different duplication and loss events. The gene family size could
vary based on the functional relevance of the gene family in the organism. These examples
also indicate the importance of studying gene families to obtain clues on the evolutionary
mechanisms which led to different sizes of gene families.
1.3.2 Gene Duplication and Gene Loss: Two important evolutionary
mechanisms guiding the evolution of gene families in mammals
Considering the extremely large numbers of genes constituting gene families
(Demuth, Bie et al. 2006), it is interesting to study their organization and the evolutionary
mechanisms that created them. A study integrating the information from spatial
organization of the genes with the phylogenetic relationships between the genes, combined
with evolutionary information of the species, would help provide clues about the evolution
of the gene families.
In the context of using phylogenetic studies to analyze the evolutionary
relationships between genes in gene families, one significant term that features in all studies
is “Homology”. Homology forms the central and basic concept of comparative genomics
but is also a term that is often misrepresented and misinterpreted. The term homology was
introduced by Richard Owen in 1848, where he defined homology as “the same organ
under every variety of form and function” (Francis Darwin 1903). The importance of
structure and function is emphasized more in this definition. In an attempt to give an
evolutionary explanation to homologous structures, Darwin defined homology as “A
structure is similar among related organisms because those organisms have all descended
from a common ancestor that had an equivalent trait” (Darwin 1837) (Figure 3).
When put in the context of molecular sequence comparison today,
homology refers in an abstract way to a relationship which implies a possible common
ancestry and should be differentiated from identity² or similarity³ of sequences. However,
to be substantiated, homology must be confirmed by appropriate phylogenetic studies. It is
important to note that homology does not say anything about functional similarity
(Fitch 2000; Thornton and DeSalle 2000).
¹ Homology: A hypothesis that signifies common ancestry between sequences (nucleotide or amino acid), which is primarily based on sequence similarity.
² Identity: The extent to which two (nucleotide or amino acid) sequences are invariant.
³ Similarity: The extent to which nucleotide or protein sequences are related. The extent of similarity between two sequences can be based on percent sequence identity and/or conservation.
Figure 3: Darwin’s evolutionary tree.
The figure is Charles Darwin’s first ever sketch of an evolutionary tree from his book titled
“First Notebook on Transmutation of Species (1837)”.
There are three major types of homology in a phylogenetic context:
orthology, paralogy and xenology. Orthology, as described by Fitch in 1970, is the
relationship between two genes in two different species which originated from a common
ancestor. Two homologous sequences are considered to be “orthologous” if a speciation
event separates them. In contrast, paralogy signifies the relationship between two genes
which have been formed by a gene duplication event. Xenology, another type of homology
relationship, describes the relationship between two genes which have been transferred
between two species by horizontal gene transfer.
Studying the homologous relationships of genes within and between various
genomes and differentiating between orthologs and paralogs is a central aspect of
comparative genomics. Figure 4 shows a very simple explanation of the difference between
orthologs and paralogs. The genes A1, B1 and B2 have evolved from an ancestral gene by
speciation followed by a duplication event in species B. Gene A1 from species A is an
ortholog of both gene B1 and gene B2 in species B, illustrating that one gene in a particular
species may have more than one ortholog in the other. Gene B1 and gene B2 in species B,
which were formed by gene duplication, are paralogs of each other.
[Figure 4 image: tree with the labels Speciation and Duplication]
Figure 4: Schematic representation of speciation and duplication
Genes A1, B1 and B2 are formed from an ancestral gene by a speciation and duplication
event. Gene A1 from species A has two orthologs in species B, genes B1 and B2. B1 and
B2 are paralogs.
Figure 5 depicts different evolutionary scenarios that one encounters while
studying the evolution of genes in gene families, to explain the relationships
between genes in terms of orthologs, paralogs and gene loss. An ancestral gene undergoes
duplication in species 0 to give the genes A, B and C. This is followed by a speciation
event with genes A1, B1 and C1 in Species 1 and genes A2 and B2 in Species 2. The gene
A1 is an ortholog of A2 and B1 is an ortholog of B2. The gene C1 does not have a
corresponding ortholog, as Species 2 lost the gene after speciation. The genes A1, B1
and C1 are paralogs within species 1, and A2 and B2 are paralogs within species 2.
Furthermore, as explicitly pointed out by Fitch (Fitch 2000) and as often ignored,
gene A1 (species 1) is also a paralog of gene B2 (species 2), gene B1 (species 1) is a
paralog of gene A2 (species 2) and gene C1 is a paralog of genes A2 and B2 (species 2). From
these explanations, it is clear that orthologs are homologous genes residing in different
species, while paralogs may refer to the homology relationship between genes
not only from the same species but also from different species. It is essential to understand that both
orthologs and paralogs are free to diverge and do not necessarily always have the same
function (Thornton and DeSalle 2000).
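The relationships described for Figure 5 can be derived mechanically once each internal node of a gene tree is labelled as a duplication or a speciation: two genes are orthologs if their last common ancestor is a speciation node and paralogs if it is a duplication node. The following Python sketch illustrates this rule only; the tuple encoding of the tree, the function names and the order of the two duplications are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical encoding: an internal node is (event, child, child, ...),
# with event "dup" or "spec"; a leaf is a gene name such as "A1".
# The order of the two duplications is an assumption of this example.
FIGURE5_TREE = ("dup",
                ("spec", "A1", "A2"),
                ("dup", ("spec", "B1", "B2"), "C1"))  # C2 lost in species 2

def leaf_names(node):
    """Return the gene names found below a node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1:] for leaf in leaf_names(child)]

def classify_pairs(node, out=None):
    """Label every gene pair by the event at its last common ancestor."""
    out = {} if out is None else out
    if isinstance(node, str):
        return out
    event, children = node[0], node[1:]
    relation = "ortholog" if event == "spec" else "paralog"
    for left, right in combinations(children, 2):
        for a in leaf_names(left):
            for b in leaf_names(right):
                out[frozenset((a, b))] = relation
    for child in children:
        classify_pairs(child, out)
    return out

relations = classify_pairs(FIGURE5_TREE)
for pair in sorted(sorted(p) for p in relations):
    print(pair, relations[frozenset(pair)])
```

In this toy tree A1/A2 and B1/B2 come out as orthologs, while every other pair, including cross-species pairs such as A1/B2, is a paralog, in line with the point attributed to Fitch above.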
[Figure 5 image: tree with the labels Duplication event, Speciation event and Gene loss]
Figure 5: Schematic representation of different evolutionary processes shaping the
gene families in different species.
Gene duplication, speciation and loss lead to the formation of genes A1, B1, C1 in species
1 and A2 and B2 in species 2.
1.4 Inferring gene duplication and gene loss
That two genes are homologous is a hypothesis that needs to be studied and
analyzed before the genes can be classified as either orthologs or
paralogs. Studying and analyzing the relationships within gene families, i.e. evaluating
orthology or paralogy, requires a well formulated approach. In order to be able to postulate
theories on how related genes evolved from an ancestral gene, i.e. by gene duplication, gene
loss or speciation, one needs to assess homology relationships using a well founded
phylogeny.
The first step in assessing homology is a sequence alignment of the molecular
sequences, be they nucleotides or amino acids. This gives a preliminary measure of possible
homology which can then be assessed using a phylogeny. A well supported phylogeny
gives the evolutionary relationships between the genes in relation to one another.
Comparison of a gene phylogeny with the taxonomic relationships between
species allows gene duplication and loss events to be assessed and roughly dated. As an
example, Figure 6 shows the different scenarios of gene duplication and gene loss. In
Figure 6A, it can be seen that a duplication event prior to the speciation event that resulted in
species 2, 3 and 4 created the paralogous gene groups A, B and eventually C. In a
hypothetical situation, suppose the genes 2A, 3C and 4B are missing from the gene tree.
Assuming that the studied sequences were derived from completely sequenced genomes,
the missing genes could be due either to a loss of genes or to their possible pseudogenization
in the respective species. That the gene duplication occurred prior to speciation can still be
resolved by superimposing the gene tree on the species tree. Reconciliation between the
species tree and the gene tree will help resolve the absence of genes 2A, 3C and 4B, as can be
seen in Figure 6B. This kind of study can hence be used to infer the evolution of genes
belonging to gene families within and among species in a phylogenetic context.
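The reconciliation idea described above can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used in this work: each gene-tree node is mapped to the smallest species-tree clade containing its species, a node is called a duplication when it maps to the same clade as one of its children, and species expected under that clade but absent from a child copy are reported as candidate losses. The tree encoding, the function names and the toy trees (a reduced version of the scenario, with only copies A and B) are assumptions.

```python
def species_clades(species_tree):
    """Return the leaf set of every node of a nested-tuple species tree."""
    clades = []
    def walk(node):
        if isinstance(node, str):
            clade = frozenset([node])
        else:
            clade = frozenset().union(*[walk(child) for child in node])
        clades.append(clade)
        return clade
    walk(species_tree)
    return clades

def lca(clades, species_set):
    """Smallest species-tree clade that contains species_set."""
    return min((c for c in clades if species_set <= c), key=len)

def reconcile(gene_tree, species_tree, species_of):
    """List duplication nodes with the species missing from each child copy."""
    clades = species_clades(species_tree)
    events = []
    def walk(node):
        if isinstance(node, str):
            return frozenset([species_of(node)])
        child_species = [walk(child) for child in node]
        here = lca(clades, frozenset().union(*child_species))
        # Duplication: the node maps to the same clade as one of its children.
        if any(lca(clades, s) == here for s in child_species):
            missing = [sorted(here - s) for s in child_species]
            events.append(("duplication", sorted(here), missing))
        return frozenset().union(*child_species)
    walk(gene_tree)
    return events

# Toy data: species tree (1,(2,(3,4))); genes are named species + copy letter.
SPECIES_TREE = ("1", ("2", ("3", "4")))
# Copies A and B arose before species 2, 3 and 4 split; 2A and 4B are absent.
GENE_TREE = ("1A", (("3A", "4A"), ("2B", "3B")))

events = reconcile(GENE_TREE, SPECIES_TREE, lambda gene: gene[:-1])
print(events)
```

For the toy trees the sketch reports one duplication in the ancestor of species 2, 3 and 4, with species 2 missing from copy A and species 4 missing from copy B. Whether a missing gene reflects true loss, pseudogenization or incomplete data still has to be checked against the genome, as noted above.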
[Figure 6 image: gene trees with leaves labelled by species number and gene letter (e.g. 1A, 2A, 3A), and the species tree]
Figure 6: Inferring gene duplication and loss events from a gene tree in comparison
with the species tree.
(A) A gene tree showing the phylogeny between genes belonging to species 1, 2, 3 and 4. Genes are
represented as A, B, C and D. A gene duplication, represented as x, occurred prior to the speciation
event leading to Species 2, 3 and 4.
(B) A species tree showing the relation between species 1, 2, 3 and 4.
(C) A hypothetical situation where the genes 2A, 3C and 4B are missing from the tree, as shown in
red. Reconciliation of the phylogenetic tree from (A) with the species tree from (B) helps identify
not only the duplication event but also the missing genes, so that loss can be inferred. Adapted from
(Thornton and DeSalle 2000)
1.5 Previous Studies addressing zinc finger gene evolution
About 2000 of the 30,000 genes in the human genome code for transcription
factors (Venter, Adams et al. 2001). C2H2-ZNF are the most common of all the eukaryotic
transcription factors present in the human genome (encoded by ~700 genes). Owing to
these facts, the C2H2 zinc finger gene family has been considered to be an evolutionary
playground for genes to develop and differentiate and hence is an interesting family to
study (Looman, Abrink et al. 2002).
The studies pertaining to C2H2-ZNF have mostly been restricted to those associated
with a KRAB domain and more specifically to the human genome. A recent study
identified 423 KRAB C2H2-ZNF loci organized into 65 clusters on the human genome
(Huntley, Baggott et al. 2006). Evolutionary studies involving these KRAB C2H2-ZNF
genes indicated that the evolutionary relatedness within and among clusters was not only
associated with physical proximity evolving through tandem duplications but also through
distributed duplication and postduplication rearrangement events, which have led to the
dramatic increase in the gene numbers of this family in humans (Hamilton, Huntley et al.
2003; Hamilton, Huntley et al. 2006; Huntley, Baggott et al. 2006). Though present in
clusters, the KRAB C2H2-ZNF are not co-regulated and they show different patterns of
expression (Huntley, Baggott et al. 2006).
A study of one KRAB C2H2-ZNF gene cluster on human chromosome 19
suggested an evolutionary model in which certain beta-satellite repeat
structures, symmetrically ordered with the zinc finger genes in the cluster, have
coevolved with the cluster, accommodating the expansion of the genes within this cluster
(Eichler, Hoffman et al. 1998).
A statistical analysis using phylogenetic models on four human C2H2-ZNF clusters
on chromosome 19 indicated that positive selection is the driving force involved in the
diversification of the KRAB C2H2 zinc finger genes (Schmidt and Durrett 2004).
Not much is known about the evolutionary histories of these genes in different
mammalian genomes and very few studies have been carried out to comparatively analyze
their evolution. A preliminary report on species-specific expansion of these genes resulted
from one study on a C2H2-ZNF cluster on human chromosome 19 and its syntenically
homologous cluster on mouse chromosome 7 (Shannon, Hamilton et al. 2003). A study on
the evolution of members of the primate-specific ZNF91 KRAB subfamily, which are
mainly found in a chromosome 19 cluster, revealed that this gene subfamily evolved before
the split of humans and apes. But after the split, these genes have continued to evolve
differentially, be it through tandem duplications or segmental duplications, leading to
species-specific genes (Dehal, Predki et al. 2001; Hamilton, Huntley et al. 2006).
In spite of several studies dealing with these genes, there has never been a
comprehensive study on the C2H2-ZNF genes and their evolution or their functions. In
order to systematically define and analyze the extent of species-specific duplication and
the role of gene loss in the evolution of these genes, it is important to conduct a
comprehensive study of these gene clusters in mammalian genomes to obtain clues on their
evolution and their possible implications for functions specific to each species.
1.6 Hypothesis and Objective
Previous studies on zinc finger genes have provided evidence that zinc finger
genes have undergone a huge expansion in vertebrate genomes, with a specific increase in
humans. Studies have shown that these genes have expanded through
tandem duplication and that species-specific duplication events have occurred
(Shannon, Kim et al. 1998; Shannon, Hamilton et al. 2003; Hamilton, Huntley et al. 2006;
Huntley, Baggott et al. 2006). A contribution of gene loss to the evolution of C2H2 zinc
finger genes has been suggested but never tested rigorously.
The main objective of this thesis is to systematically determine to what extent zinc
finger genes are subject to species-specific expansion and to assess the potential
contribution of gene loss to the evolution of this gene family in mammals. To this end, we
have:
1. Assembled a curated database of all C2H2-ZNF genes in the human genome and
identified all the C2H2-ZNF clusters in the human genome.
2. Searched for syntenically homologous clusters in other completely sequenced
mammalian genomes, namely chimpanzee, mouse, rat and dog genomes.
3. Performed a phylogenetic analysis of C2H2-ZNF genes from the syntenically
homologous clusters.
4. Performed a reconciliation of both phylogenetic analyses and physical maps of the
clusters with the species tree accounting for the evolutionary history of the species
in order to infer gene loss and gain.
These studies should allow us to determine the nature of evolutionary events that
shaped this large gene family in mammals. In particular, this study will help us to better
infer orthology in the various mammals and better understand the evolution and
relationships between the different C2H2-ZNF subfamilies.
Chapter 2. ARTICLE
Evolution of C2H2-zinc finger genes in mammals:
Species-specific duplication and loss at the level of
clusters, genes and their functional domains.
Hamsa Dhwani Tadepally, Gertraud Burger and Muriel Aubry*
Department of Biochemistry, Université de Montréal, C.P. 6128,
Succ. Centre-Ville, Montréal, QC, H3C 3J7, Canada
*To whom correspondence and reprints should be addressed:
Muriel Aubry, Ph.D.
Department of Biochemistry
Université de Montréal
C.P. 6128, Succ. Centre-Ville
Montréal, H3C 3J7
Canada
Key words: C2H2/Kruppel, zinc finger, gene family, tandem repeats, gene duplication,
gene loss, evolution.
ABSTRACT
C2H2 zinc finger genes (C2H2-ZNF) constitute the largest class of transcription factors in
humans and one of the largest gene families in mammals. Often arranged in clusters in the
genome, these genes are thought to have undergone a massive expansion in vertebrates by a
process involving tandem duplication. However, this view is based on limited datasets
restricted to a single chromosome or a specific subfamily of C2H2-ZNF genes. Here, we
present the first comprehensive study of the dynamic evolution of the C2H2-ZNF family in
mammals. We assembled the complete repertoire of human C2H2-ZNF genes (718 in
total), about 70 % of which are organized into 81 clusters across all chromosomes. Based
on an analysis of their N-terminal effector domains, we identified
SET- and HOMEO domain-encoding C2H2-ZNF genes as members of two new C2H2-ZNF
subfamilies. We searched for the syntenic counterparts of human clusters in other
mammals for which complete gene data are available: chimpanzee, mouse, rat and dog.
Cross-species comparisons show a large variation in the numbers of C2H2-ZNF genes
within homologous mammalian clusters, suggesting differential patterns of evolution.
Phylogenetic analysis of selected C2H2-ZNF clusters reveals that differences in C2H2-ZNF
gene repertoires across mammals originate not only from differential gene duplication but
also from gene loss. Further, we find variations among orthologs in the number of zinc finger
motifs and association of the effector domains, the latter often undergoing sequence
degeneration. Based on these results and an analysis of the exon-intron organization of
genes from the large SCAN and KRAB domain-containing subfamilies, we propose a new
model for the evolution of these subfamilies.
This manuscript includes two supplementary Figures and four supplementary Tables.
INTRODUCTION
The human genome sequence uncovered a large number of gene families often
arranged in a clustered organization (Ohta 2000; Thornton and DeSalle 2000; Venter,
Adams et al. 2001). C2H2 zinc finger (C2H2-ZNF) genes make up 2 % of all the human
genes and represent the second largest gene family in humans after the odorant receptor
family (Lander, Linton et al. 2001) (Schuh, Aicher et al. 1986; Bellefroid, Lecocq et al.
1989; Messina, Glasscock et al. 2004). The first identified members of the C2H2-ZNF
family are Xenopus TFIIIA and Drosophila Kruppel and thus genes of this family are often
called zinc finger genes of the TFIIIA or Kruppel type (Miller, McLachlan et al. 1985;
Schuh, Aicher et al. 1986).
Most of the characterized C2H2-ZNF genes code for transcription factors which
bind DNA through their zinc finger region; others bind RNA and their exact function is as yet
unknown (Theunissen, Rudt et al. 1992; Grondin, Bazinet et al. 1996). The zinc finger
region is composed of a basic structural unit of 28 amino acids
(CX2-4CX3FX5LX2HX3-4HTGEKPYX, where X is any amino acid), called the zinc finger motif, that is often
repeated in tandem. The two cysteines and two histidines in this motif interact with a zinc
ion, stabilizing the proper folding of this motif (Klug and Rhodes 1987; Lee, Gippert et al.
1989; Rhodes and Klug 1993). C2H2-ZNF proteins often contain an effector domain,
always located N-terminal to the zinc finger region, such as the KRAB (Kruppel-Associated
Box), SCAN (SRE-ZBP, CTfin51, AW-1 and Number 18 cDNA) and BTB
(Broad-Complex, Tramtrack and Bric-a-brac) domains. The first two domains are
vertebrate-specific (Bellefroid, Poncelet et al. 1991; Rosati, Marino et al. 1991; Collins,
Stone et al. 2001), while BTB is also present in insects. The KRAB domain includes the
box KRAB A (~38 amino acids) involved in transcriptional repression and often a second
box, usually KRAB B (~32 amino acids) or in a few cases a KRAB b or KRAB C (~21 amino
acids) box (Witzgall, O’Leary et al. 1994; Looman, Abrink et al. 2002; Urrutia 2003;
Looman, Heilman et al. 2004). The KRAB A box and the second KRAB B, b or C box are
encoded by separate exons, which are alternatively spliced. The SCAN domain, also called the
leucine-rich (LeR) domain (~84 amino acids) (Stone, Maki et al. 2002), mediates protein-protein
interactions through dimerization (Sander, Haas et al. 2000; Schumacher, Wang et
al. 2000). The BTB domain (~120 amino acids) is a dimerization domain that also acts as a
repression domain in some cases (Melnick, Carlile et al. 2002). In contrast to the SCAN
and KRAB domains, which are only present in C2H2-ZNF proteins, the BTB domain is
also found as a part of actin-binding proteins (Collins, Stone et al. 2001). C2H2-ZNF
proteins are grouped into different subfamilies based on the type of N-terminal effector
domain present.
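The zinc finger motif consensus quoted in this section can be turned into a simple pattern for counting tandem fingers in a protein sequence. The Python sketch below is an illustration only: it assumes the consensus is read literally up to the second histidine (the TGEKPYX linker is left out), whereas real fingers frequently carry other hydrophobic residues at the F and L positions. The function name and the test sequence are hypothetical; it is not the PROSITE profile used in this study.

```python
import re

# Consensus from the text, up to the second zinc-coordinating histidine:
# C-X(2-4)-C-X3-F-X5-L-X2-H-X(3-4)-H
ZINC_FINGER = re.compile(r"C.{2,4}C.{3}F.{5}L.{2}H.{3,4}H")

def count_fingers(protein):
    """Count non-overlapping matches of the literal consensus."""
    return len(ZINC_FINGER.findall(protein))

# Hypothetical sequence: two tandem fingers joined by a TGEKPYK linker.
example = "MYKCPECGKSFSQKSNLQKHQRTHTGEKPYKCEECGKAFNRSSHLTTHKIIHTGEKP"
print(count_fingers(example))
```

A literal pattern of this kind will miss degenerate fingers, which is one reason profile-based tools are preferred for building a catalogue.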
Initial studies on the C2H2-ZNF gene family focused on human chromosome 19,
which is particularly enriched in clusters of these genes (Bellefroid, Marine et al. 1993;
Eichler, Hoffman et al. 1998). More recent studies dealt more specifically with the KRAB
subfamily (Mark, Abrink et al. 1999; Looman, Abrink et al. 2002; Shannon, Hamilton et al.
2003; Huntley, Baggott et al. 2006). The current view is that C2H2-ZNF genes have
undergone a massive expansion during vertebrate evolution by a process involving tandem
duplication (Dehal, Predki et al. 2001; Looman, Abrink et al. 2002; Hamilton, Huntley et
al. 2003; Shannon, Hamilton et al. 2003; Hamilton, Huntley et al. 2006; Huntley, Baggott
et al. 2006). Yet, this view may be biased because it is extrapolated from small subsets of
C2H2-ZNF genes.
In this report, we reconstructed a global picture of the evolution of the C2H2-ZNF
gene repertoires during mammalian speciation, based on a comprehensive catalogue of all
human C2H2-ZNF genes and their syntenic counterparts present in clusters in other
mammals. Our study clearly demonstrates that this gene family expanded and contracted
not only in human but across mammals and in a lineage-specific fashion. In addition, we
discovered evolutionary change of individual C2H2-ZNF orthologs involving both
differential duplication of zinc finger motifs and loss of N-terminal effector domains.
Speciation of mammals is characterized by divergent evolutionary trends at the level of
individual C2H2-ZNF genes as well as the entire family. This led us to propose a model for
the evolution of SCAN, SCAN-KRAB and KRAB subfamilies and points to the importance
of comparing complete repertoires rather than C2H2-ZNF genes from specific subfamilies
for gaining insights into the possible orthologous relationships between genes from various
genomes.
METHODS
Collection of human C2H2 zinc finger genes
We conducted an extensive similarity search to identify the complete repertoire of
C2H2-ZNF genes in the human genome (assembly NCBI 36). First, we identified all the
genes annotated as C2H2 and/or Kruppel zinc finger genes by performing an initial text
term search via Entrez (www.ncbi.nlm.nih.gov). Second, we used PROSITE
(http://www.expasy.com) to identify all the proteins which had a zinc finger motif of the
C2H2 type as well as the N-terminal effector domain, if present.
From these searches, the genomic coordinates, chromosome number, position on the
chromosome, number of fingers and identified domains were collected for each of the gene
and protein sequences (initial dataset). A TBLASTN (e-value cutoff 1e-3) (Gertz, Yu et al.
2006) search was done against the genome using each of the gene sequences from the
initial dataset as a query. The BLAST hits were used to generate the final dataset of all the
identified C2H2-ZNF genes (Supplementary Table S1).
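The e-value filtering step described above can be illustrated with a short Python sketch. It assumes, hypothetically, that the TBLASTN hits were saved in BLAST's standard 12-column tabular layout (query, subject, % identity, alignment length, mismatches, gap openings, query start/end, subject start/end, e-value, bit score); the text does not state which output format was used, and the identifiers in the sample are invented.

```python
def filter_hits(tabular_text, max_evalue=1e-3):
    """Keep hits whose e-value does not exceed the cutoff.

    Returns (query, subject, start, end, evalue) with start <= end, because
    hits on the reverse strand are reported with sstart > send.
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        evalue = float(fields[10])
        if evalue <= max_evalue:
            hits.append((query, subject, min(sstart, send), max(sstart, send), evalue))
    return hits

# Invented sample: one strong hit on the reverse strand and one weak hit.
sample = "\n".join([
    "\t".join(["ZNF_q1", "chr19", "85.0", "100", "15", "0", "1", "100", "5300", "5001", "2e-30", "120"]),
    "\t".join(["ZNF_q1", "chr3", "30.0", "40", "28", "0", "1", "40", "900", "1019", "0.5", "25"]),
])
print(filter_hits(sample))
```

The retained intervals can then be merged per chromosome and compared with existing annotations to decide which hits correspond to additional family members.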
Identification of C2H2-ZNF gene clusters in the human genome
We analyzed the relative positions of C2H2-ZNF genes in the human genome in
order to identify the C2H2-ZNF clusters. A distribution of the distances between
neighboring C2H2-ZNF genes in the human genome is presented in Supplementary Figure
S1. Two consecutive C2H2-ZNF genes are said to belong to a cluster if they lie within
500 kb of each other, regardless of the presence of other genes within the cluster, a
threshold classically used in gene family studies (Niimura and Nei 2003). Clusters were
determined for each human chromosome.
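The 500 kb rule lends itself to a simple single-pass grouping along each chromosome. The Python sketch below is a minimal illustration under stated assumptions: the distance is measured from the end of one gene to the start of the next, genes do not overlap, and a cluster must contain at least two genes. The gene names and coordinates are invented.

```python
from collections import defaultdict

def find_clusters(genes, max_gap=500_000):
    """Group consecutive genes separated by at most max_gap base pairs.

    genes: iterable of (name, chromosome, start, end).
    Returns a list of (chromosome, [gene names]) with at least two genes each.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))

    clusters = []
    for chrom in sorted(by_chrom):
        current, prev_end = [], None
        for start, end, name in sorted(by_chrom[chrom]):
            # A gap larger than the threshold closes the running group.
            if current and start - prev_end > max_gap:
                if len(current) >= 2:
                    clusters.append((chrom, current))
                current = []
            current.append(name)
            prev_end = end
        if len(current) >= 2:
            clusters.append((chrom, current))
    return clusters

# Invented coordinates for illustration.
toy_genes = [
    ("ZNF_a", "19", 1_000_000, 1_020_000),
    ("ZNF_b", "19", 1_300_000, 1_320_000),
    ("ZNF_c", "19", 2_500_000, 2_520_000),
    ("ZNF_d", "7", 100_000, 120_000),
    ("ZNF_e", "7", 400_000, 410_000),
]
print(find_clusters(toy_genes))
```

In the toy data ZNF_c lies more than 500 kb from its nearest neighbour and is therefore left out of any cluster.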
Identification of mammalian C2H2-ZNF clusters syntenically homologous to human
clusters
We searched for clusters homologous to the human C2H2-ZNF clusters (i.e.
syntenically homologous clusters) in other mammals for which complete genome
sequences are available. The assemblies used for Pan troglodytes, Mus musculus, Rattus
norvegicus and Canis familiaris were chimpanzee PanTro 2.1, mouse NCBI m36, rat
RGSC 3.4 and dog CanFam 2.0. We used the linkage maps of Ensembl
(http://www.ensembl.org); assignment of syntenic clusters is based on the genes flanking
each human cluster that were mapped in all the species. Four flanking genes at each
extremity were mapped in most instances. Then, we conducted TBLASTN analysis of the
syntenic regions lying between the flanking genes, using as queries the amino acid
sequence of the zinc finger region from all the known human C2H2-ZNF genes from the
corresponding region. A hit meeting the e-value cutoff of 1e-4 confirmed the respective homologous
clusters in the five mammalian genomes. A comprehensive catalogue of the human C2H2-ZNF
clusters and their syntenic counterparts in other mammals is compiled in
Supplementary Table S4.
Phylogenetic analysis
Phylogenetic analysis was conducted using the amino acid sequences of the zinc
finger region (identified using PROSITE) of C2H2-ZNF genes from selected human
clusters and their syntenically homologous clusters in chimpanzee, mouse, rat and dog. A
multiple sequence alignment of the zinc finger regions of the C2H2-ZNF genes was
generated using the program MUSCLE (Edgar 2004). The alignments were edited to
remove gaps using the program GBLOCKS (Castresana 2000). Maximum Likelihood (ML)
and Bayesian Inference (BI) methods were used to infer the phylogenetic trees and estimate
the clade support. For ML analysis, the program RAxML (RAxML-VI-HPC Version 2.2.1)
(Stamatakis, Ludwig et al. 2005) employing the WAG model of amino acid substitution
was used to reconstruct the best tree. Bootstrapping of 100 datasets was implemented. The
posterior probabilities were determined by a Bayesian MCMC method implemented in the
program MrBayes v3.1 (Huelsenbeck and Ronquist 2001) to test the robustness of the
topology of the tree inferred through ML. One million generations were run and the trees
were sampled every 10 generations.
To determine appropriate outgroups for our analysis, we searched the nr database to look
for close homologs in non-mammals using TBLASTN (e-value cutoff 1e-4). In addition to
the Xfin sequence from Xenopus laevis, we obtained a set of zinc finger genes from
chicken (Gallus gallus, assembly WASHUC2) specifically selected for each human
C2H2-ZNF cluster based on an extensive similarity search. To select the chicken outgroup,
a TBLASTN (e-value cutoff 1e-4) search was done against the chicken genome using each
of the human C2H2-ZNF sequences derived from the selected cluster of interest as a query.
The top 10 hits for each query sequence were all analysed using CD-HIT
(identity threshold = 100%, 95% and 90%) (Li, Jaroszewski et al. 2001) to produce a final
set of non-redundant representative chicken sequences, all used as a part of the outgroups.
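The redundancy-removal step can be illustrated with a greedy selection of representatives, which is the general idea behind clustering at an identity threshold. The Python sketch below is a stand-in rather than CD-HIT itself: it assumes a naive ungapped, position-by-position identity instead of an alignment, and the sequence names and sequences are invented.

```python
def identity(a, b):
    """Ungapped identity: matching positions divided by the shorter length."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n

def greedy_representatives(seqs, threshold=0.9):
    """Keep a sequence only if it is below threshold to every representative.

    Sequences are visited longest first (ties broken by name), so the longest
    member of each group of near-identical sequences becomes its representative.
    """
    reps = []
    ordered = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, seq in ordered:
        if all(identity(seq, rep_seq) < threshold for _, rep_seq in reps):
            reps.append((name, seq))
    return [name for name, _ in reps]

# Invented sequences: chk2 duplicates chk1, chk3 differs from chk1 at one site.
toy_seqs = {
    "chk1": "CPECGKSFSQKSNLQKHQRTH",
    "chk2": "CPECGKSFSQKSNLQKHQRTH",
    "chk3": "CPECGKSFSQKSNLQKHQRTY",
    "chk4": "CEECGKAFNRSSHLTTHKIIH",
}
for cutoff in (1.0, 0.95, 0.90):
    print(cutoff, greedy_representatives(toy_seqs, cutoff))
```

Lowering the threshold merges progressively more divergent hits into one representative, which is why running several thresholds gives a view of how redundant the hit set is.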
Sequence analysis to confirm the loss of domains
In the case where loss of a domain was suspected, we conducted an extensive
sequence analysis to rule out the possibility that these domains would have been missed
either due to a frame-shift or inadequate exon-intron splicing of the gene and thus
inappropriate amino acid translation, preventing recognition by PROSITE
(http://www.expasy.com). Firstly, for each particular C2H2-ZNF gene where loss of an
N-terminal domain was suggested, we systematically collected the nucleotide sequence of
the region ranging from the stop of translation of the previous gene to the start of
translation of the next gene. We conducted a TBLASTN search of this region using the
amino acid sequence of the domain of interest (present in the corresponding orthologs and
the consensus of the domains selected from randomly selected sequences) as a query to
confirm the absence of the domain in the C2H2-ZNF gene of interest. Secondly, we
obtained the exon-intron structure of these genes using the Ensembl Genome Browser
(http://www.ensembl.org), in order to search for exonic or intronic sequences which may
exhibit significant identity with the nucleotide sequence of the domain of interest. For this
purpose we conducted a BLAST analysis of the individual exon and intron sequences with
the nucleotide sequence of the various domains that are present in the corresponding
orthologs.
Flowchart of the study
Figure 1 summarizes the flowchart of our analysis procedure of C2H2-ZNF genes and
clusters in mammals.
RESULTS
Compilation of a comprehensive catalogue of human C2H2-ZNF genes
Previous studies reported the existence of at least 564 C2H2-ZNF genes in the
human genome and suggested that this family may include approximately 700-800 genes
(Bellefroid, Lecocq et al. 1989; Bellefroid, Poncelet et al. 1991). As a first step to study the
evolution of C2H2-ZNF genes, we established a comprehensive catalogue of the C2H2-ZNF
genes in the human genome. By conducting an extensive similarity search (see
Methods), we identified 718 C2H2-ZNF genes (compiled in Supplementary Table S1). Of
the 718 genes, 66 are annotated as pseudogenes in GenBank. For all genes, we determined
their exact position on the chromosomes, their orientation, the number of finger motifs and
the effector domains.
These genes are distributed across all chromosomes of the human genome
(Supplementary Table S2). As reported earlier, chromosome 19 has the highest number
(Venter, Adams et al. 2001) and density of C2H2-ZNF genes, including 40% (289) of the
718 human C2H2-ZNF genes, whereas this chromosome corresponds to only 2.1% of the
human genome. More than half (58%) of the C2H2-ZNF genes encode conserved
N-terminal domains, the KRAB, SCAN and BTB domains (Figure 2A), which are typically
involved in transcriptional regulation (Kim, Chen et al. 1996; Collins, Stone et al. 2001) and
define different C2H2-ZNF subfamilies. Further, we discovered two additional domains
typical of transcription regulators, the SET and HOMEO domains, that are also encoded by
C2H2-ZNF genes. While the KRAB subfamily represents almost half of the C2H2-ZNF genes
(45%), SET and HOMEO C2H2-ZNF genes together with members of all the other
subfamilies account for only a small percentage (~12%) of the C2H2-ZNF genes (Figure
2A).
Clustered organization of human C2H2-ZNF genes
It was reported earlier that chromosome 19 is particularly rich in tandemly
duplicated C2H2-ZNF genes and that KRAB C2H2-ZNF genes are clustered on several
other chromosomes (Dehal, Predki et al. 2001; Rousseau-Merck, Koczan et al. 2002). In
order to trace the duplication history of the entire C2H2-ZNF repertoire, we studied the
distribution of these genes across the whole human genome. Two consecutive C2H2-ZNF
genes were considered to belong to a cluster if they lay within 500 Kb of each other,
regardless of the presence of other genes or pseudogenes within the cluster (see Methods).
Using this definition, we identified 81 human C2H2-ZNF clusters accounting for 72% of
the total number of C2H2-ZNF genes (518 of the 718) (Supplementary Tables S2 and S3).
The remaining genes are dispersed as singletons. Among these clusters, 31% include
exclusively tandemly organized C2H2-ZNF genes with no other intervening genes (Figure
2B, Supplementary Table S3). The number of C2H2-ZNF genes per cluster ranges from 2
to 76 with an average of 6. As illustrated in Figure 2B, about 75% of the
C2H2-ZNF clusters have between two and six genes. Consistent with previous reports,
chromosome 19 not only has the largest number of C2H2-ZNF clusters (Supplementary
Table S2) but also hosts the largest clusters (>12 genes) (see Figure 2B and
Supplementary Table S3).
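The cluster definition used above lends itself to a short sketch. The version below is a minimal illustration, not the procedure actually used in this study: the tuple layout `(name, chromosome, start, end)`, the function name, and the choice to treat the threshold as inclusive and to measure the gap from the furthest gene end seen so far in the current cluster are all assumptions made here.

```python
from itertools import groupby

MAX_GAP_BP = 500_000  # assumed inclusive reading of the 500 Kb threshold


def find_clusters(genes, max_gap=MAX_GAP_BP):
    """Group genes into runs of consecutive neighbours on each chromosome.

    `genes` is an iterable of (name, chromosome, start, end) tuples.
    Returns a list of lists of gene names; a singleton gene is
    returned as a list of length one.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    clusters = []
    for _, chrom_genes in groupby(ordered, key=lambda g: g[1]):
        current, prev_end = [], None
        for name, _, start, end in chrom_genes:
            if current and start - prev_end <= max_gap:
                # Close enough to the current run: extend it.
                current.append(name)
                prev_end = max(prev_end, end)
            else:
                # Gap too large (or first gene on this chromosome): start a new run.
                if current:
                    clusters.append(current)
                current, prev_end = [name], end
        if current:
            clusters.append(current)
    return clusters
```

Runs of length two or more correspond to clusters in the sense used here; runs of length one correspond to singletons.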
We find that the large majority of KRAB (89%) and SCAN (90%) types of C2H2-ZNF
genes are arranged in clusters (Figure 2A and Supplementary Table S2). This contrasts with
the BTB subfamily of C2H2-ZNF genes or those lacking regulatory domains, which occur
more often as singletons in the genome. An analysis of the composition of individual
clusters revealed that two-thirds of the clusters contain a mixture of various C2H2-ZNF
subfamilies (‘mixed clusters’, Supplementary Table S3). The few clusters made up of a
single C2H2-ZNF gene subfamily (‘pure clusters’) are of small size (<4 genes).
Identification and comparison of syntenic C2H2-ZNF clusters across mammals
With the ultimate goal of studying the evolution of zinc finger genes, we identified and
compiled clusters in completely sequenced mammalian genomes (i.e. chimpanzee, mouse,
rat and dog) that are syntenically homologous to those of human. Syntenically homologous
clusters were identified by the genes flanking each cluster. Then, all the C2H2-ZNF genes
found within the delimited syntenic regions were identified using a TBLASTN search (see
Methods). The 81 human C2H2-ZNF clusters and their syntenic counterparts in other
mammals are listed in Supplementary Table S4, which also includes information on the
orientation of the genes in the clusters, their associated domains, the number of zinc finger
motifs and the flanking genes.
Primates (Homo sapiens and Pan troglodytes) stood out for their large number of
both C2H2-ZNF clusters and genes within them, as compared to rodents (Mus musculus
and Rattus norvegicus) and Canis familiaris (Figure 3A). The most parsimonious
explanation is that a large expansion of C2H2-ZNF genes occurred in primates, and more
particularly in human (518 genes in human versus 397 in chimpanzee), after divergence
from rodents and canines. Rat has slightly fewer C2H2-ZNF genes than dog (7%), but 25%
fewer than mouse. Considering the evolutionary relationship of the species (Figure 3A), these
data suggest that not only species-specific duplication events, as reported earlier (Dehal,
Predki et al. 2001; Hamilton, Huntley et al. 2003; Shannon, Hamilton et al. 2003), but also
loss of family members (suggested here in rodents) may have occurred during the evolution
of mammals. Differential species-specific expansion was reported previously for a subset of
genes from the human ZNF45 subfamily on chromosome 19 compared with its mouse
counterpart (Shannon, Hamilton et al. 2003). Furthermore, expansion of the human KRAB
C2H2-ZNF subfamily was also shown earlier based on draft versions of the genomes of
chimpanzee, mouse and dog (Huntley, Baggott et al. 2006). However, evidence of C2H2-ZNF
gene or cluster loss could not be definitively obtained in these studies, as it required
detailed analysis of more than two completely sequenced genomes.
Comparing individual syntenic clusters in the mammalian genomes
To distinguish whether differences in the number of C2H2-ZNF clusters are due to
species-specific gene gain or loss, we systematically compared individual syntenic clusters
in the five mammalian genomes studied. The results of this analysis point to a differential
evolutionary history in mammals. About 60% of the human clusters (49) have syntenically
homologous counterparts in all the species studied, indicating that these C2H2-ZNF clusters
predate the divergence of dog, rodents and primates (Supplementary Table S4 and
Supplementary Figure S2). In addition, we found (i) primate-specific clusters (14 including
2 human-specific clusters), (ii) clusters, present in primates and dog, that were lost in
rodents (8 clusters including 3 present in mouse but absent in rat) and (iii) clusters present
in primates and rodents but absent in dog (10 clusters) (examples in Figure 3B). Essentially
all the primate clusters have a larger number of genes than rodent or dog clusters, which
reflects a global primate-specific expansion of C2H2-ZNF genes (Supplementary Figure S2).
Further, in 40% of all primate clusters, those from human contain more C2H2-ZNF genes
than those from chimpanzee. This indicates that most of the evolutionary changes
(duplication and/or loss) occurred late in the primate branch. A similar pattern was seen in
rodents, where almost all mouse C2H2-ZNF clusters exhibit more genes than their syntenic
rat clusters. While these results illustrate that the C2H2-ZNF gene family is rapidly and
independently evolving within different lineages, insights into the role of gene duplication
and loss in the history of this gene family required rigorous phylogenetic analysis.
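The presence/absence categories listed above can be expressed as a small classification rule. The sketch below is illustrative only: the species labels, the category strings and the function name are assumptions introduced here, and a real analysis would work from the syntenic assignments in Supplementary Table S4 rather than from a bare list of species.

```python
PRIMATES = {"human", "chimpanzee"}
RODENTS = {"mouse", "rat"}
DOG = {"dog"}


def classify_cluster(species_present):
    """Assign a human-anchored cluster to a presence/absence category.

    `species_present` lists the species in which a syntenic counterpart
    of the cluster was found (hypothetical input format).
    """
    present = set(species_present)
    in_primates = bool(present & PRIMATES)
    in_rodents = bool(present & RODENTS)
    in_dog = bool(present & DOG)
    if present >= PRIMATES | RODENTS | DOG:
        return "conserved in all five species"
    if in_primates and not in_rodents and not in_dog:
        return "primate-specific"
    if in_primates and in_dog and RODENTS - present:
        # Covers clusters absent from both rodents or from only one of them.
        return "missing in at least one rodent"
    if in_primates and in_rodents and not in_dog:
        return "absent in dog"
    return "other"
```

As a consistency check on the counts reported above, the four categories (49, 14, 8 and 10 clusters) add up to the 81 human clusters.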
Phylogeny of C2H2-ZNF clusters in mammalian genomes
To address the relative contribution of gene duplication and loss in the evolution
of C2H2-ZNF genes in mammals, we focused our study on selected large human C2H2-ZNF
clusters and their syntenic counterparts in four other mammals. We expected that
larger clusters would be more informative and possibly more representative of the whole
genomes. Because of the clarity of evolutionary scenarios observed in the tree, we present
here a detailed phylogenetic analysis of the second largest human C2H2-ZNF cluster (43
genes) located on chromosome 19q13.4, that we named cluster 19.12, and of its syntenic
clusters (Supplementary Tables S3 and S4) in other species. For phylogenetic analysis, we
used the predicted amino acid sequences of the zinc finger regions. Genes annotated as
pseudogenes in GenBank or genes containing fewer than three zinc finger motifs were not
considered in the phylogenetic analysis (noise is expected to be too high if sequences of
only 56 amino acids, corresponding to two finger motifs or fewer, were included). Our total
data set of C2H2-ZNF sequences from the human cluster 19.12 and their syntenic
homologs in chimpanzee, mouse, rat and dog consists of 101 protein sequences, including
the outgroup sequences from Xenopus and chicken. We constructed a phylogenetic tree
using Maximum Likelihood and Bayesian methods. We subdivided the tree into three
groups (Figure 5) based on the kind of evolutionary scenarios observed, i.e. one-to-one and
one-to-many orthologous relationships between genes as well as gene loss, as defined in
Figure 4. The number of C2H2-ZNF sequences from each species is highlighted for each
group. Two of these groups are monophyletic with significant (95%) support in both the
Maximum Likelihood and Bayesian analyses (Groups I and III).
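To make the kinds of scenarios mentioned above more concrete, the following sketch tallies the genes of each species within one clade and labels each species as single copy, expanded or absent. It is a rough illustration under several assumptions: leaf names are taken to carry a one-letter species prefix (h, p and c follow the gene labels used in this section, while m and r for mouse and rat are hypothetical), the labels are invented here rather than taken from Figure 4, and absence from a clade is only a candidate loss that still has to be checked against the physical maps.

```python
from collections import Counter

# h, p, c follow the labels used in the text; m and r are assumed here.
SPECIES_PREFIX = {"h": "human", "p": "chimpanzee", "c": "dog",
                  "m": "mouse", "r": "rat"}
ALL_SPECIES = ("human", "chimpanzee", "mouse", "rat", "dog")


def species_counts(leaves):
    """Count the leaves of a clade per species, using the name prefix."""
    return Counter(SPECIES_PREFIX[name[0]] for name in leaves)


def describe_clade(leaves, expected=ALL_SPECIES):
    """Label each expected species by its copy number in the clade."""
    counts = species_counts(leaves)
    report = {}
    for species in expected:
        n = counts.get(species, 0)
        if n == 0:
            report[species] = "absent (candidate loss)"
        elif n == 1:
            report[species] = "single copy"
        else:
            report[species] = "expanded (%d copies)" % n
    return report
```

For a clade holding one human gene, one chimpanzee gene and several dog genes, this would report single copies in the primates, an expansion in dog and candidate losses in mouse and rat.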
A detailed analysis of the tree revealed four clades that underwent species-specific
expansion, and two clades with gene loss in some species. For example, a dog-specific
expansion is seen in the monophyletic Group I, which includes three clustering genes from
human (hZNF331), chimpanzee (pZNF331) and dog (cZNF331), which in turn grouped
within a larger clade containing nine additional C2H2-ZNF genes from dog. In addition,
this clade indicates a loss in rodents, due to the absence of mouse or rat genes. Group I
alone illustrates how both species-specific duplication in dog and loss in rodents can
account for the higher number of genes seen in dog C2H2-ZNF clusters as compared to
rodents.
Group II shows more pronounced expansion in human, as seen in several clusters.
For example, one of the primate-specific clades includes 17 human genes and 7 chimpanzee
C2H2-ZNF genes (Figure 5). Of the 17 human genes present in the clade, only 6 genes
show a one-to-one orthologous pairing with chimpanzee genes. Another well supported
clade includes a single human gene (hZNF677) clustered with two dog genes (LOC484331
and LOC476394). In this clade, the absence of chimpanzee or rodent counterparts to
these three genes suggests a loss in these species. For chimpanzee, however, loss by
pseudogenization is possibly involved (see physical maps described below); note that the
percentage of C2H2-ZNF genes annotated as pseudogenes was higher in chimpanzee than in
human C2H2-ZNF clusters (62, Supplementary Table S4).
In group III, the relationship of the four rodent genes with the dog and primate genes
could not be resolved (bootstrap values < 95%). However, a rodent-specific clade revealed
a mouse-specific duplication exhibiting a higher number of C2H2-ZNF genes in mouse
than rat, as seen in several other cases in our study.
Superimposition of the phylogenetic trees with the physical maps of clusters
Comparison of gene trees, species tree and physical map information of cluster
19.12 genes and its syntenic homologs provides better insights into the processes underlying
the evolution of the C2H2-ZNF clusters. The phylogenetic tree obtained for cluster 19.12
(Figure 5) suggests a simultaneous differential expansion and loss of C2H2-ZNF genes
throughout evolution. In perfect agreement with the phylogenetic tree, genes of the
monophyletic groups I and III were found to be physically clustered together on the
chromosomes across mammals (Figure 6). Evidence for a tandem duplication event is
provided by the comparison of the relationship within C2H2-ZNF genes of Group I on the
tree with their spatial relationships in the physical maps, which showed that the sequences of
the dog clade form a tandem array on the chromosome (Figure 5). In addition to tandem
duplication of individual genes within this group, e.g. cLOC484324 and cLOC484323
(Figure 5), which are next to each other on the chromosome and exhibit the same orientation
(Figure 6), we also discovered tandem duplication of multiple genes. For instance, three
genes (LOC482273, LOC611599, LOC480782; orientation -, +, +) appear as a tandem
repeat of three other genes (LOC611583, LOC484328, LOC484326; orientation -, +, +) in
this group (Figures 5 and 6).
Group II mainly contains primate-specific C2H2-ZNF genes that cluster on the
phylogenetic tree in two well supported clades (97% bootstrap) and a sub-group of
weaker support (93% bootstrap). Almost all these genes also cluster physically together on
the chromosome. Human orthology assignments for ten of the twelve chimpanzee genes
from group II (underlined in Figure 6) were corroborated by two lines of evidence, i.e. the
phylogeny and the topology on the chromosome. Furthermore,
genes from 7 out of 10 of the C2H2-ZNF ortholog pairs from this primate-specific cluster
exhibit the same number of zinc finger motifs and the same type of N-terminal motif.
Species-specific variation in the number of finger motifs and the presence of N-
terminal conserved domains
When analysing the C2H2-ZNF genes from the 81 human clusters and their
syntenic homologs in mammals, we noticed that the average number of zinc finger motifs
varied depending on the C2H2-ZNF gene subfamily. Notably, in all the mammalian
species studied, genes with KRAB and SCAN-KRAB motifs have a higher number of zinc
finger motifs than those from the other subfamilies (Figure 7A). For example, members of
the KRAB subfamily have an average of 10 to 17 zinc finger motifs, while members of the
BTB subfamily have only 2 to 3 (Figure 7A). We also noted species-specific variation in
the number of zinc finger motifs within mammalian C2H2-ZNF genes. In particular, dog
tends to have a much higher number of zinc finger motifs in most C2H2-ZNF gene
subfamilies (Figure 7). Strikingly, LOC484264, a dog KRAB C2H2-ZNF gene, exhibits 70 zinc
finger motifs, which is to our knowledge the highest number of zinc finger motifs
reported for a zinc finger gene. Study of cluster 19.12 (Figure 5) illustrates more
specifically the trend of dog genes to exhibit more zinc finger motifs; the dog LOC484338
gene (group III), for example, has six times more zinc finger motifs than its human
ortholog. Furthermore, the dog gene LOC424326 has nearly twice as many motifs as its
closest paralog LOC480782 (group I) (Figure 5). This indicates a quite recent and drastic
expansion of zinc finger motifs within dog C2H2-ZNF genes, after the separation of dog
from rodents and primates. In several cases, the C2H2-ZNF mammalian orthologs revealed
differences in their numbers of finger motifs even within primate or within rodent lineages
(Figure 7B and Supplementary Table S4).
In addition to the difference in the number of finger motifs in C2H2-ZNF
orthologs and paralogs, we also found a variation in the presence of the N-terminal effector
domains. As an example, orthologs of the C2H2-ZNF genes in the human cluster 6.2 show
a variation in the presence of the KRAB or SCAN domains (Figure 7B), suggesting
frequent and multiple losses and/or gains of KRAB and SCAN domains during evolution.
To reconstruct these events, we analyzed in detail the exon-intron structure and sequences
of these genes (see Methods). Serendipitously, this analysis led us to the observation that a
large majority of the C2H2-ZNF genes containing a SCAN-KRAB or SCAN domain each had a
typical exon-intron organization (Figure 7C). For example, both human genes, ZNF192
(SCAN-KRAB) and ZNF187 (SCAN), and their respective orthologs in other species
(Figure 6B) share the predominant exon-intron organization most typical of SCAN-KRAB
and SCAN C2H2-ZNF genes, respectively. While the dog LOC488318 has only a SCAN domain,
its corresponding orthologs in human, mouse and rat have a SCAN-KRAB. When the
nucleotide sequence of the exon which would have been predicted to encode a KRAB
domain in dog (third exon after the SCAN) was compared with those of human, mouse and
rat, the dog sequence exhibited a high conservation at the nucleotide level (>82%) but no
significant similarity at the amino acid level. This indicates that the loss of the KRAB
domain in dog was due to sequence degeneration. Similarly, while the chimpanzee
ZNF187 and its rat ortholog encode a SCAN domain, a degenerate SCAN domain was
identified in the corresponding exon of their human and mouse orthologs. For the human
SCAN-KRAB ZNF307 gene, we noticed that it exhibits an exon-intron organization typical
of SCAN-KRAB C2H2-ZNF genes (Figure 7C), whereas its orthologs in the chimpanzee, mouse
and rat encode solely a SCAN domain and present an exon-intron structure more typical
of SCAN C2H2-ZNF genes. However, it was found that, in chimpanzee, a sequence similar to the
KRAB sequence (99% at the nucleotide level) was embedded in the intron preceding the
exon encoding the zinc finger domain. No KRAB-related sequence could be detected in the
rodent orthologs even with a detailed analysis of their sequences. Thus, either the KRAB
sequence was gained in the primate lineage or lost in the rodent lineage. For reasons
explained in the discussion, we believe that loss, rather than gain, is the more likely
hypothesis.
DISCUSSION
Comparative studies in genome research have focused on the extensive similarities
existing between the human genome and the genomes of various other model organisms,
which provide valuable insights into biological function and the aetiology of human diseases.
However, differences existing among genomes have received less attention in spite of the
importance they may have in the physiological, morphological and behavioural distinctive
traits observed among species. A few studies on various gene families, such as the odorant
receptor family, pointed out some differences existing between genes of closely related
species (Sitnikova and Su 1998; Lapidot, Pilpel et al. 2001; Niimura and Nei 2003; Gilad,
Man et al. 2005; Niimura and Nei 2005). Our study of the C2H2-ZNF gene family reveals
that there is an extensive variation of the C2H2-ZNF gene content and organization in the
genomes of various mammals as well as in the domain composition of orthologous genes
among species. It also provides the first clear demonstration of the contribution of gene
loss in the C2H2-ZNF family during evolution, which occurs at the level of clusters, genes
and their functional domains. We provide the first genome-scale confirmation of the rapid
evolution of C2H2-ZNF gene clusters that occurs independently within related species,
which also supports conclusions drawn from smaller-scale studies on individual genes,
clusters and C2H2-ZNF subfamilies (Dehal, Predki et al. 2001; Shannon, Hamilton et al.
2003; Hamilton, Huntley et al. 2006; Huntley, Baggott et al. 2006).
Substantial variation in the C2H2-ZNF gene family size and clustering across
mammals
We report here the first complete catalogue of all human C2H2-ZNF gene clusters and their
syntenic homologs in chimpanzee, mouse, rat and dog. This catalogue reveals that in
human, a large proportion of the genes from the C2H2-ZNF family (>70%) are organized
in clusters. Comparative studies of the five mammalian genomes indicated that the total
number of genes found in clusters varied considerably from 172 in rat to 518 in human
(number of genes found in clusters in human > chimpanzee > mouse > dog > rat).
Significantly, human and mouse have a larger number of clustered C2H2-ZNF genes
(>30%) as compared to chimpanzee and rat, respectively, indicating that independent
evolutionary events occurred after the divergence of the two primates (within the last 6-10
million years) and two rodents (within 30-46 million years). We distinguish two kinds
of events: first, a variation in the number of C2H2-ZNF genes in syntenically homologous
clusters and second, the existence of lineage- and species-specific clusters in primates,
rodents and canines. This can be accounted for by independent evolution of C2H2-ZNF
genes in these closely related species. Previous studies focusing on KRAB C2H2-ZNF genes
from chromosome 19 had identified and analyzed a primate-specific cluster (Mohrenweiser
1998) including members of the primate-specific ZNF91 subfamily of C2H2-ZNF genes
(Hamilton, Huntley et al. 2006). Other studies on the KRAB C2H2-ZNF subfamily
identified a differential expansion between a human KRAB C2H2-ZNF cluster and its
syntenic counterpart in mouse (Shannon, Hamilton et al. 2003) and more recently other
species-specific expansions based on drafts of various mammalian genomes (Huntley,
Baggott et al. 2006). We illustrate and confirm at a larger scale the existence of an
ongoing process of genome dynamics with several lineage- and species-specific
rearrangements and continuous repertoire expansion taking place independently in all
evolutionary branches, particularly in primates. This finding was only possible through the
analysis of a complete catalogue of all the subfamilies of C2H2-ZNF clusters and their
syntenic counterparts in mammals.
Gene duplication and loss: Two counteracting forces in the evolution of C2H2-ZNF
genes
An overview of the 81 human C2H2-ZNF clusters identified here revealed that a third
of them are pure clusters (with 2 to 24 C2H2-ZNF genes), i.e. they are not interspersed with
other genes. Earlier observations of pure C2H2-ZNF gene clusters have led to the
hypothesis that C2H2-ZNF genes in primates have expanded massively by tandem
duplication (Bellefroid, Marine et al. 1993; Eichler, Hoffman et al. 1998; Elemento and
Gascuel 2002; Schmidt and Durrett 2004; Bertrand and Gascuel 2005; Huntley, Baggott et
al. 2006). We revisited this question based on our catalogue of human C2H2-ZNF clusters
and their syntenic counterparts in chimpanzee, mouse, rat and dog. Here, we confirmed
gene duplication and loss based on a reconciliation of both physical maps and the
superimposition of gene trees onto the known species tree (Page and Charleston 1997).
Our results clearly show that both gene gain and gene loss events have occurred multiple
times and independently in all the mammals studied. Combined with physical map data, our
phylogenetic studies indicate that the expansion of C2H2-ZNF genes evidenced during the
evolution of the five species studied results from the combined action of single-gene
duplication and multiple-gene duplication (for instance, duplication of all or part of the
genes within a cluster). These duplication events were however counteracted by the loss of
individual genes or clusters, as exemplified in several cases where related genes or clusters
found in primates and dog were absent in one or both of the two rodents studied. This
study represents the first clear demonstration of the involvement of gene and cluster loss in
the evolution of C2H2-ZNF genes and suggests that during mammalian evolution the
duplication events outnumbered the loss events. Our results provide convincing support
that the C2H2-ZNF gene family evolved according to the “Birth and Death” model as
proposed by Nei and colleagues (Nei, Gu et al. 1997; Nei 2000). According to this model,
new genes are created by duplication, including tandem duplication and block gene
duplication (birth). While certain copies remain relatively unchanged in the genome for a
long time, others diverge functionally by acquiring a new function. Some get deleted from
the genome or become pseudogenes following deleterious mutations (death through
elimination or inactivation). In the case of C2H2-ZNF genes, pseudogenization seems to be
limited, as suggested by expression studies and statistical analysis showing positive
selection based on the analysis of specific clusters (Schmidt and Durrett 2004; Huntley,
Baggott et al. 2006). This makes the C2H2-ZNF family different from other gene
families such as the olfactory receptor gene family (Glusman, Yanai et al. 2001; Niimura
and Nei 2003). Notably, gene loss by pseudogenization was prominent for the
olfactory receptors, with humans accumulating a higher number of olfactory receptor
pseudogenes as compared to other primates and mouse (Sitnikova and Su 1998; Lapidot,
Pilpel et al. 2001; Niimura and Nei 2005). These variations in the numbers of pseudogenes
and functional genes have been associated with the differential chemosensory dependence
in these species (Sharon, Glusman et al. 1999; Quignon, Kirkness et al. 2003). In
comparison, besides the fact that they are known to function as regulators of transcription,
the functions of only a few C2H2-ZNF proteins are known (Krebs, Larkins et al. 2003).
Further studies of C2H2-ZNF genes in mammals could shed light on the functional
consequences of different repertoires of these genes in different species. Until now, the
clustered organization of these genes has made knock-out studies in animal models
inefficient, possibly due to redundancy. However, based on a better knowledge of the
organization/content of C2H2-ZNF genes in the various genomes, large chromosomal
deletions of pure C2H2-ZNF clusters or other types of gene disruption or targeting
approaches could provide insights into the functions of these genes in different animal
models.
Evolution of C2H2-ZNF genes through gain and loss of finger motifs and N-terminal
effector domains
Evidence of the variation in the numbers of zinc finger motifs among orthologs was
previously reported for a subset of human chromosome 19 C2H2-ZNF genes and their
mouse homologs (Looman, Abrink et al. 2002). It was shown that this variation is due to
both differential duplication of finger motifs and loss due to degeneration. In our study,
such variation in the number of zinc finger motifs among orthologs was observed
recurrently among all mammals. Since the zinc finger motif appears as a flexible motif
with the ability to bind DNA, RNA and/or proteins, changes in the zinc finger motif
sequences and number within C2H2-ZNF genes could differentially alter binding
specificities and thus protein function. Both changes in the number of C2H2-ZNF genes
and in the number of finger motifs encoded by orthologous genes may be determinants of
species-specific functions.
The rapid evolution of the C2H2-ZNF genes observed in the mammalian lineage
was not limited to the variation in the number of genes and zinc finger motifs. Variation in
the presence of N-terminal effector domains, such as SCAN or KRAB, was observed in
orthologs and could be accounted for by either gain or loss of these motifs in the various
species. Loss by sequence degeneration of both the SCAN and the KRAB sequences was
confirmed in several cases in our study. In some cases, neither loss nor gain could be
resolved. A puzzling question remains, however, if one assumes that gain of KRAB and
SCAN sequences can occur recurrently within C2H2-ZNF genes. It is indeed difficult to
explain that these effector domains are always found in association with and N-terminal to
the zinc finger motifs of C2H2-ZNF proteins and that the SCAN domain is always
positioned N-terminal to the KRAB domain of SCAN-KRAB C2H2-ZNF proteins.
Interestingly, by analyzing the exon-intron structure of C2H2-ZNF genes from the clusters,
we found that most SCAN C2H2-ZNF and SCAN-KRAB C2H2-ZNF genes each have a
typical exon-intron structure (Figure 7C and Figure 8A). This suggests that the acquisition
of SCAN and KRAB sequences by C2H2-ZNF genes most likely corresponds to singular
events. This led us to propose the model described in Figure 8. Considering that the SCAN
domain is found in all vertebrates and is more ancient than the KRAB domain, only found
in tetrapods, we suggest that first a SCAN C2H2-ZNF gene was formed in an ancestor of
vertebrates through the gain of a SCAN sequence and that later, after the emergence of the
tetrapods, the gain of a KRAB sequence by a SCAN C2H2-ZNF gene gave rise to a
SCAN-KRAB C2H2-ZNF gene (Figure 8B). These two gain events possibly occurred through an
exon-shuffling mechanism. Diversification of the C2H2-ZNF repertoires from each
subfamily then occurred dynamically through ongoing gene duplications and loss by
deletion or degeneration of the SCAN and/or KRAB sequences. As implied by this model,
the birth of the SCAN-KRAB C2H2-ZNF subfamily occurred earlier than that of the
KRAB C2H2-ZNF subfamily. This is consistent with previous data showing that
SCAN-KRAB-ZNF genes do not group together in one evolutionary clade but intermix with
KRAB-ZNF genes in phylogenetic trees of the KRAB sequence (Looman, Abrink et al.
2002; Huntley, Baggott et al. 2006). On the whole, our
model is in agreement with the fact that C2H2-ZNF orthologs often belong to different
C2H2-ZNF subfamilies and that we observed intermingling of C2H2-ZNF genes from the
SCAN, KRAB and SCAN-KRAB subfamilies in many C2H2-ZNF clusters. Our study
clearly indicates that the evolution of C2H2-ZNF subfamilies is tightly linked and stresses
that the assignment of proper orthology requires comprehensive analysis of all C2H2-ZNF
genes rather than the individual analysis of specific C2H2-ZNF subfamilies. It also points
to the importance of loss/contraction and secondary simplification, whose role in the
dynamics of evolution is often underestimated. The underlying mechanisms in the
expansion of C2H2-ZNF genes and the functional consequences of the important changes
(gain and loss) occurring in their repertoires in various mammals are unclear. These
variations, for example, may be to the advantage of complex organisms by providing more
subtle and species-specific control of gene expression for morphogenesis or cognitive
functions.
ACKNOWLEDGEMENTS
The authors thank Franz B. Lang for suggestions and help in phylogenetic analysis and
Nicolas Lartillot (Université de Montréal) for constructive comments on tree building and
datasets. We also acknowledge Hervé Philippe, Henner Brinkmann and Amy Hauth for
critical discussions and Allan Sun for assistance in hardware and software installations.
This work was supported by a grant from the Natural Sciences and Engineering Research
Council of Canada (to M.A). M.A is a Chercheur National from the Fonds de la recherche
en santé du Québec (FRSQ) and G.B is an associate from CIFAR (Canadian Institute for
Advanced Research).
REFERENCES
Bellefroid, E. J., P. J. Lecocq, et al. (1989). “The human genome contains hundreds of
genes coding for finger proteins of the Kruppel type.” DNA 8(6): 377-87.
Bellefroid, E. J., J. C. Marine, et al. (1993). “Clustered organization of homologous KRAB
zinc-finger genes with enhanced expression in human T lymphoid cells.” Embo J
12(4): 1363-74.
Bellefroid, E. J., D. A. Poncelet, et al. (1991). “The evolutionarily conserved Kruppel-
associated box domain defines a subfamily of eukaryotic multifingered proteins.”
Proc Natl Acad Sci U S A 88(9): 3608-12.
Bertrand, D. and O. Gascuel (2005). “Topological rearrangements and local search method
for tandem duplication trees.” IEEE/ACM Trans Comput Biol Bioinform 2(1): 15-28.
Castresana, J. (2000). “Selection of conserved blocks from multiple alignments for their use
in phylogenetic analysis.” Mol Biol Evol 17(4): 540-52.
Collins, T., J. R. Stone, et al. (2001). “All in the family: the BTB/POZ, KRAB, and SCAN
domains.” Mol Cell Biol 21(11): 3609-15.
Dehal, P., P. Predki, et al. (2001). “Human chromosome 19 and related regions in mouse:
conservative and lineage-specific evolution.” Science 293(5527): 104-11.
Edgar, R. C. (2004). “MUSCLE: multiple sequence alignment with high accuracy and high
throughput.” Nucleic Acids Res 32(5): 1792-7.
Eichler, E. E., S. M. Hoffman, et al. (1998). “Complex beta-satellite repeat structures and
the expansion of the zinc finger gene cluster in 19p12.” Genome Res 8(8): 791-808.
Elemento, O. and O. Gascuel (2002). “An efficient and accurate distance based algorithm
to reconstruct tandem duplication trees.” Bioinformatics 18 Suppl 2: S92-9.
Gertz, E. M., Y. K. Yu, et al. (2006). “Composition-based statistics and translated
nucleotide searches: improving the TBLASTN module of BLAST.” BMC Biol 4:
41.
Gilad, Y., O. Man, et al. (2005). “A comparison of the human and chimpanzee olfactory
receptor gene repertoires.” Genome Res 15(2): 224-30.
Glusman, G., I. Yanai, et al. (2001). “The complete human olfactory subgenome.” Genome
Res 11(5): 685-702.
Grondin, B., M. Bazinet, et al. (1996). “The KRAB zinc finger gene ZNF74 encodes an
RNA-binding protein tightly associated with the nuclear matrix.” J Biol Chem
271(26): 15458-67.
Hamilton, A. T., S. Huntley, et al. (2003). “Lineage-specific expansion of KRAB zinc
finger transcription factor genes: implications for the evolution of vertebrate
regulatory networks.” Cold Spring Harb Symp Quant Biol 68: 131-40.
Hamilton, A. T., S. Huntley, et al. (2006). “Evolutionary expansion and divergence in the
ZNF91 subfamily of primate-specific zinc finger genes.” Genome Res 16(5): 584-94.
Huelsenbeck, J. P. and F. Ronquist (2001). “MRBAYES: Bayesian inference of
phylogenetic trees.” Bioinformatics 17(8): 754-5.
Huntley, S., D. M. Baggott, et al. (2006). “A comprehensive catalog of human KRAB-
associated zinc finger genes: insights into the evolutionary history of a large family
of transcriptional repressors.” Genome Res 16(5): 669-77.
Kim, S. S., Y. M. Chen, et al. (1996). “A novel member of the RING finger family, KRIP-
1, associates with the KRAB-A transcriptional repressor domain of zinc finger
proteins.” Proc Natl Acad Sci U S A 93(26): 15299-304.
Klug, A. and D. Rhodes (1987). “Zinc fingers: a novel protein fold for nucleic acid
recognition.” Cold Spring Harb Symp Quant Biol 52: 473-82.
Krebs, C. J., L. K. Larkins, et al. (2003). “Regulator of sex-limitation (Rsl) encodes a pair
of KRAB zinc-finger genes that control sexually dimorphic liver gene expression.”
Genes Dev 17(21): 2664-74.
Lander, E. S., L. M. Linton, et al. (2001). “Initial sequencing and analysis of the human
genome.” Nature 409(6822): 860-921.
Lapidot, M., Y. Pilpel, et al. (2001). “Mouse-human orthology relationships in an olfactory
receptor gene cluster.” Genomics 71(3): 296-306.
Lee, M. S., G. P. Gippert, et al. (1989). “Three-dimensional solution structure of a single
zinc finger DNA-binding domain.” Science 245(4918): 635-7.
Li, W., L. Jaroszewski, et al. (2001). “Clustering of highly homologous sequences to reduce
the size of large protein databases.” Bioinformatics 17(3): 282-3.
Looman, C., M. Abrink, et al. (2002). “KRAB zinc finger proteins: an analysis of the
molecular mechanisms governing their increase in numbers and complexity during
evolution.” Mol Biol Evol 19(12): 2118-30.
Looman, C., L. Hellman, et al. (2004). “A novel Kruppel-Associated Box identified in a
panel of mammalian zinc finger proteins.” Mamm Genome 15(1): 35-40.
Mark, C., M. Abrink, et al. (1999). “Comparative analysis of KRAB zinc finger proteins in
rodents and man: evidence for several evolutionarily distinct subfamilies of KRAB
zinc finger genes.” DNA Cell Biol 18(5): 381-96.
Melnick, A., G. Carlile, et al. (2002). “Critical residues within the BTB domain of PLZF
and Bcl-6 modulate interaction with corepressors.” Mol Cell Biol 22(6): 1804-18.
Messina, D. N., J. Glasscock, et al. (2004). “An ORFeome-based analysis of human
transcription factor genes and the construction of a microarray to interrogate their
expression.” Genome Res 14(10B): 2041-7.
Miller, J., A. D. McLachlan, et al. (1985). “Repetitive zinc-binding domains in the protein
transcription factor IIIA from Xenopus oocytes.” Embo J 4(6): 1609-14.
Nei, M. and S. Kumar (2000). Molecular Evolution and Phylogenetics. New York: Oxford
University Press.
Nei, M., X. Gu, et al. (1997). “Evolution by the birth-and-death process in multigene
families of the vertebrate immune system.” Proc Natl Acad Sci U S A 94(15): 7799-806.
Niimura, Y. and M. Nei (2003). “Evolution of olfactory receptor genes in the human
genome.” Proc Natl Acad Sci U S A 100(21): 12235-40.
Niimura, Y. and M. Nei (2005). “Comparative evolutionary analysis of olfactory receptor
gene clusters between humans and mice.” Gene 346: 13-21.
Ohta, T. (2000). “Evolution of gene families.” Gene 259(1-2): 45-52.
Page, R. D. and M. A. Charleston (1997). “From gene to organismal phylogeny:
reconciled trees and the gene tree/species tree problem.” Mol Phylogenet Evol 7(2):
231-40.
Quignon, P., E. Kirkness, et al. (2003). “Comparison of the canine and human olfactory
receptor gene repertoires.” Genome Biol 4(12): R80.
Rhodes, D. and A. Klug (1993). “Zinc fingers.” Sci Am 268(2): 56-9, 62-5.
Rosati, M., M. Marino, et al. (1991). “Members of the zinc finger protein gene family
sharing a conserved N-terminal module.” Nucleic Acids Res 19(20): 5661-7.
Rousseau-Merck, M. F., D. Koczan, et al. (2002). “The KOX zinc finger genes: genome
wide mapping of 368 ZNF PAC clones with zinc finger gene clusters predominantly
in 23 chromosomal loci are confirmed by human sequences annotated in
EnsEMBL.” Cytogenet Genome Res 98(2-3): 147-53.
Sander, T. L., A. L. Haas, et al. (2000). “Identification of a novel SCAN box-related protein
that interacts with MZF1B. The leucine-rich SCAN box mediates hetero- and
homoprotein associations.” J Biol Chem 275(17): 12857-67.
Schmidt, D. and R. Durrett (2004). “Adaptive evolution drives the diversification of zinc
finger binding domains.” Mol Biol Evol 21(12): 2326-39.
Schuh, R., W. Aicher, et al. (1986). “A conserved family of nuclear proteins containing
structural elements of the finger protein encoded by Kruppel, a Drosophila
segmentation gene.” Cell 47(6): 1025-32.
Schumacher, C., H. Wang, et al. (2000). “The SCAN domain mediates selective
oligomerization.” J Biol Chem 275(22): 17173-9.
Shannon, M., A. T. Hamilton, et al. (2003). “Differential expansion of zinc-finger
transcription factor loci in homologous human and mouse gene clusters.” Genome
Res 13(6A): 1097-110.
Sharon, D., G. Glusman, et al. (1999). “Primate evolution of an olfactory receptor cluster:
diversification by gene conversion and recent emergence of pseudogenes.”
Genomics 61(1): 24-36.
Sitnikova, T. and C. Su (1998). “Coevolution of immunoglobulin heavy- and light-chain
variable-region gene families.” Mol Biol Evol 15(6): 617-25.
Stamatakis, A., T. Ludwig, et al. (2005). “RAxML-III: a fast program for maximum
likelihood-based inference of large phylogenetic trees.” Bioinformatics 21(4): 456-63.
Stone, J. R., J. L. Maki, et al. (2002). “The SCAN domain of ZNF174 is a dimer.” J Biol
Chem 277(7): 5448-52.
Theunissen, O., F. Rudt, et al. (1992). “RNA and DNA binding zinc fingers in Xenopus
TFIIIA.” Cell 71(4): 679-90.
Thornton, J. W. and R. DeSalle (2000). “Gene family evolution and homology: genomics
meets phylogenetics.” Annu Rev Genomics Hum Genet 1: 41-73.
Urrutia, R. (2003). “KRAB-containing zinc-finger repressor proteins.” Genome Biol 4(10):
231.
Venter, J. C., M. D. Adams, et al. (2001). “The sequence of the human genome.” Science
291(5507): 1304-51.
Witzgall, R., E. O’Leary, et al. (1994). “The Kruppel-associated box-A (KRAB-A) domain
of zinc finger proteins mediates transcriptional repression.” Proc Natl Acad Sci U S
A 91(10): 4514-8.
Figure 1: Flowchart of the analysis procedure of C2H2-ZNF genes and clusters
Flowchart boxes: Identify C2H2-ZNF clusters on the genome; Compare with syntenically
homologous clusters from completed genomes; Extract complete dataset of C2H2-ZNF genes;
Investigate the phylogenetic relationships between the homologous C2H2-ZNF clusters;
Reconciliation of phylogenetic relationships + spatial relationships + Species Tree.
Figure 2: Distribution of all the singletons and clustered genes from the various
human C2H2-ZNF sub-families and gene composition of the C2H2-ZNF clusters
A: bar chart; x axis: Type of domains associated with C2H2-ZNF genes; legend: Singleton,
In clusters.
B: bar chart; x axis: No. of C2H2-ZNF genes; legend: C2H2-ZNF clusters with intervening
non-C2H2-ZNF, solely C2H2-ZNF.
Figure 2
Distribution of all the singletons and clustered genes from the various human C2H2-
ZNF sub-families and gene composition of the C2H2-ZNF clusters.
A) The number of genes belonging to the various C2H2-ZNF subfamilies is shown as
well as the proportion of genes found as singletons or as part of clusters. C2H2-ZNF genes
associated with KRAB and SCAN domains are more often found to be clustered. S-K =
C2H2-ZNF containing both a SCAN and a KRAB domain. NONE = C2H2-ZNF without
any conserved domain associated. The percentage distribution is indicated on top of each
bar for each sub-family.
B) The number of C2H2-ZNF clusters is shown with respect to the number of genes present
in each cluster. The proportion of clusters composed solely of C2H2-ZNF without any
intervening gene or with intervening genes other than C2H2-ZNF (Non-C2H2-ZNF) is also
represented. A star (*) identifies large clusters present on chromosome 19.
Figure 3: Differential expansion and loss of C2H2-ZNF clusters in five mammalian genomes
A: species tree giving the No. of clusters and the No. of C2H2-ZNF for Homo sapiens,
Pan troglodytes, Mus musculus, Rattus norvegicus and Canis familiaris, with Gallus gallus
and Xenopus laevis.
B: bar chart; x axis: Syntenically homologous C2H2-ZNF clusters; legend: Human,
Chimpanzee, Mouse, Rat, Dog; categories: In all species, Primate specific, Loss in rodents,
Absence in dog.
Figure 3
Differential expansion and loss of C2H2-ZNF clusters in five mammalian genomes.
A) Evolution of the C2H2-ZNF repertoires in primates, rodents and dog. The number of
C2H2-ZNF clusters and the total number of C2H2-ZNF found in these clusters are
indicated on the species tree. Since Xenopus laevis and Gallus gallus C2H2-ZNF are used
as an outgroup in phylogenetic studies, these species are also positioned on the tree.
The figure indicates the primate-specific increase in the number of C2H2-ZNF as compared
to rodents and dog.
B) A graphical representation of different scenarios seen in the evolution of human C2H2-
ZNF clusters and their syntenically homologous C2H2-ZNF clusters in chimpanzee, mouse,
rat and dog. The human clusters selected and named on the graph as well as their syntenic
counterparts were 1) present in all species, 2) primate-specific, 3) lost in rodents or 4)
absent in dog. For each human C2H2-ZNF cluster named on the graph, the first number
indicates the chromosome number and the second is the number attributed to that cluster on
the chromosome. Supplementary Figure S2 provides a more comprehensive graphical
representation including the 40 human clusters that contain at least 3 C2H2-ZNF and their
syntenic counterparts in the four other mammals.
Figure 4: Evolutionary scenarios in the phylogenetic tree
A: Species Tree (Species 1, Species 2, Species 3, Species 4, Outgroup).
B: Gene gain in species (S1 gene, S2 gene, S3 gene, S4 gene, Outgroup).
C: Gene loss in species.
Figure 4
Evolutionary scenarios in the phylogenetic tree
The different kinds of evolutionary scenarios seen in the phylogenetic tree are shown.
A) Species tree showing the evolutionary relationship between species 1, 2, 3 and 4.
B) A species-specific gain of genes appears as a clade including a single homolog from
one species and multiple homologs from the other. Phylogeny between genes S1 gene, S2
gene, S3 gene and S4 gene from species 1, 2, 3 and 4 respectively. Gene gain in species 4 is
observed. C) Species-specific gene loss appears as the absence of a corresponding ortholog
for one species on the tree and is deduced from the evolutionary relationships of the species
considered with the other species. Loss of the corresponding gene (S3 gene) in species 3.
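The two scenarios of Figure 4 can be illustrated with a minimal Python sketch. It is not the reconciliation procedure used in this study: the function name `classify_clade` and the leaf labels are hypothetical, and the sketch only counts homologs per species within one clade, assuming the clade has already been delimited on the gene tree.

```python
from collections import Counter

def classify_clade(leaves, species):
    """Label each species in one clade as 'gain', 'loss' or 'single'.

    leaves  -- list of (gene_name, species) pairs found in the clade
               (outgroup sequences excluded)
    species -- every species expected to appear in the tree
    """
    counts = Counter(sp for _, sp in leaves)
    labels = {}
    for sp in species:
        n = counts[sp]  # a Counter returns 0 for an absent species
        if n == 0:
            labels[sp] = "loss"    # no ortholog found for this species
        elif n > 1:
            labels[sp] = "gain"    # several homologs from one species
        else:
            labels[sp] = "single"  # exactly one homolog
    return labels

species = ["species 1", "species 2", "species 3", "species 4"]

# Scenario B (hypothetical labels): species 4 contributes two homologs.
clade_b = [("S1 gene", "species 1"), ("S2 gene", "species 2"),
           ("S3 gene", "species 3"),
           ("S4 gene a", "species 4"), ("S4 gene b", "species 4")]

# Scenario C (hypothetical labels): the S3 gene is missing.
clade_c = [("S1 gene", "species 1"), ("S2 gene", "species 2"),
           ("S4 gene", "species 4")]

print(classify_clade(clade_b, species)["species 4"])  # gain
print(classify_clade(clade_c, species)["species 3"])  # loss
```

A count alone cannot separate a lineage-specific duplication from an older duplication followed by losses elsewhere, so such labels are only candidates to be checked against the tree topology and the physical maps.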
Figure 5: Phylogenetic analysis of C2H2-ZNF genes in cluster 19.12 of human and its
syntenic counterparts in other mammals
Phylogenetic tree with node support values; labelled parts: Outgroup, Group I, Group II,
Group III, Dog-specific gain, Primate-specific gain. A table of gene counts per species
accompanies each group.
Figure 5
Phylogenetic analysis of C2H2-ZNF genes in cluster 19.12 of human and its syntenic
counterparts in other mammals.
A phylogenetic tree was built using the amino acid sequences corresponding to the zinc
finger regions of the various human C2H2-ZNF from cluster 19.12 and their syntenic
counterparts in chimpanzee, mouse, rat and dog. The tree was generated using a maximum
likelihood method (RAxML) and verified using a Bayesian method (MrBayes). 346 sites
from 101 sequences (including the 20 outgroup sequences from chicken and Xenopus) were
used in the analysis. The tree is divided into three major Groups (I-III). A tabulation of the
number of genes present in each group is indicated for each species (h: human, p:
chimpanzee, m: mouse, r: rat, c: dog). The bootstrap values are indicated for each node on
the tree. A small black circle is also represented at each node in cases where the posterior
probability value is equal to 1.00. This cluster contains only C2H2-ZNF genes that are
either from the KRAB subfamily or that do not encode any conserved N-terminal domain.
Next to the name of each C2H2-ZNF gene, the presence of an N-terminal KRAB domain is
indicated by a K and the number of zinc finger motifs is mentioned. Clear evidence of
differential expansion is seen in primates and dog. Loss of C2H2-ZNF in the rodent lineage
is also observed.
Figure 6: physical maps of the C2H2-ZNF genes of cluster 19.12 (gene labels include
ZNF331, ZNF175, ZNF577, ZNF649, ZNF613, ZNF615, ZNF616, ZNF320 and LOC-numbered
genes).
Figure 6
Physical maps showing the organization of the human C2H2-ZNF from cluster 19.12
localized on 19q13.4 and its syntenically homologous counterparts in other mammals.
For the large C2H2-ZNF cluster 19.12 and its syntenically homologous counterparts in
chimpanzee, mouse, rat and dog, each C2H2-ZNF gene is represented by an open arrow
which indicates its orientation on the chromosome strands; this excludes the pseudogenes,
whose names appear in parentheses. For these clusters, which contain only C2H2-ZNF that
are from the KRAB subfamily or that do not encode any conserved N-terminal domain, the
presence of a conserved N-terminal KRAB domain is indicated by a square positioned in
front of the open arrow representing the gene. Genes identified as orthologs, based on the
phylogenetic tree and physical maps, are underlined and are aligned vertically on their
respective chromosomes. Dotted lines separate the genes belonging to Group I, Group II
and Group III defined in the phylogenetic tree (Figure 5). The two species-specific groups
from dog and primates are seen in Group I and Group II, respectively.
Figure 7: Variation in the numbers of zinc finger motifs in mammals and in the
presence of conserved N-terminal domains in orthologs
A: bar chart; x axis: C2H2-ZNF subfamilies; legend: Human, Chimpanzee, Mouse, Rat, Dog.
B: physical maps for Human, Chimpanzee, Mouse, Rat and Dog.
C: Human C2H2-ZNF from clusters: SCAN-KRAB (11/14), SCAN (16/29).
Figure 7
Variation in the numbers of zinc finger motifs in mammals and in the presence of
conserved N-terminal domains in orthologs.
A) The average number of zinc finger motifs was calculated for all the C2H2-ZNF from the
81 human clusters identified and their corresponding syntenically homologous clusters in
the other mammals; for each species, the average number for the total C2H2-ZNF (All) and
for members of the various C2H2-ZNF sub-families (KRAB, SCAN, SCAN-KRAB, BTB,
HOMEO, SET and NONE, no conserved domain associated) is presented. For each
category, the number of genes in each species is listed above the bars in the following order
(human, chimpanzee, mouse, rat and dog).
B) For the human C2H2-ZNF cluster 6.2 (chromosome 6p22.1) and its syntenically
homologous counterparts in chimpanzee, mouse, rat and dog, each C2H2-ZNF gene is
represented by an open arrow which indicates its orientation on the chromosome strands;
this excludes the pseudogenes, whose names appear in parentheses. For these clusters, which
contain C2H2-ZNF that are from the KRAB or SCAN subfamily or that do not encode any
conserved N-terminal domain, the presence of a conserved N-terminal domain is indicated by a
square for a KRAB domain or an open circle for a SCAN domain, both being positioned in
front of the open arrow representing the gene. Genes identified as orthologs, based on the
phylogenetic tree and physical maps, are aligned vertically on their respective
chromosomes. Cases where domain shuffling was observed among orthologs from the
different mammals are marked by a grey box.
C) Exon-intron organization of most human C2H2-ZNF from the SCAN-KRAB and
SCAN subfamilies. About 80% of the human SCAN-KRAB C2H2-ZNF (11/14) and 55% of the
SCAN C2H2-ZNF (16/29) found in clusters have the exon-intron structures
shown. The exons encoding the SCAN, KRAB (A box) and ZNF are indicated.
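The per-species averages described for panel A of Figure 7 can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `mean_zf_by_group`, the record layout and the toy values are assumptions, not data or code from this study.

```python
from collections import defaultdict

def mean_zf_by_group(genes):
    """Average zinc finger count per (species, subfamily) and per (species, 'All').

    genes -- iterable of (species, subfamily, n_zinc_fingers) records
    Returns {(species, group): (mean, number_of_genes)}.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of motifs, gene count]
    for sp, subfamily, n_zf in genes:
        # Each gene counts once in its own subfamily and once in 'All'.
        for key in ((sp, subfamily), (sp, "All")):
            totals[key][0] += n_zf
            totals[key][1] += 1
    return {key: (s / n, n) for key, (s, n) in totals.items()}

# Toy records (hypothetical values, for illustration only).
toy_genes = [
    ("human", "KRAB", 12),
    ("human", "KRAB", 9),
    ("human", "SCAN", 10),
    ("mouse", "KRAB", 7),
]

averages = mean_zf_by_group(toy_genes)
print(averages[("human", "KRAB")])  # (10.5, 2)
print(averages[("mouse", "All")])   # (7.0, 1)
```

Returning the gene count next to each mean mirrors the figure, where the number of genes in each category is listed above the bars.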
Figure 8: Model for the evolution of the SCAN, SCAN-KRAB and KRAB C2H2-ZNF
subfamilies
A: exon structures of C2H2-ZNF, SCAN-C2H2-ZNF, SCAN-KRAB C2H2-ZNF and KRAB C2H2-ZNF;
events: Gain of SCAN, Gain of KRAB, Loss of SCAN.
B: Vertebrate-specific and Tetrapod-specific stages; singular gain events (Gain of SCAN,
Gain of KRAB); multiple loss events (Loss of SCAN, Loss of KRAB, Loss of SCAN-KRAB);
Dup.
Figure 8
Model for the evolution of the SCAN, SCAN-KRAB and KRAB C2H2-ZNF
subfamilies
A) Sequential events of exon shuffling leading to the birth of SCAN-C2H2-ZNF and
SCAN-KRAB C2H2-ZNF subfamilies. Most of the SCAN-C2H2-ZNF and the SCAN-
KRAB C2H2-ZNF have the exon-intron structure shown (boxes represent exons). Birth of
new families may have occurred by an exon shuffling mechanism leading presumably first
to the acquisition of a SCAN domain by a C2H2-ZNF and later of a KRAB domain by a
SCAN-C2H2-ZNF. Most SCAN-KRAB C2H2-ZNF have a single exon placed in between
the exon encoding the KRAB A box (identified as KRAB) and the exon encoding the zinc
finger domain (ZNF). This exon encodes in most instances the so-called KRAB B, b, or C
boxes.
B) Dynamic evolution of C2H2-ZNF after birth of the SCAN and SCAN-KRAB
subfamilies through gene duplication and recurrent loss of effector domains. A first SCAN
C2H2-ZNF appeared in an ancestor of vertebrates following the gain of a SCAN domain by
a C2H2-ZNF (in grey box); duplication then led to the establishment of the SCAN C2H2-
ZNF subfamily. The gain of a KRAB domain at the emergence of tetrapods by a SCAN
C2H2-ZNF gave rise to a SCAN-KRAB C2H2-ZNF (in grey box). This was followed by
duplication and establishment of the SCAN-KRAB subfamily. Loss of the SCAN domain by
deletion or degeneration from some SCAN-KRAB C2H2-ZNF genes, followed in many
instances by duplication, led to the expansion of the KRAB C2H2-ZNF. Duplication and
loss of SCAN or KRAB domains by deletion or degeneration from SCAN, SCAN-KRAB
and KRAB C2H2-ZNF subfamilies are seen as a recurrent theme shaping the repertoires
of the C2H2-ZNF subfamilies.
Supplementary Figure 1: Distribution of intergenic distances between 718 C2H2-
ZNF in the human genome.
Bar chart; y axis: Total no. of human C2H2-ZNF genes; x axis: ranges of intergenic distance.
Supplementary Figure 1
Distribution of intergenic distances between 718 C2H2-ZNF in the human genome.
The intergenic distance between consecutive C2H2-ZNF on each chromosome was
calculated for each C2H2-ZNF of the human genome. For the 718 C2H2-ZNF, the number
of C2H2-ZNF found within the range of intergenic distances indicated on the x axis is
plotted on the y axis. For example, there are 108 C2H2-ZNF within 10 to 20 Kb from a
consecutive C2H2-ZNF.
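The calculation described in this caption can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure used in this study: the function names, the (chromosome, start, stop) record layout, the bin edges and the toy coordinates are all hypothetical, and the sketch counts gaps between neighbouring genes rather than genes.

```python
from collections import defaultdict

def intergenic_distances(genes):
    """Distances (bp) between consecutive genes on each chromosome.

    genes -- iterable of (chromosome, start, stop) with start <= stop
    """
    by_chrom = defaultdict(list)
    for chrom, start, stop in genes:
        by_chrom[chrom].append((start, stop))
    distances = []
    for intervals in by_chrom.values():
        intervals.sort()  # order genes by start coordinate
        for (_, prev_stop), (next_start, _) in zip(intervals, intervals[1:]):
            # Overlapping neighbours are given a distance of 0.
            distances.append(max(0, next_start - prev_stop))
    return distances

def bin_counts(distances, edges):
    """Count distances falling in each half-open range [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for d in distances:
        for i in range(len(edges) - 1):
            if edges[i] <= d < edges[i + 1]:
                counts[i] += 1
                break
    return counts

# Toy coordinates (hypothetical, for illustration only).
toy_genes = [
    ("19", 100, 200),
    ("19", 10_300, 10_900),
    ("19", 25_900, 26_000),
    ("1", 50, 60),  # a lone gene has no neighbour and yields no distance
]

toy_distances = intergenic_distances(toy_genes)
toy_counts = bin_counts(toy_distances, [0, 10_000, 20_000, 40_000])
print(sorted(toy_distances))  # [10100, 15000]
print(toy_counts)             # [0, 2, 0]
```

Both toy gaps fall in the 10 to 20 Kb range, the same range quoted as an example in the caption.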
Supplementary Figure 2: Comparison of the number of C2H2-ZNF genes in the 40
human clusters containing at least 3 C2H2-ZNF and their syntenic counterparts in
four other mammals
Supplementary Figure 2A
Bar chart; legend: Human, Chimpanzee, Mouse, Rat, Dog; x axis: Syntenically homologous
C2H2-ZNF clusters with 3-5 C2H2-ZNF in Human (clusters 1.2, 2.2, 3.1, 3.2, 4.1, 7.2, 7.3,
7.6, 8.3, 9.1, 10.2, 10.3, 15.2, 15.3, 17.1, 18.1, 19.2, 19.10, 20.2, X.1).
Supplementary Figure 2B
Bar chart; legend: Human, Chimpanzee, Mouse, Rat, Dog; x axis: Syntenically homologous
C2H2-ZNF clusters with at least 6 C2H2-ZNF in Human (clusters 1.5, 3.3, 5.1, 6.2, 7.4, 7.5,
7.7, 8.4, 10.1, 12.1, 16.1, 16.3, 19.5, 19.6, 19.7, 19.8, 19.9, 19.11, 19.12, 19.13).
Supplementary Figure 2
Comparison of the number of C2H2-ZNF genes in the 40 human clusters containing
at least 3 C2H2-ZNF and their syntenic counterparts in four other mammals.
For each human C2H2-ZNF cluster named on the graph, the first number indicates the
chromosome number and the second is the number attributed to that cluster on the
chromosome. C2H2-ZNF clusters with three to five (A) and six or more (B) genes in
human and their syntenic counterparts in chimpanzee, mouse, rat and dog. This figure
provides evidence of differential species-specific C2H2-ZNF expansion and of gene loss in
rodents.
Supplementary Table S1
Comprehensive catalogue of the 718 C2H2-ZNF genes in the human genome.
Columns: Chr, Position, Cluster, Pseudo, Name, Description, Domain, ZF, O, L, Start, Stop.
604
4+
294
Page 112
1780
7112
8
1782
1960
7
1782
5552
2
1783
0083
0
1763
8341
2
1764
2021
3
1780
9030
9
1782
4402
1
1782
9281
6
1783
2604
0
1783
9465
5
1784
4030
1
5 | 5q35.3 | 5.1a | ZNF354A | Zinc finger protein 354A | KRAB | 13 | - | 605
5 | 5q35.3 | 5.1b | ZNF354B | Zinc finger protein 354B | KRAB | 13 | + | 612
5 | 5q35.3 | 5.1c | ZFP2 | Zinc finger protein 2 homolog | - | 13 | + | 461
5 | 5q35.3 | 5.1d | ZNF454 | Zinc finger protein 454 | KRAB | 12 | + | 522
5 | 5q35.3 | 5.1e | DKFZp686E2433 | similar to hypothetical protein 9630041N07 | KRAB | 13 | + | 685
5 | 5q35.2 | 5.1f | ZNF354C | Zinc finger protein 354C | KRAB | 11 | + | 554
6 | 6p22.1 | - | ZNF322A | Zinc finger protein 322A | - | 11 | - | 402
6 | 6p21.3 | - | Ψ ZNF204 | Zinc finger protein 204 | - | - | - | -
6 | 6p22.1 | 6.1a | LOC346157 | Similar to dJ153G14.3 (novel C2H2 type ZNF) | - | 9 | + | 407
6 | 6p21.3 | 6.1b | ZNF184 | Zinc finger protein 184 | KRAB | 19 | - | 751
6 | 6p21.3 | 6.2a | ZNF165 | Zinc finger protein 165 | SCAN | 6 | + | 485
6 | 6p22.1 | 6.2b | ZNF435 | Zinc finger protein 435 | SCAN | 4 | + | 348
6 | 6p21.3 | 6.2c | ZNF192 | Zinc finger protein 192 | SCAN-KRAB | 9 | + | 578
6 | 6p22.1 | 6.2d | Ψ LOC222701 | Similar to zinc finger protein 192 | - | - | + | 269
6 | 6p21.3 | 6.2e | ZNF193 | Zinc finger protein 193 | SCAN | 5 | + | 394
6 | 6p21.33-p21.31 | 6.2f | ZNF307 | Zinc finger protein 307 | SCAN-KRAB | 7 | - | 545
6 | 6p21.31 | 6.2g | ZNF187 | Zinc finger protein 187 | - | 8 | + | 325
6 | 6p21.31 | 6.2h | ZNF323 | Zinc finger protein 323 | SCAN | 6 | - | 406
6 | 6p22.1 | 6.2i | ZNF306 | Zinc finger protein 306 | SCAN | 7 | + | 538
6 | 6p22.2-p21.3 | 6.2j | ZNF305 | Zinc finger protein 305 | SCAN | 11 | - | 604
6 | 6p22.1 | 6.2k | Ψ ZNF452 | Zinc finger protein 452 | SCAN | - | - | 1325
6 | 6p22.1 | 6.2l | ZNF311 | Zinc finger protein 311 | KRAB | 14 | - | 574
6 | 6p22.1 | - | ZFP57 | Zinc finger protein 57 homolog | KRAB | 7 | - | 508
6 | 6p21.33 | - | ZBTB12 | Zinc finger and BTB domain containing 12 | BTB | 3 | - | 459
6 | 6p21.3 | 6.3a | ZNF297 | Zinc finger protein 297 | BTB | 2 | - | 634
6 | 6p21.32 | 6.3b | ZBTB9 | Zinc finger and BTB domain containing 9 | BTB | 1 | + | 473
6 | 6p21.3-p21.2 | - | ZNF76 | Zinc finger protein 76 | - | 7 | + | 570
6 | 6pter-p12.1 | - | ZNF318 | Zinc finger protein 318 | - | - | - | 2099
6 | 6p12.1 | - | ZNF451 | Zinc finger protein 451 | - | 5 | + | 1013
6 | 6q15 | - | ZNF292 | Zinc finger protein 292 | - | 11 | + | 2895
6 | 6q21 | - | Ψ LOC442240 | similar to zinc finger protein 259 | - | - | - | -
6 | 6q21 | - | ZBTB24 | Zinc finger and BTB domain containing 24 (ZNF450) | BTB | 8 | - | 697
6 | 6q25 | - | PLAG1 | Zinc finger protein PLAGL1 | - | 7 | - | 463
6 | 6q25.1 | - | ZBTB2 | Zinc finger and BTB domain containing 2 | BTB | 3 | - | 514
2674
4497
2676
7741
2743
3581
2744
7283
2747
4095
2747
7205
2752
6506
2754
8858
2815
4551
2816
5320
2820
0366
2820
5836
2621
7695
2823
3215
2623
7530
2824
5348
2630
1049
2830
9239
2832
0469
2832
7961
2834
2872
2835
3960
2840
0493
2842
9951
2842
5738
2844
2503
2845
4973
2847
5487
2864
7386
2866
3091
2907
0573
2908
1016
2974
8239
2975
6866
3197
5373
3197
7748
3339
0173
3339
3472
3353
0334
3353
3299
3533
5488
3537
1738
4341
1786
4344
5159
5701
9470
5714
3057
8792
1986
8803
0633
1092
1358
510
9214
949
1098
9041
210
9911
133
1443
0313
214
4371
236
1517
7736
515
1804
791
77p
22.1
L0C
441
192
Sim
ilar
10zi
ncfi
nger
prot
ein
162
-38
514
9341
1149
5746
7
Page 113
77p22.i
77
p2
2.i
77
p2
2.i
77
p2
2.l
D
7.i
a
7.i
b
7.2a
7.2b
77
p2
2.i
7.2c
77p
22.1
7.2U
77p13
-
77p13-p
il.i
-
77p11.2
-
77p11.2
7.3a
77p11.2
7.3b
77p11.2
7.3c
77
pi
I.2-
pi1.
17.
3d
77
p1
1.i
7.3e
77qii
.21
7.4a
77q
11.2
17.
4b
77q
11.2
17.
4c
77q11.2
7.4d
77
q1
1.2
1-q
ii.2
37.
4e
77q
11.2
17.
4f
77q11.2
7.4g
77qli
.21
7.4h
77qii
.2i
7.4i
77
q2
2.i
7.5a
77q
22.1
7.5b
77q22
7.5c
77
q2
2.i
7.5d
77
q2
2.i
7.5e
77q
21.3
-q22
.17
5f
77
q2
2.i
75g
77
q2
2.i
7.5h
77q
22.1
7.6a
77
q2
2.i
7.6b
77
q2
2.i
7.6c
77q3i.
i-
77q3i.
32
-
77
q3
6.i
7.7a
L0C
441
193
Sim
ilar
tozi
ncfi
nger
prot
ein
469
--
-30
3
ZN
F81
5Z
inc
fing
erpr
otei
n81
5K
RA
B3
+51
7
DK
FZ
p434
JIO
I5hy
poth
etic
alpr
otei
nD
KF
Zp4
34J1
OI5
-5
+56
3
DK
FZ
p547
K05
4hy
poth
etic
alpr
otei
nD
KF
Zp5
47K
054
KR
AB
15+
1393
L0C
4422
83S
imil
arto
zinc
fing
erpr
otei
n11
b-
-+
-
ZN
F32
5Z
inc
ling
erpr
otei
n32
5(Z
NF1
2)K
RA
B15
-69
7
GL
I3G
LI-
kwpp
elfa
mily
mem
ber
GL
I3-
5-
1580
ZN
FN1A
1Z
inc
fing
erpr
otei
n,su
bfam
ily
lA,
I-
4+
519
ZN
F71
3Z
inc
fing
erpr
otei
n71
3K
RA
B6
+43
0
L0C
4423
11
Sim
ilar
tozi
ncfi
nger
prot
ein
43-
12+
392
4’L
0C22
2032
Sim
ilar
tozi
ncfi
nger
prot
ein
208
--
--
ZN
F47
9Z
inc
ling
erpr
otei
n47
9K
RA
G10
-87
8
4’L
0C34
0223
Sim
ilar
tozi
ncfi
nger
prot
ein
479
--
--
ZN
F71
6Z
inc
ling
erpr
otei
n71
6K
RA
B12
+49
5
ZN
F67
9Z
inc
fing
erpt
otei
n67
9K
RA
B9
+41
1
L0C
7289
27S
imil
arto
zinc
fing
erpr
otei
n92
KR
AB
9+
438
ZN
F68O
Zin
cfi
nger
prot
ein
680
KR
AB
12-
530
ZF
D25
Zin
cli
nger
prot
ein
(ZFD
25)
(ZN
F588
)-
24+
783
ZN
F13
8Z
inc
ling
erpr
otei
n13
6-
6+
262
ZN
F27
3Z
inc
ling
erpr
otei
n27
3K
RA
B13
÷50
4
ZN
F11
7Z
inc
fing
etpr
otei
n11
7K
RA
B9
-38
3
H-p
lKK
rupp
el-r
elat
edzi
ncfi
nger
prot
ein
-13
-48
3
ZN
F92
Zin
cli
nger
prot
ein
92K
RA
B14
+58
6
ZN
F78
9Z
inc
ling
erpr
otei
n78
9K
RA
B8
+42
5
ZN
F39
4Z
inc
fing
erpr
otei
n39
4SC
AN
-KR
AB
7-
561
ZF
P95
Zin
cli
nger
prot
ein
95ho
mol
og(m
ouse
)SC
AN
-KR
AB
13+
639
ZN
F65
5Z
inc
fing
erpr
otei
n65
5-
6+
491
ZN
F49
8Z
inc
fing
erpr
otei
n49
8-
7+
544
ZK
SCA
N1
Zin
cli
nger
prot
ein
36K
RA
B6
+56
3
ZN
F38
Zin
cfi
nger
prot
ein
38K
RA
B8
+47
3
ZN
F3Z
inc
fing
erpr
otei
n3
KR
AB
8-
410
L0C
6436
41H
ypot
heti
cal
prot
ein
L0C
6436
41K
RA
B-K
RA
B-
+11
63
L0C
6497
46H
ypot
heti
cal
prot
ein
L0C
6497
46K
RA
B3
+34
5
L0C
5676
41H
ypot
heti
cal
prot
ein
L0C
5676
41K
RA
B4
+26
7
ZN
F27
7Z
inc
fing
erpr
otei
n(C
2H2
type
)27
7-
2+
436
FEZ
F1Z
inc
fing
erpr
otei
nFE
Z-
5-
471
ZN
F78
6Z
inc
fing
erpr
otei
n78
6K
RA
B15
-78
2
5426
140
5829
273
6428
035
6435
920
6700
537
6696
844
4197
0196
5041
1724
5575
4540
5630
8323
5666
1026
5719
1263
5732
0482
5750
1998
6334
6841
6342
9463
6361
7697
6376
3946
6389
2241
6400
1101
6407
5795
6408
9025
6447
6203
9890
8451
9892
8790
9894
0232
9899
3981
9905
2507
9945
1155
9948
5353
9951
7266
9950
0406
9980
0106
1000
0110
6
1116
3367
9
1217
2928
7
1483
9751
3
5428
044
5853
913
6437
162
6467
482
6713
023
6713
094
4224
1712
5043
8053
5578
2642
5630
9501
5670
8371
5721
1513
5734
3922
5753
3597
6336
4744
6344
6960
6366
0923
6380
8839
6393
1140
6402
8773
6408
8849
6409
0719
6450
3433
9892
3153
9893
5813
9896
9380
9901
2012
9906
7976
9947
3339
9950
0599
9949
9406
9950
0106
9980
1106
1000
0310
6
1117
7032
0
1217
3183
5
1484
1872
0
Page 114
8p23
.3
8p23.l
8p23.l
8p23.l
6p23.l
8p21
.1
8q11
.1
8q21
.11
8q1
3-q2
1.1
8q
22
.2
8q
22
.2
8q
23
8q24
.12
8q24
.12
8q24
.13
8q24
.13
8q24
.13
8q24
.22
8q24
.3
8q24
.3
8q24
.3
8q24
.3
8q24
.3
8q24
.3
8q
24
.3
8q
24
.3
6q
24
ZN
F59
6
L0C
2832
02
ZN
F7O
5Bq)
ZN
F7O
5C
L0C
441
341
ZN
F39
541
L0C
3922
15
ZFH
X4
ZB
TB
1O
L0C
4423
92
KLF
1O
ZF
PM
2
TR
PS
14)
L0C
3922
64
ZH
X2
ZHX
1
ZN
F57
2
ZN
F4O
6
ZF
P4I
GL
I4
ZN
F69
6
ZN
F62
3
ZN
F7O
7
ZN
F251
ZN
F34
ZN
F51
7
ZN
F7
Zin
cfi
nger
prot
ein
596
Sim
ilar
tozi
ncfi
nger
prot
ein
75
Zin
cfi
nger
prot
ein
705B
Zin
cfi
nger
prot
ein
705C
Sim
ilar
tozi
ncfi
nger
prot
ein
10
Zin
cfi
nger
prot
ein
395
Sim
ilar
tozi
ncli
nger
prot
ein
92
Zin
cfi
nger
hom
eodo
mai
n4
Zin
cfi
nger
and
BT
Bdo
mai
nco
ntai
ning
70
Sim
ilar
tozi
ncli
nger
prot
ein
317
Kru
ppet
-Iik
efa
ctor
10
zinc
fing
erpr
otei
n,m
ultit
ype
2
Zin
cfi
nger
tran
scri
ptio
nfa
ctor
TR
PS
1
Sim
ilar
tozi
ncfi
nger
prot
ein
532
Zin
cfi
nger
san
dho
meo
boxe
s2
Zin
cfi
nger
san
dho
meo
boxe
s1
Zin
cfi
nger
prot
ein
572
Zin
cfi
nger
prot
ein
406
Zin
cfi
nger
prot
ein
41ho
mol
og
GL
t-K
rupp
elfa
mily
mem
ber
GL
I4
Zin
cfi
nger
prot
ein
696
Zin
cli
nger
prot
ein
623
Zin
cfi
nger
prot
ein
707
Zin
cfi
nger
prot
ein
251
Zin
cfi
nger
prot
ein
34
Zin
cfi
nger
prot
ein
517
Zin
cfi
nger
prol
ein
7
19-
752
8+
642
5+
671
4+
495
4+
546
9-
831
4-
644
12-
595
12-
595
11+
504
--
178
-
+30
0
-+
-
-+
301
1-
513
-+
-
7+
3577
2+
847
3+
293
3-
480
2+
1151
1-
1281
1+
837
1-
873
12+
529
13-
1243
4+
198
7+
376
9+
374
13+
536
7+
369
7-
293
12-
549
10+
492
14+
686
1723
8218
7339
7188
906
7226
546
7821
316
7856
224
1223
7465
1226
3633
1221
4635
1226
1425
2825
9021
2829
9896
4781
5662
4781
6648
7777
8835
7794
0711
8156
1003
8159
5322
9457
6479
9472
8501
1037
3018
810
3737
128
1064
0032
310
6885
943
1164
8990
011
6750
402
1205
6169
612
0612
050
1238
6308
212
4055
936
1243
2987
712
4355
728
1260
5473
312
6060
809
1355
5921
313
5794
463
1444
0048
414
4416
250
1444
2098
214
4430
476
1444
4497
114
4451
539
1448
0297
314
4809
731
1448
3865
014
4849
515
1459
4814
014
5952
607
1459
6930
914
5981
498
1459
9506
514
6006
265
1460
2380
714
6039
409
77q
36.1
77q
36.1
77q
36.1
77q
36.1
77q
36.1
77q
36.1
77q
36.1
77q
36.1
77q
36.1
ZN
F42
5
ZN
F39
8
ZN
F28
2
ZN
F21
2
ZN
F78
3
ZN
F77
7
ZN
F74
6
ZN
F76
7
ZN
F46
7
Zin
cli
nger
prot
ein
425
Zin
cfi
nger
prot
ein
398
Zin
cfi
nger
prot
ein
282
Zin
cfi
nger
prot
ein
212
Zin
cfi
nger
prot
ein
783
Zin
cfi
nger
prot
ein
777
Zin
cfi
nger
prot
ein
746
Zin
cfi
nger
prot
ein
767
Zin
cfi
nger
prot
ein
467
1484
3080
9
1484
5444
1
1485
5319
9
1485
6770
7
1485
9019
5
1487
5939
4
1488
0081
8
1488
7517
8
1490
9238
5
1484
5431
1
1485
1105
2
1485
5426
7
1485
8363
0
1486
2532
5
1487
8330
6
1488
2572
7
1489
5275
7
1491
0122
8
7.7b
7.7c
7.7d
7.7e
7-7f
77g
7.7h
7.71
7.7j
8.l
a
8.l
b
8.2a
8.2b
8.3a
8.3b
8.3c
8.3d
8.3e
8.4a
8.4b
8.4c
8.4d
8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 B 8 8
KR
AB
KR
AB
KR
AB
KR
AB
KR
AB
KR
AB
KR
AB
KR
AB
KR
AB
KR
AB
HO
ME
O-4
BT
B
HO
ME
O-3
HO
ME
O-4
KR
AB
KR
AB
KR
AB
KR
AB
Page 115
88q24.3
88q24.3
88q24.3
99p24.2
99p
13.1
99p13.2
99
p1
2
99
p1
2
99q
22.3
1
99
q2
2.3
2
99
q2
2.3
2
99
q2
2.3
3
9q
22
.33
99
q2
2.3
3
99q
22-q
31
99q31.2
99q
31
99
q3
1.3
99
q3
2
99
q3
2
99
q3
2
99
q3
3.2
99q33.2
99q
33-q
34
99
q3
4
99
1010
p15
10lO
pll
.2
10lO
pli
1010
p11
10lO
pll
.21
10lO
pIl
.2
10lO
pIl
.2
10lO
qll
.21
10lO
qll
.21
8.4e
84f
84g
9.l
a
9.l
b
9.lc
9.l
d
9.2a
9.2b
9.33
9.3b
9.4a
9.4b
10
.la
bib
10.l
c
laid
101e
101f
10.2
a
10.2
b
ZN
F64
7Z
inc
fing
erpt
otei
n64
7(Z
NF2
5O)
KR
AB
13-
560
1460
7733
714
6097
632
ZN
F16
Zin
cfi
nger
prot
ein
16-
17-
682
1461
2654
814
6147
078
L0C
6429
14S
imil
arto
zincf
inger
pro
tein
135
KR
AB
10-
335
1461
7260
314
6196
311
GL
IS3
GL
ISfa
mily
zinc
fing
er3
-6
-77
538
1767
641
4218
3
4)L
0C65
3501
Sim
ilar
tozi
ncfi
nger
prot
ein
658
--
--
3943
3814
3935
4471
ZB
TB
5Z
inc
fing
eran
dB
TB
dom
ain
cont
aini
ng5
BT
B2
-67
737
4281
1137
4553
96
ZN
F65
8Z
inc
fing
erpr
otei
n65
8K
RA
B21
-10
5940
7614
1240
7820
63
ZN
F65
8BZ
inc
fing
erpr
otei
n65
8B-
21-
819
4157
8833
4158
2207
ZN
F48
4Z
inc
fing
erpr
otei
n48
4K
RA
B18
-81
694
6481
7294
6801
11
ZN
F16
9Z
inc
fing
erpr
otei
n16
9K
RA
B13
+60
396
0808
6196
1035
45
ZN
F36
7Z
inc
fing
erpr
otei
n36
7-
2-
350
9819
0057
9622
0490
ZN
F51O
Zin
cfi
nger
prot
ein
510
KR
AB
10-
683
9855
7968
9858
0149
ZN
F78
2Z
inc
fing
erpr
otei
n78
2K
RA
B14
-69
998
6190
9498
6562
10
ZN
F32
2BZ
inc
fing
erpr
otei
n32
28-
11-
402
9899
9358
9900
1731
ZN
F18
9Z
inc
fing
etpr
otei
n18
9K
RA
B16
+62
610
3200
984
1032
1276
3
ZN
F46
2Z
inc
fing
erpr
otei
n46
2-
9+
2506
1086
6519
910
8813
628
KL
F4K
wpp
el-l
ike
fact
or4
-3
-47
010
9286
956
1092
9157
6
ZN
F48
3Z
inc
fing
erpr
otei
n48
3SC
AN
-KR
AB
-+
256
1133
2726
011
3379
945
L0C
1698
34hy
poth
etic
alpt
otei
nL
0C16
9834
-13
-53
011
4799
221
1148
1429
3
ZF
P37
Zin
cfi
nger
prot
ein
37ho
mol
og(m
ouse
)K
RA
B12
-63
011
4843
995
1148
5881
7
ZN
F61
8Z
inc
fing
erpr
otei
n61
8-
4+
861
1156
7838
011
5852
293
ZN
F48
2Z
inc
fing
erpr
otei
n48
2B
TB
4-
424
1247
1015
012
4715
430
ZB
IB26
Zin
cfi
nger
and
BIB
dom
ain
cont
aini
ng26
BT
B4
-44
112
4720
199
1247
3360
0
ZN
F29
7BZ
inc
fing
erpr
otei
n29
7BB
TB
3÷
467
1286
0718
212
8637
318
ZN
F79
Zin
cfi
nger
prot
ein
79K
RA
B11
+49
812
8662
244
1266
8797
9
ZB
TB
34Z
inc
fing
eran
dB
TB
dom
ain
cont
aini
ng34
BT
B3
+53
212
9226
482
1292
4747
1
KL
F6K
wpp
el-
like
fact
or6
-3
-28
338
0818
838
1745
5
ZN
F24
8Z
inc
fing
erpr
otei
n24
8K
RA
B8
-57
938
1579
0538
1864
92
4)B
A77
5A3.
1K
RA
Bbo
xzi
ncfi
nger
pseu
doge
ne-
-+
-38
2125
5336
2130
31
4)B
A39
3]16
.4Z
inc
fing
erpse
udogen
e-
-+
-38
2235
2638
2256
69
ZN
F25
Zin
cfi
nger
prot
ein
25K
RA
B12
-45
638
2788
0138
3007
44
ZN
F33
AZ
inc
fing
erpr
otei
n33
aK
RA
B16
+81
038
3412
7638
3883
10
ZN
F37
AZ
inc
fing
erpr
otei
n37
aK
RA
B12
+56
138
4232
8138
4522
86
L0C
4016
42S
imil
arto
zinc
fing
ecpr
otei
n91
-16
-59
742
1515
3442
1727
13
ZN
F37B
Zin
cfi
nger
prot
ein
37b
KR
AB
8-
525
4236
7418
4236
8286
Page 116
10lO
qll
.2
10lO
qll
.21
10lO
qll
.22-q
ll.2
3
10lO
qll
.21
1010
q22-
q25
10lO
qil
10lO
qll
.22
1010
q21.
2
1010
q22.
2
1010
q22.
3
1010
q24.
1
1010
q26
1010
q26.
3
llp
l5.5
I1p
15.4
llp
l5A
11p1
5.4
llp
l4.3
11p1
3
llpll
.2
11q1
2
11q1
2.2
11q1
2.3
11q1
3.4
11q2
3.1
11q2
3.3
11q2
4.3
12l2
p12
1212
p13.
31
1212
q13
1212
q13.
11
1212
q13.
13
1212
q13
1212
q13.
2-q1
3.3
ZN
F1I
B
ZN
F48
7
ZN
F23
9
ZN
F48
5
ZN
F32
ZN
F22
ZN
F48
8
ZN
F36
5
ZN
F5O
3
4)L
0C39
9783
ZN
F5J
8
ZN
FN
IA5
ZN
F51
1
ZN
F19
5
ZN
F21
5
ZN
F2J
4
ZN
F14
3
L0C
341
002
4)Z
NF
ZN
F4O
8
ZFP
91
ZF
P91
-CN
TF
ZB
TB
3
L0C
4400
53
ZN
F75
C
ZB
TB
16
ZN
F2O
2
ZB
TB
15
ZN
F38
4
ZN
F7O
5A4)
ZN
F75B
ZN
F64
J
ZN
F38
5
ZN
FN1
A4
Zin
cfi
nger
prot
ein
11b
Zin
cfi
nger
prot
ein
487
Zin
cfi
nger
prot
ein
239
Zin
cfi
nger
prot
ein
485
Zin
cli
nger
prot
ein
32
Zin
cfi
nger
prot
ein
22
Zin
cfi
nger
prot
ein
488
Zin
cli
nger
prot
ein
365
Zin
cli
nger
prot
ein
503
Sim
ilar
tozi
ncfi
nger
prot
ein
532
Zin
cfi
nger
prot
ein
518
zinc
ling
erpr
otei
n,su
bfam
ily
lA,
5
zinc
fing
erpr
otei
n51
1
zinc
fing
erpr
otei
n19
5
zinc
fing
erpr
otei
n21
5
zinc
fing
erpr
otei
n21
4
zinc
fing
erpr
otei
n14
3(c
lone
pHZ
-1)
sim
ilar
todJ
568F
9.1
(zin
cli
nger
prot
ein
133
Kw
ppel
like
zinc
ling
erpr
otei
n
zinc
fing
erpr
otei
n40
8
zinc
fing
erpr
otei
n91
hom
olog
(mou
se)
zinc
fing
erpr
otei
n91
hom
olog
(mou
se),
CN
F
zinc
ling
eran
dB
IBdo
mai
nco
ntai
ning
3
sim
ilar
tozi
ncfi
nger
prot
ein
596
Zin
cfi
nger
prot
ein
75C
zinc
fing
eran
dB
TB
dom
ain
cont
aini
ng16
zinc
ling
erpr
otei
n20
2
BT
BfP
OZ
)do
mai
nco
ntai
ning
15
zinc
fing
erpr
otei
n38
4
Zin
cfi
nger
prot
ein
705A
zinc
fing
erpr
otei
n75
b
Zin
cfi
nger
prot
ein
641
zinc
fing
erpr
otei
n38
5
zinc
fing
erpr
otei
n,su
bfam
ily
lA,
4
Gli
oma-
asso
ciat
edon
coge
neho
mol
og
16-
778
3+
421
9-
458
11+
402
7-
273
5÷
224
2+
340
-+
462
1-
646
-+
-
4+
1483
4-
420
3+
252
10-
557
4+
517
11-
606
7÷
626
15+
653
10+
720
4+
570
-
+52
9
BTB
2-
574
KR
AB
--
178
390
BTB
9+
673
SCA
N-K
RA
B8
-64
8
BT
B2
+53
9
6-
516
5+
300
5-
438
3-
366
4+
544
4240
4561
4245
3998
4325
2288
4329
8636
4337
1801
4338
3913
4342
1881
4343
3358
4345
9313
4346
4332
4481
5928
4482
0780
4797
5095
4799
3872
6380
3957
6410
1777
7682
7915
7683
1431
7914
9361
7916
4538
9787
9494
9791
2480
1247
4195
512
4758
311
1349
7241
313
4976
656
DD
10.2
c
10.3
a
10.3
b
10
3c
10.3
d
11.l
a
11.l
b
11.2
a
11.2
b
KR
AB
KR
AB
KR
AB
KR
AB
SCA
N-K
RA
B
KR
AB
KR
AB
11 11 11 11 11 il 11 11 11 H1
11 11 11 11 11
3336
572
6904
230
6977
125
9439
089
2376
7378
3236
5900
4667
8944
5810
3225
5810
3225
6227
5011
7119
6287
7777
4116
1134
3565
9
1231
0020
7
1296
0178
9
6645
904
8216
417
4268
7843
4702
2179
5304
9187
5470
4791
3356
891
6935
854
6998
117
9506
188
2376
8482
3241
3653
4668
4037
5814
5091
5814
9778
6227
8190
7123
2113
7777
4385
1136
2660
8
1231
1757
3
1296
8971
7
6668
930
8223
909
4269
0385
4703
0844
5306
4748
5471
7887
GLI
KR
AB
KR
AB
5+
1106
1561
4020
156
1523
12
Page 117
Page 118
ZN
F77
4
ZN
F59
8
ZN
F2O
6
ZN
F2O
5
ZN
F21
3
ZN
F20
0
ZN
F26
3
ZN
F75
A
ZN
F43
4
ZN
F17
4
ZN
F59
7
GL
IS2
ZN
F50
0
ZN
F69
4
ZN
F55
3
ZN
F76
8
ZN
F74
7
ZN
F76
4
ZN
F68
8
ZN
F78
5
ZN
F68
9
ZN
F62
9
ZN
F66
8
ZN
F64
6
L0C
3424
26
ZN
F26
7
ZN
F42
3
ZN
F31
9
ZN
F23
ZN
F19
ZFP
1
ZN
F46
9
ZFP
M1
ZF
P27
6
ZF
P3
Zin
cfi
nger
prot
ein
774
zinc
fing
erpr
otei
n59
8
zinc
fing
etpr
otei
n20
6
zinc
fing
erpr
otei
n20
5
zinc
fing
erpr
otei
n21
3
zinc
fing
erpr
otei
n20
0
zinc
fing
erpr
otei
n26
3
Zin
cfi
nger
prot
ein
75a
Zin
cfi
nger
ptot
ein
434
zinc
fing
erpr
otei
n17
4
zinc
fing
erpr
otei
n59
7
GL
ISfa
mily
zinc
fing
et2
zinc
fing
erpr
otei
n50
0
Zin
cfi
nget
prot
ein
694
zinc
fing
erpr
otei
n55
3
Zin
cfi
nger
prot
ein
768
Zin
cfi
nger
prot
ein
747
Zin
cfi
nger
prot
ein
764
Zin
cfi
nger
prot
ein
688
Zin
cfi
nger
prot
ein
785
Zin
cfi
nger
prot
ein
HIT
-39
(ZN
F68
9)
zinc
fing
etpr
otei
n62
9
Zin
cli
nger
prot
ein
668
Zin
cfi
nger
prot
ein
646
sim
ilar
tozi
ncfi
nger
prot
ein
267
zinc
fing
erpr
otei
n26
7
zinc
fing
erpr
otei
n42
3
zinc
fing
erpr
otei
n31
9
Zin
cfi
nger
prot
ein
23(K
OX
16)
Zin
cfi
nger
prot
ein
19(K
OX
12)
zinc
fing
erpr
otei
n1
hom
olog
(mou
se)
zinc
fing
erpr
otei
n46
9
zinc
fing
erpr
otei
n,m
ultit
ype
1
ZN
F27
6ho
mol
og(m
ouse
)
zinc
fing
erpr
otei
n3
hom
olog
(mou
se)
-12
+48
3
--
-90
4
SCA
N14
-72
5
KR
AB
8f-
554
SCA
N-K
RA
B5
+45
9
-5
-39
5
SCA
N-K
RA
B9
+68
3
KR
AB
5+
296
-6
-48
5
SCA
N3
+40
7
-7
-42
4
-4+
524
SCA
N5
-48
0
SCA
N-K
RA
B6
-96
7
-12
+61
8
-10
-52
0
KR
AB
--
191
KR
AB
7-
408
KR
AB
2-
276
KR
AB
7-
405
KR
AB
11-
500
19-
1056
16-
619
29+
1832
13-
580
14+
743
23-
1284
15-
582
17-
643
10-
458
8+
352
3+
3446
2+
1004
4+
539
13+
502
4922
478
D
1515
q26.
115
.3c
D
8869
6546
8870
5719
1616
p13.
3
1616
p13.
3
1616pl3
.3
1616
p13.
3
1616
p13.
3
161
6p
l3.3
1616p13.l
l
1616
p13.
3
16iS
p13.3
1616
p13.
3
1616
p13.
3
1616
p13.
3
1616
p12.
1
1616
p11.
2
1616
p11.
2
16l6
pll
.2
1616pl1
.2
1616
p11.
2
1616
p11.
2
1616
p11.
2
16.l
a
16.1
b
16.1
c
16
.ld
16
1e
161f
161g
16.1
h
16h
16.2
a
16.2
b
16.3
a
16.3
b
16.3
c
16.3
d
16.3
e
16.3
f
163g
16.3
h
16.3
1
16.3
j
16.4
a
16.4
b
16.5
a
16.5
b
16.6
a
16.6
b
16 16 16 16 16 16 16 16 16 16 16 16 16
16p1l.
2
16p1
1.2
i6pll
.2
16p1
1.2
16p1
1.2
1 6q
12
1 6q
13
16q2
2
16q2
2
16q2
2.3
16q2
4
16q2
4.2
16q2
4.3
1987
769
3078
896
3102
607
3125
140
3212
343
3273
488
3295
485
3372
086
3391
245
3426
111
4322
226
4740
816
2515
4823
3031
4558
3044
2826
3045
0280
3047
2586
3048
8508
3049
9495
3052
2187
3069
7271
3097
9672
3099
3269
3152
0510
3163
2096
4808
2022
5658
6074
7003
9000
7006
5563
7373
9926
8702
1380
8704
7226
8831
4934
1999
764
3082
862
3110
519
3132
806
3225
410
3281
461
3308
575
3391
026
3399
365
3433
491
4327
803
4757
167
2517
6343
3031
8216
3044
5411
3045
3695
3047
7085
3049
1229
3050
4511
3052
9183
3070
6024
3099
3005
3100
2334
3152
2350
3168
0365
4841
8419
5659
126
3
7005
3618
7008
0742
7376
3486
8703
4666
8712
8890
6633
3811
4940
393
KR
AB
KR
AB
KR
AB
17l7
p13.2
17.i
a
Page 119
1717
p13-
p12
1717
p13
1717
p13.
1
17l7
pll
.2
1717
p11.
2
1717p1l.
2
1717
p11.
2
1717
q11.
2
1717
q12
1717
q21
1717
q21.
32
1717
q22
18pt
er-p
l1.
2
18p1
1.21
18q1
1.2
18q1
1.2
18q1
2.2
18q1
2
18q1
2
18q1
2
18q2
1.1
18q2
1.32
18q2
3
18q2
3
18q2
2-q2
3
18q2
3
ZN
F23
2
ZN
F59
4
ZB
TB
4
ZN
FI8
ZN
F28
6
ZN
F28
7
ZN
F62
4
ZN
F2O
7
ZN
F4O
3
ZN
FN1A
3
ZN
F65
2
ZN
F161
ZF
P16
I
ZN
F51
9
L0C
4418
16
ZN
F521
ZN
F39
7
ZN
F271
ZN
F24
ZN
F39
6
APM
-1
ZN
F53
2
ZN
F4O
7
ZN
F51
6
ZN
F23
6
SAL
L3
KL
F16
BT
BD
2
ZN
F55
4
ZN
F55
5
ZN
F55
6
ZN
F57
ZN
F77
KIA
A1
086
ZB
TB
7A
zinc
fing
erpr
otei
n23
2
zinc
ling
erpr
otei
n59
4
zinc
fing
eran
dB
TB
dom
ain
cont
aini
ng4
zinc
ling
erpr
otei
n18
(KO
X11
)
zinc
fing
erpr
otei
n28
6
zinc
ling
erpr
otei
n28
7
Zin
cfi
nger
prot
ein
624
zinc
ling
erpr
otei
n20
7
zinc
fing
erpr
otei
n40
3
zinc
ling
erpr
otei
n,su
bfam
ily
lA,
3
Zin
cli
nger
prot
ein
652
zinc
ling
erpr
otei
n16
1
zinc
ling
erpr
otei
n16
1ho
mol
og(m
ouse
)
zinc
fing
erpr
otei
n51
9
sim
ilar
tozi
ncli
nger
prot
ein
586
zinc
fing
erpr
otei
n52
1
zinc
fing
erpr
otei
n39
7
zinc
fing
erpr
otei
n27
1
zinc
ling
erpr
otei
n24
(KO
X17
)
zinc
ling
erpr
otei
n39
6
BT
B/P
OZ
-zin
cli
nger
prot
ein-
like
zinc
ling
erpr
otei
n53
2
Zin
cfi
nger
prot
ein
407
zinc
ling
erpr
otei
n51
6
zinc
ling
erpr
otei
n23
6
SaI-
like
3
Kru
ppel
-Iik
efa
ctor
16
BTB
dom
ain
cont
aini
ng2
Zin
cfi
nger
prot
ein
554
Zin
cfi
nger
prot
ein
555
Zin
cfi
nger
prot
ein
556
Zin
cli
nger
prot
ein
57
Zin
cfi
nger
prot
ein
77(p
11)
Sim
ilar
tozi
ncli
nger
prot
ein
Zin
cli
nger
and
BIB
dom
ain
cont
aini
ng7
SCA
N5
-44
4
-22
-11
20
BIB
6-
1013
SCA
N-K
RA
B5
-54
9
KR
AB
10+
521
SCA
N-K
RA
B14
-75
4
KR
AB
21-
865
2+
478
--
+69
7
-1
-50
9
-9
-60
6
-5
-52
1
BT
B5
-44
9
KR
AB
10-
540
--
-12
2
-24
-13
11
SCA
N-
+27
5
-5
+423
SCA
N4
-36
6
SCA
N2
-33
3
BIB
4-
766
-8
+13
01
-7
+10
01
-7
-28
52
-25
+1558’
-8
+13
00
D
18 18 18 18 18 18 18 18 18 18 18 18 16 18
17.l
b
17.l
c
f7.2
a
17.2
b
18.l
a
18
.lb
18.l
c
18
.ld
18.2
a
f8.2
b
19.l
aa
19
.lab
19.2
aa
19.2
ab
19.2
ac
19.2
ad
19.2
ae
19.3
aa
19.3
ab
4949
755
5023
554
7303
421
1182
1487
1554
4054
1639
5426
1646
4776
2741
4039
3197
4917
3517
4724
4472
7485
5340
3909
5279
379
1409
4724
2034
1044
2089
5889
3107
5017
3112
4298
3116
9957
3120
0659
4380
7731
5468
1041
7047
4282
7219
8625
7266
5104
7484
1263
1803
399
1936
447
2770
917
2792
482
2818
333
2851
964
2684
217
3755
010
3996
217
4967
121
5028
416
7328
241
1184
1414
1556
1963
1641
3189
1649
7883
2772
1583
3202
0391
3527
3967
4479
4883
4
5342
0614
5283
313
1412
2429
2034
1412
2118
6114
3109
2357
3114
2072
3117
8405
3121
1299
4382
1492
5480
4689
7076
2386
7229
6128
7281
1671
7485
9182
1814
496
1966
702
2786
469
2805
036
2829
501
2869
474
2895
930
3820
026
4017
816
1919
p13.
3
19l9
pl3
.3
1919
p13.
3
1919
p13.
3
1919
p13.
3
19l9
pl3
.3
19l9
p13.3
1919
p13.
3
19l9
p13.3
BT
B
KR
AB
KR
AB
KR
AB
KR
AB
KR
AB
BT
B
3-
252
--
525
7+
487
15+
626
9+
456
13+
555
12-
545
3-
939
4-
584
Page 120
1919pl3
.219
.4aa
191
9p
l3.3
-pl3
.219
.4ab
191
9p
l3.2
19.5
aa
1919
p13.
219
.5ab
1919
p13
19.5
ac
1919
p13.
219
.5ad
191
9p
l3.2
19.5
ae
1919
p13.
219
.5af
1919
p13.
219
.5ag
1919
p13.
219
.5ah
1919
p13.
219
.5ai
1919
p13.
219
.5aj
191
9p
l3.2
19.5
ak
1919
p13.
219
.5a1
1919
p13.
219
.5am
1919
p13.
219
.5an
1919
p13.
219
.6aa
1919
p13.
219
.Gab
1919
p13.
219
.6ac
1919
p13.
219
.6ad
191
9p
l3.2
19.6
ae
1919
p13.
219
.6af
1919
p13.
21
96
ag
19l9
pl3
.219
.6ah
1919
p13.
219
.6ai
19l9
p1
3.2
19.6
aj
19l9
pl3
.219
.6ak
1919
p13.
219
.6a1
19l9
p1
3.2
19.6
am
19l9
p1
3.2
19.6
an
1919
q13.
4319
.6ao
19l9
p13.3
-p13.2
19.6
ap
1919
p13.
219
.6aq
19l9
pl3
.2-p
13.1
219
.6ar
19l9
p1
3.2
19.6
as
19l9
p1
3.2
19.6
at
19l9
p1
3.2
19.6
au
ZN
F55
7Z
inc
fing
erpr
otei
n55
7K
RA
B10
+43
0
ZN
F35
8Z
inc
fing
erpr
otei
n35
8-
9+
481
ZN
F41
4Z
incf
ing
erp
rote
in4
l4-
1-
312
ZN
F55
8Z
inc
fing
erpr
otei
n55
8K
RA
B9
-40
2
ZN
F3I
7Z
inc
fing
etpr
otei
n31
7K
RA
B13
+59
5
ZN
F69
9Z
inc
fing
erpr
otei
n69
9K
RA
B16
-64
2
ZN
F55
9Z
inc
fing
erpr
otei
n55
9K
RA
B11
+53
8
ZN
F17
7Z
inc
fing
erpr
otei
n17
7K
RA
B7
+32
1
ZN
F26
6Z
inc
fing
erpr
otei
n26
6K
RA
B14
-54
9
ZN
F56O
Zin
cfi
nger
prot
ein
560
KR
AB
-KR
AB
14-
790
ZN
F42
6Z
inc
fing
erpr
otei
n42
6K
RA
B12
-55
4
ZN
FJ2
1Z
inc
fing
erpr
otei
n12
1-
10-
390
ZN
F56
IZ
inc
fing
erpr
otei
n56
1-
10-
417
ZN
F56
2Z
inc
fing
erpr
otei
n56
2-
9-
354
qi
L0C
7296
48S
imil
arto
zinc
fing
erpr
otei
n56
1-
--
-
L0C
1629
93H
ypot
heti
cal
prot
ein
L0C
1629
93K
RA
B12
-53
3
ZN
F65
3Z
inc
fing
etpr
otei
n65
3-
4-
615
ZN
F62
7Z
inc
fing
erpr
otei
n62
7K
RA
B11
+46
1
LO
C4O
189
8S
imil
arto
hypo
thet
ical
prot
ein
FL
J382
81-
6+
187
HS
ZF
P36
Zin
cfin
ger
pro
tein
ZF
P-3
6K
RA
B16
-61
0
ZN
F44
JZ
inc
fing
erpr
otei
n44
1-
19+
626
ZN
F49
JZ
incf
inger
pro
tein
49l
-13
+43
7
ZN
F44O
Zin
cfi
nger
prot
ein
440
KR
AB
12+
595
ZN
F43
9Z
inc
fing
erpr
otei
n43
9K
RA
B11
+49
9
ZN
F69
Zin
cfi
nger
prot
ein
69K
RA
B-
÷14
9
ZN
F70
0Z
inc
fing
erpr
otei
n70
0K
RA
B21
+74
2
ZN
F44O
LZ
inc
fing
erpr
otei
n44
0lik
eK
RA
B8
+39
7
ZN
F43
3Z
inc
fing
erpr
otei
n43
3K
RA
B19
-67
3
L0C
7297
47S
imil
arto
zinc
fing
erpr
otei
n70
9K
RA
B15
-57
8
FL
J149
59H
ypot
heti
cal
prot
ein
FL
]149
59K
RA
B8
+66
6
ZN
F78
8H
ypot
heti
cal
prot
ein
L0C
3885
07-
16+
615
ZN
F2O
Zin
cfi
nger
ptot
ein
20(K
OX
13)
KR
AB
13-
536
ZN
F62
5Z
inc
fing
erpr
otei
n62
5-
8-
306
ZN
F13
6Z
inc
fing
erpr
otei
n13
6K
RA
B14
+54
0
ZN
F44
Zin
cfi
nger
prot
ein
44(K
OX
7)K
RA
B16
-63
7
ZN
F56
3Z
inc
fing
erpr
otei
n56
3K
RA
B8
-47
6
ZN
F44
2Z
inc
fing
erpr
otei
n44
2K
RA
B14
-62
7
7020
721
7487
075
8485
032
8781
382
9112
073
9265
957
9295
928
9334
696
9384
272
9438
003
9499
683
9537
292
9580
131
9620
341
9661
814
9729
151
1145
5246
1156
9327
1161
1591
1169
3080
1173
8907
1177
0400
1180
1554
1183
7844
1185
9670
1189
6900
1193
6869
1198
6573
1201
5620
1203
6528
1206
4078
1210
3603
1211
6705
1213
4919
1221
9007
1228
9291
1232
1185
7034
589
7491
911
8482
224
8794
565
9135
084
9281
384
9315
521
9353
866
9407
234
9470
279
9510
303
9556
209
9592
899
9632
550
9667
794
9740
410
1147
7654
1159
0974
1162
4258
1171
0731
1175
4301
1178
0306
1180
6031
1184
1306
1188
6144
1192
2578
1195
2214
1199
0116
1202
8127
1204
9631
1208
6499
1211
2116
1212
8529
1216
1064
1226
6637
1230
5502
1233
7447
Page 121
Page 122
19 | 19p12 | 19.7az | ZNF492 | Zinc finger protein 492 | KRAB | 28 | + | 1132 | 22608966 | 22642312
19 | 19p12 | 19.7az | ZNF99 | Zinc finger protein 99 | KRAB | 30 | - | 1036 | 22730847 | 22744624
19 | 19p12 | 19.7az | LOC646864 | Similar to zinc finger protein 430 | KRAB | 7 | + | 651 | 22781774 | 22833075
19 | 19p12 | 19.7az | Ψ LOC388523 | Similar to Zinc finger protein 208 | - | - | - | - | 22949451 | 22977818
19 | 19p12 | 19.7az | Ψ ZNF724P | Similar to zinc finger protein 43 | - | - | - | - | 23196873 | 23256859
19 | 19p13.1-p12 | 19.7az | ZNF91 | Zinc finger protein 91 (HPF7, HTF10) | KRAB | 35 | - | 1191 | 23333876 | 23370089
19 | 19p12 | 19.7az | Ψ ZNF725 | Zinc finger protein 725 | - | - | - | - | 23466157 | 23490947
19 | 19p12 | 19.7az | ZNF675 | Zinc finger protein 675 (hZ) | KRAB | 14 | - | 568 | 23627812 | 23661782
19 | 19p12 | 19.7az | ZNF681 | Zinc finger protein 681 | - | 16 | - | 576 | 23718000 | 23733479
19 | 19p12 | 19.7az | LOC646895 | Similar to zinc finger protein 539 | - | 6 | + | 193 | 23806558 | 23807139
19 | 19p12 | 19.7az | LOC730084 | Similar to zinc finger protein 539 | - | 6 | + | 171 | 23807338 | 23807853
19 | 19p12 | 19.7az | Ψ LOC730087 | Similar to zinc finger protein 91 | - | - | + | - | 23889628 | 23910116
19 | 19p12 | 19.7az | ZNF254 | Zinc finger protein 254 | KRAB | 4 | + | 353 | 24061816 | 24103022
19 | 19q12 | - | ZNF536 | Zinc finger protein 536 | - | 8 | + | 1300 | 35555166 | 35740805
19 | 19q12 | - | ZNF537 | Zinc finger protein 537 | HOMEO | 2 | - | 1081 | 36457693 | 36462015
19 | 19q13.11 | - | ZNF507 | Zinc finger protein 507 | - | 5 | + | 953 | 37528393 | 37570413
19 | 19q13.11 | 19.8aa | LOC441847 | Similar to zinc finger protein 239 | - | 9 | - | 345 | 39817307 | 39818566
19 | 19q13.11 | 19.8ab | ZNF302 | Zinc finger protein 302 | KRAB | 7 | + | 478 | 39860439 | 39869137
19 | 19q13.11 | 19.8ac | ZNF181 | Zinc finger protein 181 (HHZ181) | KRAB | 11 | + | 507 | 39916933 | 39925613
19 | 19q13.11 | 19.8ad | ZNF599 | Zinc finger protein 599 | KRAB | 14 | - | 588 | 39940819 | 39955960
19 | 19q13.11 | 19.8ae | LOC643825 | Similar to zinc finger protein 396 | SCAN | - | + | 417 | 39985078 | 40008745
19 | 19q13.11 | 19.8af | LOC441848 | Similar to zinc finger protein 113 | - | 1 | - | 587 | 40049101 | 40049676
19 | 19q13.11 | 19.8ag | ZNF30 | Zinc finger protein 30 (KOX28) | - | 18 | + | 542 | 40109724 | 40127912
19 | 19q13.11 | 19.8ah | ZNF92 | Zinc finger protein 92 | - | 13 | - | 553 | 40139098 | 40143044
19 | 19q13.1 | 19.9aa | TZFP | Testis zinc finger protein | BTB | 2 | + | 487 | 40895670 | 40899780
19 | 19q13.12 | 19.9ab | ZNF565 | Zinc finger protein 565 | KRAB | 12 | - | 499 | 41364889 | 41384792
19 | 19q13.1 | 19.9ac | ZNF146 | Zinc finger protein 146 | - | 10 | + | 292 | 41411488 | 41421506
19 | 19q13.12 | 19.9ad | ZFP14 | Zinc finger protein 14-like | KRAB | 13 | - | 533 | 41550715 | 41519002
19 | 19q13.12 | 19.9ae | ZNF545 | Zinc finger protein 545 | KRAB | 13 | - | 532 | 41574701 | 41601390
19 | 19q13.12 | 19.9af | ZNF566 | Zinc finger protein 566 | KRAB | 7 | - | 418 | 41630415 | 41659358
19 | 19q13.12 | 19.9ag | ZFP260 | Zinc finger protein 260 | - | 13 | - | 412 | 41693770 | 41711012
19 | 19q13.13 | 19.9ah | ZNF529 | Zinc finger protein 529 | - | 11 | - | 458 | 41727130 | 41756030
19 | 19q13.12 | 19.9ai | ZNF382 | Zinc finger protein 382 | KRAB | 10 | + | 550 | 41788061 | 41811339
19 | 19q13.12 | 19.9aj | GIOT-1 | Gonadotropin inducible TRF | KRAB | 12 | - | 563 | 41820123 | 41849579
19 | 19q13.12 | 19.9ak | ZNF567 | Zinc finger protein 567 | KRAB | 15 | + | 616 | 41872142 | 41904066
19 | 19q13.12 | 19.9al | LOC342892 | Hypothetical protein LOC342892 | KRAB | 32 | - | 1090 | 41930509 | 41955571
19 | 19q13.12 | 19.9am | MGC62100 | Hypothetical protein LOC388536 | KRAB | 13 | - | 636 | 41996710 | 42021121
Page 123
Page 124
1919
q13.
3119.l
lap
ZN
F23
4Z
inc
fing
erpr
otei
n23
4K
RA
B19
+70
049
3376
184
93
54
12
7
1919q13.2
19.l
laq
ZN
F22
6Z
inc
fing
erpr
otei
n22
6K
RA
B19
+80
349
3610
894
93
73
67
8
1919q13.3
219.l
lar
ZN
F22
7Z
inc
fing
erpr
otei
n22
7K
RA
B19
+79
949
4085
314
94
33
26
0
1919
q13.
311
9.l
las
ZN
F23
3Z
inc
fing
erpr
otei
n23
3K
RA
B8
+67
049
4559
164
94
71
30
8
191
9q
13
.219.l
lat
ZN
F23
5Z
inc
fing
erpr
otei
n23
5K
RA
B16
-73
849
4624
404
95
01
01
0
1919
q13.
219
.11
auZ
NF
228
Zin
cfi
nger
prot
ein
228
KR
AB
17-
907
4952
2546
49
55
26
66
1919q13.3
219.l
lav
ZN
F28
5Z
inc
fing
etpr
otei
n28
5K
RA
B11
-59
049
5816
484
95
97
60
5
1919
q13.
3119
.1la
wP
L0C
1477
11
Sim
ilar
tozi
ncfi
nger
prot
ein
285
--
+-
4965
4547
4966
9406
191
9q
13
.219
.1la
xZ
NF1
8OZ
inc
fing
erpr
otei
n18
0(H
HZ
J68)
KR
AB
12-
692
4967
1701
49
69
63
95
191
9q
13
.2-
ZN
F34
2Z
inc
fing
erpr
otei
n34
2-
6-
475
5026
6599
5027
1528
1919q13.3
2-
ZN
F54
1Z
inc
fing
erpr
otei
n54
1-
2-
792
5271
5759
5273
9966
191
9q
13
.32
-Z
NF
114
Zin
cfin
ger
pro
tein
114
KR
AS
4+
417
5346
6466
5348
2675
1919q13.3
3-
ZN
F47
3Z
inc
fing
erpr
otei
n47
3K
RA
B20
+87
155
2210
2455
2438
45
191
9q
13
.419.l
2aa
ZN
F17
5Z
inc
fing
erpr
otei
n17
5K
RA
B15
+71
156
7663
4356
7848
03
1919
q13.
411
9.l
2ab
ZN
F57
7Z
inc
fing
erpr
otei
n57
7K
RA
B8
-47
857
0663
6557
0630
09
1919
q13.
4119.l
2ac
ZN
F64
9Z
inc
fing
erpr
otei
n64
9K
RA
B10
-50
557
1000
5957
0843
01
1919
q13.
411
9.l
2ad
4L
0C44
1861
Sim
ilar
tozi
ncf
ing
erp
rote
in64
--
--
5710
9726
5711
2852
1919
q13.
4119.l
2ae
ZN
F61
3Z
inc
fing
erpr
otei
n61
3K
RA
B12
+58
157
1225
0057
1408
17
1919
q13.
411
9.l
2af
ZN
F35O
Zin
cfin
ger
pro
tein
350
KR
AB
8-
532
5715
9406
5718
1880
1919
q13.
411
9.l
2ag
ZN
F61
5Z
incf
inger
pro
tein
615
KR
AB
19-
731
5718
6400
5720
3270
1919
q13.
411
9.l
2ah
ZN
F6J
4Z
inc
fing
erpr
otei
n61
4K
RA
B11
-58
557
2083
9157
2234
29
1919
q13.
411
9.l
2ai
ZN
F43
2Z
inc
fing
erpr
otei
n43
2K
RA
B17
-65
257
2284
9057
2438
85
1919
q13.
411
9.l
2aj
L0C
2843
71H
ypot
heti
cal
prot
ein
L0C
2843
71-
--
924
5725
9531
5729
0830
1919
q13.
411
9.l
2ak
ZN
F61
6Z
inc
fing
erpr
otei
n61
6K
RA
B21
-78
157
3089
6757
3350
03
1919
q13.
4119
.12a
1F
LJ1
6287
Sim
ilar
tozi
ncf
ing
erp
rote
in61
6K
RA
B25
-93
657
3499
3757
3665
56
1919
q13.
4119
.12a
mZ
NF
766
Zin
cfi
nger
prot
ein
766
KR
AB
10+
468
5746
4636
5748
7766
1919
q13.
411
9.l
2an
ZN
F48O
Zin
cfi
nger
prot
ein
480
KR
AB
12+
516
5749
2263
5752
0987
1919
q13.
411
9.l
2ao
ZN
F61O
Zin
cfi
nger
prot
ein
610
KR
AB
9+
462
5754
0494
5756
1923
1919q13
19
.l2
apZ
NF
528
Zin
cfi
nger
prot
ein
528
KR
AB
15+
628
5759
2933
5761
3469
1919
q13.
411
9.l
2aq
ZN
F53
4Z
inc
fing
erpr
otei
n53
4K
RA
B17
+67
457
6242
5257
6345
11
1919
q13.
4119.l
2ar
ZN
F57
8Z
inc
fing
erpr
otei
n57
8-
12+
365
5770
6025
5770
8794
1919
q13.
4119.l
2as
ZN
F8O
8S
imil
arto
zinc
fing
erpr
otei
n60
0-
2+
903
5773
8331
5775
0693
1919
q13.
4119.l
2at
ZN
F7O
1Z
inc
fing
erpr
otei
n70
1K
RA
B9
+46
557
7653
4057
7798
61
191
9q
13
.41
9.l
2au
ZN
F13
7Z
inc
fing
erpr
otei
n13
7-
5+
207
5779
1719
5779
5214
191
9q
13
.31
9.l
2av
ZN
F83
Zin
cfi
nger
prot
ein
83(H
PF
1)-
15-
516
5780
7443
5783
3450
1919
q13.
411
9.l
2aw
L0C
7298
40S
imil
arto
zin
cfin
ger
pro
tein
160
KR
AB
14-
626
5784
7611
5788
5574
19q1
3.41
19
.l2
axZ
NF
611
Zin
cfin
ger
pro
tein
611
KR
AB
17-
705
5789
9284
5792
4947
Page 125
Page 126
191
9q
13
.43
19.l
3as
191
9q
13
.43
19
.l3
at
191
9q
13
.43
19
.l3
au
191
9q
13
.41
9.l
3av
191
9q
13
.43
19.l
3aw
191
9q
13
.41
9.l
3ax
1919
q13.
41
9.l
3ay
1919
q13.
41
9.l
3az
191
9q
13
.41
9.l
3b
a
191
9q
13
.43
19.l
3bb
1919
q13.
41
9.l
3b
c
191
9q
13
.43
19.l
3bd
1919
q13.
41
9.l
3b
e
191
9q
13
43
19.l
3bf
191
9q
13
.43
19.l
3bg
1919
q13.
419.l
3bh
1919
q13.
419.l
3bi
191
9q
13
.43
19.l
3bj
191
9q
13
.43
19.l
3bk
191
9q
13
.43
19.1
3b1
191
9q
13
.43
19.l
3bm
191
9q
13
.43
19.l
3bn
1919
q13.
419.l
3bo
191
9q
13
.43
19.l
3bp
191
9q
13
.43
19.l
3bq
1919
q13.
41
9.l
3b
r
1919
q13.
41
9.l
3b
s
191
9q
13
.43
19.l
3bt
1919
q13.
4319.l
3bu
1919
q13.
419.l
3bv
1919
q13.
4319.l
3bw
191
9q
13
43
19
l3b
x
1919
q13.
4319.l
3by
1919
q13.
431
9.l
3b
z
191
9q
13
43
19l3
ca
1919
q13.
4319l3
cb
3f
19
q1
3.4
31913cc
ZN
F471
zinc
fing
erpr
otei
n47
1K
RA
B15
+62
6
ZF
P28
Zin
cfi
nger
prot
ein
28m
ouse
hom
olog
KR
AB
-KR
AB
15÷
868
ZN
F47O
Zin
cfi
nger
prot
ein
470
KR
AB
17+
717
ZN
F71
zinc
fing
erpr
otei
n71
(Cos
26)
-13
+48
9
BC
3729
5_3
Hyp
othe
tica
lpr
otei
nB
C37
295_
3-
14-
559
ZIM
2Z
inc
fing
er,
impr
inte
d2
KR
AB
5-
527
PE
G3
Pat
erna
lly
exp
ress
ed3
SCA
N12
-15
88
ZIM
3Z
inc
fing
er,
impr
inte
d3
KR
A8
11-
472
ZN
F26
4zi
ncfi
nger
prot
ein
264
KR
AB
13+
627
ZN
F8O
5Z
inc
fing
erpr
otei
n80
5K
RA
B11
+37
7
ZN
F27
2Z
inc
fing
erpr
otei
n27
2K
RA
B11
+56
2
ZN
F54
3Z
inc
fing
erpr
otei
n54
3K
RA
B13
+60
0
ZN
F3O
4Z
incf
ing
erp
rote
in30
4K
RA
B15
+65
9
ZN
F57
4Z
inc
fing
erpr
otei
n54
7K
RA
B9
÷40
2
ZN
F54
8Z
inc
fing
erpr
otei
n54
8K
RA
B11
÷53
3
ZN
F17
5Z
inc
fing
erpr
otei
n17
(KO
X10
)K
RA
B17
+66
2
ZN
F74
9Z
inc
fing
erpr
otei
n74
9-
17+
691
ZN
F77
2Z
inc
fing
erpr
otei
n77
2K
RA
B10
-48
9
ZN
F41
9Z
inc
fing
erpr
otei
n41
9K
RA
B11
+51
0
ZN
F77
3Z
inc
fing
erpr
otei
n77
3K
RA
B9
+44
2
ZN
F54
9Z
inc
fing
erpr
otei
n54
9K
RA
B15
+62
7
ZN
F55O
Zin
cfi
nger
prot
ein
550
KR
AB
8-
381
ZN
F41
6Z
inc
fing
erpr
otei
n41
6K
RA
B12
-59
4
ZIK
1Z
inc
fing
erpr
otei
nZI
K1
KR
AB
9+
487
ZN
F53O
Zin
cfi
nger
prot
ein
530
KR
AB
13+
599
ZN
F13
4Z
inc
fing
erpr
otei
n13
4-
10+
427
ZN
F21
1Z
inc
fing
erpr
otei
n21
1K
RA
B12
+56
4
ZS
CA
N4
Zin
cfi
nger
and
SCA
Ndo
mai
n4
SCA
N4
+43
3
ZN
F551
Zin
cfi
nger
prot
ein
551
KR
AB
16+
654
ZN
F15
4Z
inc
fing
erpr
otei
n15
4K
RA
B10
-44
9
ZN
F671
Zin
cfi
nger
prot
ein
671
KR
AB
10-
534
ZN
F77
6Z
inc
fing
erpr
otei
n77
6-
10+
476
ZN
F58
6Z
inc
fing
erpr
otei
n58
6K
RA
B10
+40
2
ZN
F55
2Z
inc
fing
erpr
otei
n55
2K
RA
B8
-40
7
ZN
F58
7Z
inc
fing
erpr
otei
n58
7K
RA
B13
+57
5
ZN
F81
4Z
inc
fing
erpr
otei
n81
4-
--
-
ZN
F41
7Z
inc
fing
erpr
otei
n41
7K
RA
B13
-57
5
6171
1024
6174
2129
6176
8362
6179
8504
6186
6765
6197
7742
6201
5615
6233
7276
6239
4681
6245
6615
6248
3745
6252
3689
6255
4487
6256
6691
6259
3030
6261
4359
6264
6590
6267
2766
6269
0945
6270
3121
6273
0505
6275
9537
6278
2055
6278
7440
6280
3065
6281
7440
6283
6396
6287
2115
6288
5217
6290
4779
6292
2931
6295
4023
6297
2850
6301
0264
6305
3081
6307
5442
6311
0026
6173
2082
61
75
99
82
6178
1931
6182
7362
6187
6058
6204
3887
6204
3876
6234
8382
6242
2351
6246
5479
6249
6618
6253
3956
6256
3078
6258
2739
6260
4598
6262
4983
6264
8665
6268
0750
6269
7860
6271
1338
6274
3943
6275
0155
6277
4746
6279
5570
6281
1444
6282
6536
6284
5946
6288
2317
6289
2991
6291
2391
6293
0795
6296
1337
6298
3757
6301
8093
6307
1765
6309
2226
6311
9756
Page 127
191
9q
13
.43
191
9q
13
.43
191
9q
13
.4
191
9q
13
.43
191
9q
13
.4
191
9q
13
.43
191
9q
13
.43
19l9
qte
r
191
9q
13
.43
191
9q
13
.43
191
9q
13
.43
191
9q
13
.43
191
9q
13
.43
191
9q
13
.43
191
9q
13
.4
191
9q
13
.43
191
9q
13
.43
191
9q
13
.43
191
9q
13
.43
191
9q
13
.2-q
13
.4
191
9q
13
.2
2020p13
2020p13
202
op
ter-
ql
1.23
2020p11.2
3-p
ll.2
2
202O
p12.3
-pll
.21
202O
p11.
21
202
0q
11
.21
2020q11.2
2
2020q11.1
-qll
.23
2020q11.2
1-q
13.1
2
2020q13.1
2
2020q13.1
2
2020q13.1
-q13.2
2020q13.1
3-q
13.2
ZJ
20q13.2
19
.l3
cd
19.l
3ce
19
.l3
cf
19
.l3
cg
19
.l3
ch
191.l
3ci
19.l
3cj
19.l
3ck
19.1
3d
19.1
3cm
19
.l3
cn
19
.l3
co
19
.l3
cp
19
.l3
cq
19.1
3cr
19.l
3cs
19
.l3
ct
19
.l3
cu
19
13
cv
19.1
3cw
19
.l3
cx
20
.la
20.l
b
20.2
a
20.2
b
20.2
c
20.3
a
20.3
b
ZN
F41
8Z
inc
fing
erpr
otei
n41
8K
RA
B16
-67
6
ZN
F25
6Z
inc
fing
erpr
otei
n25
6K
RA
B15
-62
7
ZN
F6O
6Z
inc
fing
erpr
otei
n60
6K
RA
B16
-79
2
ZSC
AN
1Z
inc
fing
eran
dSC
AN
dom
ain
cont
aini
ng1
SCA
N3
+40
8
ZN
F13
5Z
incf
inger
pro
tein
135
KR
AB
16+
658
ZN
F44
7Z
inc
fing
erpr
otei
n44
7SC
AN
2-
510
ZN
F32
9Z
inc
fing
erpr
otei
n32
9-
12-
541
ZN
F27
4Z
inc
fing
erpr
otei
n27
4K
SK5
+65
3
ZN
F54
4Z
inc
fing
erpr
otei
n54
4K
RA
B13
+71
5
ZN
F8Z
inc
fing
erpr
otei
n8
KR
AB
7+
575
HK
R2
GL
I-K
rupp
elfa
mil
ym
ember
HK
R2
SCA
N8
÷49
1
ZN
F49
7Z
inc
fing
erpr
otei
n49
7-
14-
498
L0C
1164
12H
ypot
heti
cal
prot
ein
BC
0123
65-
8-
531
ZN
F58
4Z
inc
fing
erpr
otei
n58
4K
RA
B8
+42
1
ZN
FJ3
2Z
inc
fing
erpr
otei
n13
2K
RA
B18
-70
6
ZN
F32
4BZ
inc
fing
erpr
otei
n3248
KR
AB
9+
544
ZN
F32
4Z
inc
fing
erpr
otei
n32
4K
RA
B9
+55
3
ZN
F44
6Z
inc
fing
erpr
otei
n44
6SC
AN
3+
450
ZN
F49
9Z
inc
fing
erpr
otei
n49
9B
TB4
-51
1
ZN
F42
Zin
cfi
nger
prot
ein
42SC
AN
13-
734
ZN
F93
Zin
cfin
ger
pro
tein
93K
RA
B17
SC
RT
2S
ratc
hho
mol
og2,
zinc
fing
erpr
otei
n-
5-
307
ZN
F34
3Z
inc
fing
erpr
otei
n34
3K
RA
B12
-59
9
ZN
F33
9Z
inc
fing
erpr
otei
n33
9-
4-
275
ZN
F13
3Z
inc
fing
erpr
otei
n13
3K
RA
B15
+65
3
ZN
F33
6Z
incf
ing
erp
rote
in33
6B
TB10
+71
1
ZN
F33
7Z
inc
fing
erpr
otei
n33
7K
RA
B20
-75
1
PLA
GL
2Z
inc
fing
erpr
otei
nPL
AG
L2
-6
-49
6
ZN
F341
Zin
cfi
nger
prot
ein
341
-12
+84
7
SCA
ND
1SC
AN
dom
ain
cont
aini
ngpr
otei
n1
SCA
N-
-17
9
ZN
F33
5Z
inc
fing
erpr
otei
n33
5-
13-
1342
ZN
F66
3Z
inc
fing
erpr
otei
n66
3-
1-
106
ZN
F33
4Z
inc
fing
erpr
otei
n33
4K
RA
B14
-68
0
SNA
I1Z
inc
fing
erpr
otei
nsn
ail
hom
olog
-4
+26
4
SAL
L4
SaI-
like
4-
7-
1053
ZF
P64
Zin
cfi
nger
prot
ein
64ho
mol
og-
13-
645
6312
5064
6314
4013
6318
0252
6323
7246
6326
2424
6328
7018
6332
9507
6338
6208
6343
2646
6348
2130
6353
0197
6355
7537
6357
0549
6361
1875
6363
5994
6365
4783
6367
0275
6367
9607
6371
6709
6376
5096
5902
40
2410
463
1795
2796
1821
7157
2329
3021
2560
2851
3024
3968
3178
3469
3400
4960
4401
0699
4447
6274
4456
3114
4803
2934
4983
3988
5013
3957
6313
8552
6315
0889
6320
6526
6325
7811
6327
2588
6330
1389
6335
3960
6341
6739
6346
7285
6349
9066
6354
5510
6356
5932
6358
4224
6362
150
6
6364
3401
6366
1011
6367
6577
6368
4409
6372
2733
6377
6754
6048
23
2437
778
1798
6521
1824
5640
2330
1683
2562
5469
3025
9192
3184
3736
3400
5842
44
03
42
40
44
52
13
22
4457
5601
48
03
88
30
4985
2421
5024
1931
Page 128
202
0q
13
.2
2020
q13.
31
202
0q
13
.33
2121
q21.
1
212
1q
22
.3
212
1q
22
.3
2222
p
2222
q11.
1
2222
q11.
1
2222
q11.
21
2222
q11.
21
2222ql1
.2
2222q11.2
2
2222q11.2
3
222
2q
12
.2
222
2q
11
.2
XX
p21.
3
XX
pll
.3
XX
pll
.2
XX
pll
.23
XX
pll
.23
XX
p22.1
1-p
ll.2
3
XX
pll
.1-1
1.3
XX
pll
.23
XX
pll
.21
XX
pll
.21
XX
pll
.1
XX
q13.
2
XX
q21.
1-q2
1.2
XX
q23
XX
q26.
3
XX
q26.
3
XX
q26.
2
Xq2
8
ZN
F21
7
CT
CFL
BT
BD
4
ZN
F29
9P
PR
DM
15
ZN
F29
5
ZN
F73
4)L
0C34
3927
4)L
0C39
1288
ZN
F74
HIC
2
SU
HW
2
SUH
W1
ZN
F7O
ZN
F27
8
ZN
F69
ZFX
ZN
F67
4
ZN
F15
7
ZN
F41
ZN
F81
ZN
F21
ZN
F63O
4)L
0C13
9163
KL
F8
ZX
DB
ZX
DA
9)L
0C26
0337
ZN
F6
ZB
TB
33
ZN
F75
ZN
F44
9
ZIC
3
ZN
F27
5
Zin
cfi
nger
prot
ein
217
Zin
cfi
nger
prot
ein
CT
CF
-T
BTB
(PO
Z)
dom
ain
cont
aini
ng4
zinc
fing
erpr
otei
n29
9pse
udogen
e
Zin
cfi
nger
prot
ein
298
Zin
cfi
nger
prot
ein
295
(ZB
TB
21)
Zin
cfi
nger
prot
ein
73
sim
ilar
tozi
ncfi
nget
prot
ein
91
Sim
ilar
tozi
ncfi
nger
prot
ein
532
Zin
cfi
nger
prot
ein
74
ZB
TB
3O
Zin
cfi
nger
prot
ein
279
Zin
cfi
nger
prot
ein
280
Zin
cfi
nger
prot
ein
70
Zin
cfi
nger
prot
ein
278
Zin
cfi
nger
prot
ein
69
Zin
cfi
nger
ptot
ein
X-l
inke
d
Zin
cfi
nger
prot
ein
673
Zin
cfi
nger
prot
ein
157
Zin
cfi
nger
prot
ein
41
Zin
cfi
nger
prot
ein
81
Zin
cfi
nger
prot
ein
21
Zin
cfi
nger
prot
ein
630
Sim
ilar
toS
al-l
ike
prot
ein
1
Kru
ppel
-lik
efa
ctor
8
Zin
cfi
nger
,X
-lin
ked,
dupl
icat
edB
Zin
cfi
nger
,X
-Iin
ked,
dupl
icat
edA
Zin
cfi
nger
prot
ein
Np9
7pse
udogen
e
Zin
cfi
nger
prot
ein
6
Zin
cfi
nger
and
BT
Bdo
mai
nco
ntai
ning
33
Zin
cfi
nger
prot
ein
75
Zin
cfi
nger
prot
ein
449
Zin
cfi
nger
prot
ein
ofth
ece
rebe
llum
3
Zin
cfi
nger
ptot
ein
275
-7
-10
48
-11
-66
3
BT
B2
-58
9
8+
326
-+
-
12+
572
5+
615
1-
543
1-
542
11-
446
7-
687
-+
323
13+
805
11-
581
12+
506
18-
779
13+
661
15-
639
13-
657
3+
359
9+
803
9-
799
-+
-
11+
761
3+
672
5-
510
7+
518
4÷
467
11+
376
D
13-
1507
6-
1066
21
.la
21.l
b
22
.la
22.l
b
X.l
a
X.l
b
X.1
c
X.1
d
X.l
e
X.2
a
X.2
b
X.3
a
X.3
b
X.4
a
SE
T
BT
B
KR
AB
BTB
BT
B
KR
AB
KR
AB
KR
AB
KR
AB
KR
AB
KR
AB
KR
AB
BT
B
SCA
N-K
RA
B
SCA
N
D
5161
7017
5550
5630
6184
6322
2338
3998
4209
1454
4228
0009
4500
78
1500
7571
1571
3058
1907
8478
2010
1693
2116
8767
2119
8060
2241
3772
3005
1790
3075
1319
2407
7824
4624
3490
4711
4926
4719
134
7
4758
1245
4771
9194
4780
2547
4931
6398
5627
5632
5763
4994
5795
0922
7324
2135
8438
5694
1192
6863
5
1342
4738
5
1343
0638
7
1364
7601
2
1522
6206
0
5163
3043
5553
3560
6190
7300
2338
5465
4217
2660
4230
3519
4525
98
1505
4750
1576
1696
1909
1970
2013
5748
2119
3505
2120
4613
2242
3279
3007
2249
3076
5974
2414
2549
4628
9820
4715
8338
4722
7289
4766
6550
4774
8321
4781
5739
4932
5222
5632
8255
5764
0635
5795
3792
7324
5219
8441
5024
1192
7627
9
1343
0562
3
1343
2500
4
1364
8192
5
1522
7024
9
Page 129
DD
XX
q28
X.4
bL
0C13
9735
Sim
ilar
tazi
ncfi
nger
ptot
ein
92K
RA
B8
+49
515
2336
765
1523
4028
0
YY
pll
.3-
ZFY
zinc
fing
erpr
otei
n,Y
-Iin
ked
-12
+80
128
6354
629
0989
1
YY
ql1.
223
-Y
ZN
F381
Pzi
ncfi
nger
prot
ein
381,
Y-I
inke
dpse
udogen
e-
-+
-23
6055
7023
6067
03
YY
ql1.
223
Y.l
aY
L0C
3926
03si
mil
arta
Zin
cfi
nger
prot
ein
43(Z
inc
prot
ein
HT
F6)
--
+-
2523
5272
2524
0390
YY
ql1.
23Y
.1b
YL
0C44
2486
sim
ilar
tazi
ncfi
nger
prot
ein
91-
2554
0738
25
54
59
02
For each C2H2-ZNF in the dataset:
1 Chromosome number
2 Position on the chromosome
3 The cluster number to which the gene belongs. '-' for a gene found as a singleton rather than a cluster. If the gene belongs to a cluster, the cluster number is indicated: the first number indicates the chromosome number, the second number indicates the number of the cluster on the chromosome. For example, a cluster number 1.1 indicates Chromosome 1, Cluster 1.
4 Status as a pseudogene as reported in Genbank. 'Y' - Identified as a pseudogene
5, 6 The name of the C2H2-ZNF and its description
7 The domain associated with the C2H2-ZNF: KRAB, SCAN, SCAN-KRAB, BTB, HOMEO, SET and without an encoded conserved N-terminal domain (ø)
8 The number of zinc finger motifs present
9 The orientation
10 The amino acid sequence length
11, 12 The start and stop of translation
Page 130
Supplementary Table S2
Comprehensive summary of the organization of all C2H2-ZNF found as singletons or in clusters
on each human chromosome and classified with respect to the various C2H2-ZNF sub-families.
Columns: Chr; Total No. C2H2-ZNF; No. Clusters; No. C2H2-ZNF in Clusters; then, for each of the C2H2-ZNF sub-families KRAB, SCAN-KRAB, SCAN, BTB, SET, HOMEO and ø, the number found as singletons (S) and in clusters (C).
1 36 6 17 1 9 1 1 7 1 9 7
2 17 3 7 1 2 9 5
3 30 6 20 2 5 3 2 1 6 11
4 10 1 4 1 2 1 4 2
5 15 1 6 1 5 8 1
6 28 3 16 1 2 2 732 8 3
7 47 7 41 127 2 512
8 30 4 16 36 1 1 2 10 8
9 23 4 10 5 3 1 3 2 4 5
10 22 3 13 8 9 5
11 15 2 4 3 1 1 1 3 4 2
12 15 1 7 2 6 : 6 1
13 5 2 4 1 4
14 10 2 4 1 2 :i 5 1
15 12 3 8 12 1 3 5
16 33 6 27 10 1 2 3 5 12
17 13 2 5 1 1 1 1 11 5 2
18 14 2 6 1 32 5 3
19 289 13 279 3 185 1 13 4 1 6 76
20 18 3 7 2 2 1 2 6 5
21 3 1 2 1 1 1
22 10 1 2 2 2 4 2
X 19 4 11 1 6 1 11 6 3
Y 4 1 2 : 2 2
Total 718 81 518 32 280 4 14 3 30 28 13 1 1 2 3 130 177
No. of C2H2-ZNF without an encoded conserved N-terminal domain (ø), found as singletons (S)
or in C2H2-ZNF clusters (C)
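
The Total row of this table can be cross-checked with a short calculation. The sketch below is illustrative only: the dictionary keys and the `summarize` helper are names introduced here, and the counts are the per-sub-family singleton (S) and cluster (C) totals from the Total row.

```python
# (singletons, in clusters) per sub-family, taken from the Total row of Table S2.
totals = {
    "KRAB": (32, 280),
    "SCAN-KRAB": (4, 14),
    "SCAN": (3, 30),
    "BTB": (28, 13),
    "SET": (1, 1),
    "HOMEO": (2, 3),
    "none": (130, 177),  # "ø": no encoded conserved N-terminal domain
}


def summarize(counts):
    """Return (singletons, clustered, total) summed over all sub-families."""
    singletons = sum(s for s, _ in counts.values())
    clustered = sum(c for _, c in counts.values())
    return singletons, clustered, singletons + clustered


singletons, clustered, total = summarize(totals)
print(f"singletons={singletons} clustered={clustered} total={total}")
```

The clustered sum should agree with the number of C2H2-ZNF in clusters, and the grand total with the overall number of C2H2-ZNF in the dataset.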
Page 131
Supplementary Table S3
Gene organization of the 81 human C2H2-ZNF clusters.
Columns: Cluster¹; Position; No. of C2H2-ZNF; Clusters with solely C2H2-ZNF; Order of the genes from the different C2H2-ZNF subfamilies²; Cluster composition⁴
1.1 1p36.11 2 No ø/ø Pure
1.2 1p34.2 3 No K/K/K Pure
1.3 1p34.1 2 No ø/ø Pure
1.4 1q42.13 2 No ø/K Mixed
1.5 1q44 6 No K/K/K/K/K/S-K Mixed
1.6 1q44 2 Yes ø/ø Pure
2.1 ? 2 No ø/ø Pure
2.2 ? 3 Yes K/K/ø Mixed
2.3 ? 2 No ø/ø Pure
3.1 ? 3 Yes ø/K/K Mixed
3.2 ? 3 No ø/K/K Mixed
3.3 ? 8 Yes S-K/K/S-K/ø/S-K/ø/ø/ø Mixed
3.4 ? 2 No ø/B Mixed
3.5 ? 2 Yes ø/ø Pure
3.6 ? 2 No ø/ø Pure
4.1 4p16.3 4 Yes K/ø/ø/K Mixed
5.1 5q35.3 6 No K/K/ø/K/K/K Mixed
6.1 6p22.1 2 No ø/K Mixed
6.2 6p22.1 12 No S/S/S-K/ø/S/S-K/ø/S/S/S/S/K Mixed
6.3 6p21.3 2 No B/B Pure
7.1 ? 2 No ø/K Mixed
7.2 ? 4 No ø/K/ø/K Mixed
7.3 ? 5 No ø/ø/K/ø/K Mixed
7.4 ? 9 No K/K/K/ø/ø/K/K/ø/K Mixed
7.5 ? 8 No K/S-K/S-K/ø/ø/K/K/K Mixed
7.6 ? 3 Yes K/K/K Pure
7.7 ? 10 No /K/K/K/K/ø/K-K Mixed
8.1 8p23.1 2 No ø/ø Pure
8.2 8q24.13 2 No H/H Pure
8.3 8q24.3 5 No ø/ø/ø/ø/K Mixed
8.4 8q24.3 7 No ø/K/K/K/K/ø/K Mixed
9.1 9q22.32 4 No ø/K/K/ø Mixed
9.2 9q31.2 2 No ø/ø Pure
9.3 9q32 2 Yes ø/K Mixed
9.4 9q33.2 2 Yes B/B Pure
10.1 10p11.21 6 Yes K/ø/ø/K/K/K Mixed
10.2 10q11.21 3 No ø/K/K Mixed
10.3 10q11.21 4 Yes K/ø/K/ø Mixed
11.1 11p15.4 2 Yes S-K/K Mixed
11.2 11q12.2 2 Yes ø/ø Pure
12.1 12q24.33 7 Yes K/K/K/K/K/K/ø Mixed
13.1 13q22.1 2 No ø/ø Pure
13.2 13q32.3 2 Yes ø/ø Pure
14.1 14q11.2 2 Yes H/ø Mixed
14.2 14q23.3 2 Yes B/B Pure
15.1 15q24 2 No B/ø Mixed
15.2 15q25.3 3 No S/S/ø Mixed
15.3 15q26.1 3 No ø/ø/ø Pure
16.1 16p13.3 9 No S/K/S-K/ø/S-K/K/ø/S/ø Mixed
Positions recovered for chromosome 2: 2q11.1, 2q13; chromosome 3: 3p22.1, 3p21.32, 3q13.2, 3q24, 3q26.32; chromosome 7: 7p22.1, 7p11.2, 7q11.21, 7q22.1, 7q22.1, 7q36.1 (one position per chromosome is missing, so these are not assigned to individual clusters and are shown as '?').
Page 132
16.2 16p13.3 2 No ø/S Mixed
16.3 16p11.2 10 No ø/ø/5K/ø/ø/ø Mixed
16.4 16p11.2 2 No ø/K Mixed
16.5 16q22 2 Yes K/K Pure
16.6 16q24.2 2 Yes ø/ø Pure
17.1 17p13.2 3 No ø/S/ø Mixed
17.2 17p11.2 2 No S-K/K Mixed
18.1 18q12 4 Yes S/ø/S/S Mixed
18.2 18q23 2 No ø/ø Pure
19.10 19q13.2 3 Yes K/K/K Pure
19.1 19p13.3 2 No ø/B Mixed
19.11 19q13.31 24 Yes 19K/5ø Mixed
19.12 19q13.41 43 No 30K/13ø Mixed
19.13 19q13.43 76 No ³43K/1S-K/12S/1B/19ø Mixed
19.2 19p13.3 5 Yes K/K/K/K/K Pure
19.3 19p13.3 2 No ø/B Mixed
19.4 19p13.2 2 No K/ø Mixed
19.5 19p13.2 14 No 9K/5ø Mixed
19.6 19p13.2 28 No 21K/7ø Mixed
19.7 19p13.11 40 No ³28K/12ø Mixed
19.8 19q13.11 8 No 3K/S/3ø Mixed
19.9 19q13.12 32 No 23K/1B/8ø Mixed
20.1 20p11.23 2 No K/ø Mixed
20.2 20q13.12 3 No ø/ø/K Mixed
20.3 20q13.2 2 Yes ø/ø Pure
21.1 21q22.3 2 No Se/B Mixed
22.1 22q11.22 2 No ø/ø Pure
X.1 Xp11.23 5 No K/K/K/K/K Pure
X.2 Xp11.1 2 No ø/ø Pure
X.3 Xq26.3 2 Yes S-K/S Mixed
X.4 Xq28 2 No ø/K Mixed
Y.1 Yq11.23 2 No ø/ø Pure
Total: 81 clusters; 518 C2H2-ZNF; Yes = 25, No = 56; Pure = 29, Mixed = 52
¹ For the cluster name, the first number corresponds to the chromosome on which the cluster is found
and the second to the number attributed to the cluster.
² Sequential order of the genes from the different C2H2-ZNF subfamilies such as KRAB (K), SCAN (S),
SCAN-KRAB (S-K), SET (Se), HOMEO (H) and without an encoded conserved N-terminal domain (ø).
³ For the very large clusters, the number of C2H2-ZNF from each subfamily is specified
(e.g., 23K means that 23 consecutive genes from the KRAB-C2H2-ZNF subfamily are found in the cluster).
⁴ 'Pure' = the cluster is composed of C2H2-ZNF from a single subfamily;
'Mixed' = different subfamilies of C2H2-ZNF are present in the cluster.
Note: ‘Pure’ clusters with solely tandemly repeated C2H2-ZNF are in grey
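
The cluster names and the gene-order notation described in the notes above can be read mechanically. The following is a minimal sketch, not the procedure used to build the table: the function names are hypothetical, and the code `B` for BTB is taken from the table rows rather than from the legend.

```python
import re

# One token of the gene-order notation: an optional repeat count followed by a
# subfamily code. "S-K" and "Se" are listed before "S" so they match first.
TOKEN = re.compile(r"^(\d*)(S-K|Se|K|S|B|H|ø)$")


def parse_cluster_name(name):
    """Split a cluster name such as '19.12' into (chromosome, cluster index)."""
    chromosome, index = name.split(".")
    return chromosome, int(index)


def expand_order(order):
    """Expand '23K/1B/8ø' into one subfamily code per gene, in sequence."""
    genes = []
    for token in order.split("/"):
        match = TOKEN.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        count = int(match.group(1)) if match.group(1) else 1
        genes.extend([match.group(2)] * count)
    return genes


def composition(order):
    """'Pure' if every gene belongs to a single subfamily, otherwise 'Mixed'."""
    return "Pure" if len(set(expand_order(order))) == 1 else "Mixed"


print(parse_cluster_name("19.12"), len(expand_order("30K/13ø")), composition("30K/13ø"))
```

Comparing `len(expand_order(...))` with the "No. of C2H2-ZNF" column is a quick way to spot rows whose order string was damaged.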
Page 133
Comprehensive catalog of the C2H2-ZNF genes from the 81 human clusters and their syntenic counterparts from other mammalian genomes (Chimpanzee, Mouse, Rat and Dog)
Hum
anC
iust
er7.
2
hch
rl
F0
ZM
PST
E24
CO
LSA
2
L0
C3
88
62
lSM
AP
IL
ZN
F64
3Z
NF
642
ZN
F68
4
F0
L0C
7244
33.
RIM
S3
NFY
CF
0004
DF
0
K9
+
K9+
K8
+
Ch
imp
anze
e
pC
hrl
F0ZM
PSTE
24,
L0C
3567
9lD
FO
L0C
7466
91,
SM
AP
IL
ZN
F64
3Z
NF
642
L0C
4567
97
F0
t0C
7687
4&R
IMS3
t0C
4557
93K
CN
Q4
-9
*
K9.
KB
*
Mouse
mC
hr4
FGZ
mp4
e24,
CoI
9a2
DF
C
Sm
apll
Zfp
69K
9
F0
Ri,
,e3,
Nfy
c
Kcn
a4
Rat
rchr5
F0Z
mps
O24
Co!
942
DF
O
Sm
4plI
F0
R0
3.N
fyc
Kcr
,4
Oog
F0tO
C6O
74S6
tOC
-175
312
tOC
’524
51
JD
Supplementary Table 4
D
Hum
anC
iust
er7.
7C
him
pan
zee
Mou
seR
atD
og
hch
r7pohrl
mch
r4rc
hr5
cchr2
DF
OD
FO
DF
0D
F0
OF
O
F0
TRIM
G3,
FOlK
ILF
0TR
IM63
,P
0IK
ILF
0Tr
i,r,6
3,P
d,kI
lF
0T
,ss,
63,
Pd,
kIl
F0
TR
IM63
,PD
IKIL
0R
AP
I0
RA
PI
Gra
piG
rapl
0RA
PI
ZN
F59
3-
1*
ZN
F59
3-
1Z
fp59
3.
t-
Ztp
593pre
d.
1.
L0C
4873
60-
1*
ZN
68
3-
4Z
NF
683
-4
L0C
4873
54-
4-
F0
LlN
28.
014003
F0
LIN
2B01
4003
F0-
Lif
l28.
Dhd
dsF
0L
,a28
.D
hdds
FGL
lN28
.D
HO
O3
HM
ON
2H
MG
N2
H,r
ç,,2
Hrg
r2H
MG
N2
rchr7
G.
t0C
6O76
22,
DF
O
OC
GO
Z00
9CO
C3S
_’45
8
L.0
C48
2457
K17
-
L0C
4824
54K
S-
Hum
anC
lust
er7.
3C
him
pan
zee
Mo
use
Rat
Dog
hchrl
pch
rfm
chr4
mch
r5C
chrl
5
F0
AT
PSV
OS.
BA
0AL
TD
FO
F0
AT
FSV
OB
,B
A0A
LT
DF
OF
0A
1p
6o
bB
agal
0D
FO
FGA
tp6r
obB
agal
t2D
FO
F0
AFP
,500
B.
BAG
ALT
DF
O
2,C
C02
4,S
LC
6A9
2.C
C02
4SC
C6A
90
16
9$1c5
92
CC
D24
SLC
F4S
KL
F17
.3
*K
LF1
7-
3*
K1f
17.
3-
Z1p
393_
pred
.3
-K
LF1
7-
3*
L0
C1
28
20
8-
5-
F0
OM
AP
I,P
RN
PIP
F0
OM
API
,P
RN
PIP
F0
Dnep
l,P
rspi
pF
00,m
pl,
Frr
pip
F0
OM
API
,P
NF
lP
TM
EM
53T
ME
M53
Tmem
S3Tm
5705
3T
ME
M53
Page 134
Hum
anC
lust
er1.
4C
him
pan
zee
Mo
use
Rat
Dog
hchrl
pch
rlm
ch
rll
rchrl
Occhrl
4
F0
CD
C42
BP
AD
FO
F0
CD
C42
BP
AD
FO
F0
Cdc4
2bpi
DF
OF
0C
dc42
bpa
DF
OFG
CD
C42
SP
AD
FO
CT
O-2
90?
CT
D-2
90?
Cld
-29d
,C
td-2
9d1
0W
-290?
ZN
F67
815
*L
0C46
9695
-15
*
gm
127
K3
FGJM
J04
,M
PN2
F0
]MJD
4,M
PN2
F0
Jmjd
4K
OJm
jd4
F0
JMJD
4,M
PN2
WN
T9A
WN
T9A
Wnt
9aW
*t9a
WN
T9A
Hum
anC
iust
er1.
5
hchrl
DF
0KG
TFS
2MSC
CPD
H
L0C
149134A
HC
TF
I
ZN
F69
5K
-
ZN
F67
OK
9
ZN
F66
9K
9
ZN
F12
4K
7
L0C
729806
K10
ZN
F49
65K
5
F0
C?A
SI0
R2811
002W
5
Ch
imp
anze
e
pchrl
KGT
FOZ
MSC
CPD
H
tOC
l49l
34,A
HC
TF
I
ZN
F67O
K9
-
L0C
4578
77K
9-
ZN
F12
4K
7-
L0C
4578
80K
10-
ZN
F49
6s
s-
F0
C?A
SI,
OR
2BII
0R
2W
5
DF
O
Mouse
mch
r8
DF
CF0
.TG
2m,
S**Ç
4)
G,,,
l305
,A
ScII
I
BC
0500
78K
12-
Ztp
496
SK5
-
F0
C,a
sl.
0/5-
222
C0C
6II9
157
Iat
rch
rl0
FG
flb2
,n,
Scc*
**
On,
1305
,A
ScII
?
Zfp
496
F0
C,G
sl,
09,2
22
C0C
6681
57
DF
0
55-
000
cch
rll
DF
OG
C0C
-12O
I2
0C
48
010
5
L0C
4905
75s
5-
WC
?AS
I,0R
2011
7R2W
5
Hu
man
Clu
ster
1.6
hchrl
F0
0R
2T
22
7,
00
08
UI
DF
O
SH
3BP
SL
ZN
F67
2-
13
ZN
F69
2-
5
F0
P0
80
2
Ch
imp
anze
e
pcO
n
F0
0827227.
OR
5BU
ID
FO
SH
3BF
5L
ZN
F67
2-
13*
ZN
F69
2-
5-
FO
P0B
D2
Mo
use
mcnn?
F0
0r2
t227
DF
C
S53
bp5?
Zfp
672
-13
Zfp
692
-5
F0
Pgb
d2
Rat
tch
tlO
F0
0,21
227
DF
O
Sh3
bp5?
Zfp
672
-13
-
Zfp
692
-s
*
KG
P9b
d2
Oog
DC
fltl
t$
‘G.
CR
2722
70R
59U
1D
FO
OH
3BP5
L
L0C
4826
99-
12*
L0C
4826
98-
3-
F0
P12
802
Hu
man
Clu
ster
2.7
Chim
panzee
Mouse
Rat
00g
nchr2
pch
r2m
chr5
rch
r5cchnl7
F0
MP
VI7
.G
TF
3C2
DF
OF
0M
PV
I7.
0TF
3C2
DF
OF
GG
1f3c
2,E1
f294
DF
OF
0G
03c2
,E
it2b4
DF
OFG
MP
Vl7
0TF
3C2
DF
O
E(F
284.
SN
X17
E1F
284.
SN
XI7
Sra
xI7
5cc
17
E1F
284,
SN
XI7
ZN
F51
3-
7-
ZN
F51
3-
7-
Zfp
513
-7
-Z
tp51
3-
7-
L0C
4830
12-
7-
ZN
F51
2-
2*
ZN
F51
2-
2=
Zfp
512
-2
R0D
1561
52-
2-
L0C
6082
961
*
F0
CC
OC
I2I.
XA
BI
F0
CC
OC
I2I,
XA
BI
F0
,Xab
lSu
pS?
F0
Xab
lSupII
IF
0C
CD
CI2
IXA
B1
sup
Trt
SU
PT
7LS
UP
ITL
D
Page 135
DD
D
Hum
anC
lust
er2.
2C
him
pan
zee
Mouse
Rat
Dog
hch
r2p
chr2
mch
r2r
chr3
Cch
rl7
F0
.TE
KT
.,L
0C44
2048
DF
OF
0T
EF
l.4M
AC
DF
OF0
.T
kt4,
MA
IO
FO
F0.
TAkI
4,M
AI
DF
OF0
,C
0C48
3056
DF
O
MA
C,
MR
PS
5M
RP
S5
Mrp
s5M
rps5
MA
L,L
OC
475
746
ZN
FS
14K
7-
L0C
4704
31K
7-
Zfp
661
Kg
-Z
fp66
1K
9-
L0C
6116
70K
7*
ZN
F2
K9
+Z
NF
2K
9L
0C48
3055
K26
-
L0C
344065
10÷
L0C
4594
006
*
F0
PR
QM
2,K
CN
IP3
F0.
FR
OM
2.K
CN
IP3
F0
P,o
m2K
*nip
3F
0P
ru,n
2.K
cn3
F0.
tOC
S’76
54
FAI4
D2A
F45054
FaM
2a
Fah
d2
atQ
C50
5135
,LO
CA
2S74
5
Hum
anC
lust
er2.
3C
him
pan
zee
Mouse
Rat
Dog
hchri
pchri
F0
AD
RA
.2B
,A
STL
DF
OF
0A
DR
A2B
,A
STL
DF
OF
0A
DR
A2a
AST
LD
FO
F0
AD
RA
2B,
AST
LD
FO
F0
AD
RA
2B,
AST
LD
FO
OU
SP
SO
US
P2
DU
SP
2D
US
P2
DU
SP
2
L0C
343938
.-
-
L0C
442041
--
F0
NC
AP
HF
0N
CA
PH
F0
NC
AP
HF0
.’N
CA
PHFG
’NC
APH
LIN
CR
L1N
CR
LIN
CR
LIN
CR
LIN
CR
Hum
anC
iust
er3.
1C
him
pan
zee
Mo
use
Rat
Dog
hch
r3p
chr3
mch
r9r
chr8
cch
r23
F0
MF
RIP
,EIF
IBD
FO
F0.’
EIF
IBE
NT
PD
3D
FO
FG.M
FRIF
EIF
IBD
FO
FG
.MF
RIF
EIF
I6D
FO
F0-
MY
RIP
,EIF
IBD
FO
EN
TP
O3
RP
LI4
L0C
7357
59,L
0C73
5578
E*t
pd3,
RpI
$4E
Mpd
3,R
p114
EN
TPD
3,R
PC
I4
ZN
F61
9.
w+
L0C
4707
972K
26
ZN
F62O
K8
+L
0C47
0799
.-
-
ZN
F62
1K
7+
F0
tOC
S-35
607
CO
CS5
7625
F0
MP
PS
3IP
I,F
0C
0C64
5857
C0C
5576
25F0
.L
0C64
5507
,LO
CF5
625
F0
t0C
545807.
MR
PS
SIP
I.C
TN
N3I
CrN
N3I
MR
PS
3IP
I.C
TN
N3I
MR
PS37
P’,
CT
’’N
3IF7
RPS
3IPI
.C
TN
S3I
Hum
anC
lust
er3.
2C
him
pan
zee
Mo
use
Rat
Dog
hch
r3p
chr3
mch
r9r
chr8
Cch
r23
F0
VIS
AI.
SE
C22C
DF
OF
G.v
IPR
I,S
EC
22C
DF
OF
056
*22*
.Ss
I8I2
DF
O50
.0s
c22*
,55
1612
DF
OF
0V
IPA
1,S
EC
22C
DF
O
5518
L2.
NK
TR
5518
12,
NK
TR
Nkt
rN
ktr
5518
12,
NK
TR
ZN
F65
1.
8+
ZN
F66
2R
8+
Zfp
651
B8
*R
GD
156
2434
B8
*
ZN
F66
2K
8+
L0
C3
39
90
3K
-+
F0
SNA
K.
TM
EM
I6K
FG
.SIJ
RK
,FM
EM
IGK
F0
S,,r
k,T
,ss,
nl6
kF
0S
s,k.
Tm
ersl
6kF
057
*55,
TM
EM
I6K
46
50
54
55
05
4656
55A
bf,d
5A
6HD
S
Page 136
DD
Hum
anC
iust
er3.
3C
him
pan
zee
Mouse
Rat
Dog
hch
r3p
chr3
mch
r9r
chr8
Cch
r23
F0
SNR
K,
TM
EM
I6K
DF
OF
001
4FF,
TM
EM
I6K
OF
OF0
.0,
1,5,
Tm
eml6
KD
FQ
F0
5,,,k
,T
mm
f6I(
DF
OF
0,
014F
F,T
ME
MIG
KD
FO
AB
HD
5,F
LJ3
615
7A
BH
D5,
FC
J361
57A
bhd5
,L
0C66
62la
Abh
d5,
L0C
6662t8
AB
HD
5
ZN
F44
50K
14-
ZN
F44
5e,
4-
Ztp
445
0K12
-R
GD
1559
144
0K12
-Z
NF
445
SK14
L0
C2
85
34
6K
12-
ZN
F16
723
*Z
tp16
7s
ii•
Znt
167
022
*
ZN
F16
70K
13+
ZN
F35
-11
*Z
fp66
0-
10*
Ztp
lO5
-11
ZN
F66O
-10
+Z
NF5
O2
.14
+Z
fplO
5-
11
ZN
F19
7SK
22+
L0C
4708
07-
9*
ZN
F35
-11
+
ZN
F5O
2-
14+
ZN
F5O
J9
+
F0
KIA
AI
143,
KIF
ISFD
KIM
I14
3.K
IFI5
KG
,;10
0590
10R
1K.
F111
5FG
11I0
059G
IOR
iK,
K,5
15F
0?1
3EM
42.
KIK
IS
TM
EM
42,T
GM
4T
ME
M42
.TG
M4
Tm
e,e3
2,T
g,4
7rr,
e,r4
2.T
gr4
Hum
anC
lust
er3.
4C
him
pan
zee
hch
r3
F0
QT
RT
DI
DF
O
OR
D3
ZN
F8O
ZB
TB
2O
Mo
use
pch
r3
KG
QT
RT
DI
DR
D3
L0C
4708
867
ZB
TB
2OB
5
-7
55
OF
O
F0
04P
43
LSA
MP
Rat
mchrl
6
F0
05141
DF
Drd
3
Zbt
b2O
B5
C
Doq
F0
GA
P43
LSA
0IP
rchrl
l
FQ
Q5
14
;O
FO
9,14
3
Zbt
b2O
B5
FGG
ap43
Lsa
ne
CC
0t7
7
F0
OT
RT
DI
DF
O
DR
D3
GG
ap43
LS
a,rv
FG
GA
P43
LSA
MP
Hum
anC
lust
er3.
5C
him
pan
zee
Mouse
Rat
Dog
hch
r3p
chr3
mch
r9r
chr8
Cch
r23
F0
PC
SC
R2
PL
SC
RI
DF
OF
0P
LS
CR
2.P
L$C
R1
F0
PIs
*r2
F0
Pls
cr2
KG
PL
SC
RZ
PL
SC
RI
PC
SC
R5
PL
SC
R5
PIsc
rSP
Isc,
5P
LS
CR
S
ZIC
44
.L
0C47
0956
-4
-Z
ic4
-4
Z,c
4-
4L
0C
485704
-4
ZIC
1-
4-L
QC
46Q
759
-4
,Zic
l-
4-Z
ici
.4
-L0C
611554
-4
*
FG
RP
L3
SP
I.4
0T
01
FG
RP
L3
8F
I,A
GT
RI
F0
Rp1
38p1
KG
Rp1
38p1
FG
RP
C3S
PI,
AG
TR
I
CP
BI
OP
SI
Cp
bl
Cpbl
CP
BI
Page 137
DD
Hum
anC
iust
er3.
6C
him
pan
zee
Mo
use
Rat
00
9
hch
r3p
ch
r3m
ch
r3rc
hr2
cchr3
4
F0
ZM
AT
3,PI
K3C
AD
FO
FGZ
MA
T3,
PIK
3CA
DF
OF
0Z
,rat
3,P
,k3c
sD
FO
FGZ
mal
3,P
,k3c
aD
FO
F0
ZM
AT
3.PI
K3C
AD
FO
KC
NM
B3
KC
NM
B3
KC
NM
B3
WIG
1-
2-
WtG
1-
2-
Wig
l-
2-
WIg
l-
2-
WIG
J-
2-
ZN
F63
9-
5*
ZN
F63
9-
•Z
tp63
9-
5Z
fp63
9-
5•
ZN
F63
9-
s
FO
MF
NI.
FG
MF
NI,
FG
MS
UF
GM
fr,l
FG
MF
NI.
0NB
4G
NB
4O
Fb4
0nb4
0512
4
Hum
anC
lust
er4.1
Chim
pan
zee
hch
r4
F0
L0
C7
372
530
FO
ZN
F59
5M
GC
2635
6L
0C
654254
ZN
F14
1F
0P
100
L0
C4
61
04
1.
AT
P5I
Mo
use
mC
0r5
K18
+
-6
+
Kil
+
pch
r4
F0L
0C
73
7253
DF
O
ZN
F59
5Z
NF
718
L0C
461
038
ZN
F721
FGP
10
0
L0C
4610
41,
AT
P5I
Rat
F0
L0C
7372
53D
FC
K16
—
K11
—
K28
rcn
n
F0
L0C
7372
53D
FO
00
gC
Cfl
r3
F0
Pig
g
A1p
5l
F0
L0C
7372
53D
FO
F0
Pigg
4155
/
F0
P10
0
L0C
46l0
41.
AT
F5I
Hum
anC
lust
er5.
7C
him
pan
zee
Mo
use
Rat
00
g
hch
r5pchr5
mch
rll
rchrf
Occhrl
l
F0.
CO
L23
AI,
MR
PL
5OP
3D
FD
50.
CO
L25
AI,
05
4D
FD
FG.C
o123
a1D
FO
F0,C
0123
41D
FO
F210007556
DF
O
CL
K4
Clk
4C
lk4
L0C
4746
45
ZN
F35
4AK
13-
ZN
F35
4AK
13-
Zfp
354a
K13
Ztp
354a
K13
*Z
N5
94
K35
-
ZN
F35
4BK
13*
ZN
F35
4BK
13•
Ztp
354b
K13
RG
D15
6008
05K
13-
L0C
4746
50K
24-
ZF
P2
-13,Z
NF
71
K-
*Z
fp2
-9
-RG
D156327
K-
ZN
F45
4K
125Z
FP
2K
il•Z
fp45
4K
12-Z
fp354c
Kil
-
DK
FZ
p686
E24
33K
13*
ZN
F45
4-
12,
9630
041
NO
7Rik
K13
-
ZN
F35
4CK
uZ
fp35
4cK
il-
F0
AO
AM
TS2
F0
0RM
6.A
DA
MT
S2F
0A
da,,5
s2P
0A
dam
ts2
F0
LO
C48
1453
L0C
391859
RuF
ylR
uFjl
L0C
4824
54
Page 138
DD
hch
r6p
chr6
DF
0FG
F0
M1
2l1
67
.F
66
08
3
RP
I-I5
30
14
3
L0C
3461
57Z
NF
184
mch
rl3
-9
*
K19
-
60P0
M12
1167
,F
6508
3
RP
I-l5
30l4
3
L0C
4722
31Z
NF
184
OF
O
F0
HIS
T(H
2AI
HIS
T1H
3I4M
IST
IH2k
I
K19
-
rch
rlO
PCP
om
12l6
2D
FC
Fks
583
Zfp
184
K19
*
Hum
anC
iust
er6.
7C
him
oan
zee
Mouse
Rat
Doa
F0
HIS
TIH
2AI
HIS
TIH
3HH
IST
IH2A
I
Hum
anC
lust
er6.
2
Cch
r35
F0
Pc,
m12
I62
DF
O
Ffs
g83
Z1p
184
K19
FGH
IST
IH2A
1
HIS
TIH
3HH
IST
IH2A
J
Ch
imo
anze
e
60.
P0M
121I
67.
FKSG
83
RP
I-l5
30l4
3
L0C
478746
DF
O
.9*
F0
66SF
!H2A
I
8IS
TIH
3H
HIS
TIH
2M
Mo
use
mchtl
3
F0
H!S
TIH
2A1
H!S
TIH
3HH
!ST
IH2A
J
Rat
rchrl
7p
chr6
F0
C0C
47l9
1O
L0C
4625
18
ZN
F16
5L
0C47
1912
L0C
4719
14Z
NF
193
ZN
F3O
7Z
NF3
O6
ZN
F96
ZN
1390
ZN
F4S
2Z
NF3
11
OF
hch
r6
FO
OR
2B
7P
0R
288P
DF
O
OR
IFI2
P
ZN
F16
5Z
NF
435
ZN
FJ9
2L
0C22
2701
ZN
F19
3Z
NF3
O7
ZN
F18
7Z
NF
323
ZN
F3O
6Z
N13
05Z
NF
452
ZN
F311
F0O
R2W
IOR
2F
!P
L0C
6462
60
Doq
se
S4
SK9
S5
5K7
-8
S6
57
sii
s-
K14
Cch
r35
F00161370
018-
1369
0(8-
4201
8-13
68
Zfp
96S
8
Zfp
306
s7
Ztp
187
-7
RP
23-2
98F
22.2
57
Zfp
lO2
5K9
DF
O
55
*
-7
.
57
,
518
S5
-
K14
-
oF
001613700181369
DF
0Ifr
4201
frI3
68
Zfp
192
Znt
307
Znf
187
Ztp
307
Ztp
96
DF
OF
0-L
00
61
1561
.
L0C
4883
12C
o*2y
2
C0C
6115
61L
0C48
831
6L
0C48
831
8
SKIS
*
5K3
-
5K7
*
S7*
-8.
515
*
F0
L0C
4719
26
LO
C3
7192
7
F0
0!fr
1366
0!fr
1365
0Ifr
1364
G0
!6l3
66
0161
365
3(8-
1364
F0
L0C
488327
L0C
4883
28
Hum
anC
lust
er6.
3C
him
pan
zee
Mouse
Rat
Dog
hch
r6p
chr6
mchrl
7r
chr2
OC
chrl
2
F0
W0R
46,
PF
DN
6D
FO
F0
W0R
46,
PF
DN
6D
FO
F0
Wdr
46,
Ff8
.6D
FO
F0
Wd,
46,
Ff5
.6D
FO
F0
W0R
46,
PF
ON
6D
FO
R0L
2T
AP
BP
180L
2.T
APB
PR
gf8T
spbp
RgI
2,T
apbp
RG
L2,
TA
PBP
ZN
F29
7B
2-
ZN
F29
7B
2-
Zbt
b22
B3
*Z
b1b2
2B
3*
L0C
6079
00B
2-
ZB
TB
9B
1.Z
BT
B9
B1
•Zb
tb9
B9
-Zb
tb9
B9
-L
0C60
7940
B1
*
F0
0817
1,ff
PR
3FG
BA
KI.
FF
53F
008k;.
15x3
F0
888-
I.15
x3F
GB
AK
I,IT
PR3
Page 139
hch
r7
D
F0
RA
dD
AO
LS
K0E
LK
2,0A
1021P
DK
FZ
p434
J101
5D
KF
Zp547k054
L0C
442283
ZN
F32
5
F0
0R7E
33.
0576136F
0R7E
59P
pch
r7
F0
RA
CI,
DA
0LB
K0E
LK
2,0
51
02
1F
L0C
7422
63L
0C47
2281
ZN
F32
S
F0
0R7E
39,
0576
136F
0576
59F
mch
r5
F0.
Dlb
Ori
d2;p
Gm
792
Ztp
316
Ztp
l2
F0
0,7
e39
,0
r7e1
36
p
0r7e
59p
Dj
Hu
man
Clu
ster
7.7
Ch
imp
anze
eM
ouse
Rat
Dog
hch
r7p
chr7
mch
r5r
chrl
2C
chr2
3
F0
5CC
29A
4,D
FO
F0
SLC
29A
4,D
FO
F0
SIc
29a4
DF
OFG
Slc
29a4
DF
OF
0SL
C29
A4,
DF
O
KIA
A16
56K
1AA
1856
KIA
A78
56K
1M18
56K
1MI8
56
L0
C4
41
19
3-
-L
0C44
1193
ZN
F81
5K
3*
ZN
F8
5K
3*
F0
PM
S2.
JWI
F0
PMS2
.JO
lIF
0P
se2.
J6I
F0
Pm
s2,
lOI
F0
P010
2.JO
lI
EIF
2AK
IE
IF2A
KI
E7F
2kI
0136
51E
1F2A
KI
Hum
anC
lust
er7.
2C
him
pan
zee
DF
0
Mou
se
DF
0
-5
K15
K15
+ +
Rat
rch
rf2
DF
C
K15
K15
K15
Doq
DF
0
K15
-
K15
*
CC
ht6
F0.
RA
CID
AG
LB
KD
EL
K2.
05
10
21
F
F0
Ra*
l.D
36’b
Ord
2p
flG
0156
3095
-
Zfp
316
-
flG
D15
6439
6K
DF
O
15 15
F0
0r7e
39,
0r7e
136p
0r7e
59p
F0
05
7E39
,05
7E13
6P
0576
59P
Hu
man
cllu
ster
1.3
urn
mpan
zee
Mouse
Hal
uog
hch
r7p
chr7
F0.
CO
L23
AI
MR
PL
5QP
3D
FO
F0
CO
L23
AI.
MR
PLSJ
P3D
FO
CL
K4
CO
Ol
L0C
442311
-12
+Z
NF
479
K10
-
L0C
222032
--
-Z
NF
716
K12
+
ZN
F47
9K
10-
L0C
340223
--
-
ZN
F71
6K
12+
F0
HIS
TIH
2AI
F0
HIS
TIH
2AI
HIS
TIH
3H,H
IST
IH2M
HIS
TIH
3H.H
IST
IH2A
J
121
Tad
epal
lyet
Page 140
t,
‘tQot(t
F-
C”C”
Page 141
DD
D
Hu
man
Ciu
ster
8.1
Chim
pan
ree
Mo
use
Rat
Dog
hch
rlp
ch
rlm
ch
rll
rchrl
Occhrl
6
F0
DE
FB
I3O
DF
0FG
DE
FB
I3O
DF
0F
0E
0654465
DF
OF
00e
fo41
DF
0F
0D
EF
BI3
OD
F0
L0C
389631
--
L0C
441
341
--
-
F0R
PL
38P
I.A
GT
RI
F0
RP
L3
BP
I.A
GT
RI
FGA
gt,I
FGA
gIO
FO
RP
L3
8P
I,A
GT
RI
CP
BI
CF
BI
Cpb
IC
pbI
CF
BI
Hu
man
Clu
ster
8.2
Chim
pan
zee
pcf
lr8
hch
r8
F0
SN
TB
I.H
AS2
HA
SNT
.M
OP
S36
P3
ZH
X2
ZH
X1
DF
0
Mouse
mchrl
5
HI,
HI-
F0
SN
TB
I.H
A$2
HA
SNT
,M
RP
S36
P3
ZH
X2
ZHX
1
DF
O
F0
AT
AD
2
Rat
HI
Hi
F0
S,t
bI,
H3s
2
M,g
s36p
3
Zhx
2H
I+
Zhx
lH
I-
F0
AT
AD
2
Doq
rch
r7
F0S
,,tb
l.H
a12
Mps3
6p3
Zhx
2Z
hxl
F0
.40
42
CC
ht7
3
F0
SN
TB
I,H
AS2
HA
$NT
MR
PS
36P
3
L0C
4820
33H
I
L0C
4750
89H
1
HI
H1
OF
O
F0
A0d2
F0
AT
AD
2
Hu
man
Clu
ster
8.3
Ch
imp
anze
eM
ouse
Rat
00g
hch
r8p
ch
r8m
ch
rl5
rch
r7cchrl
3
F0
L6E
,H
HC
MD
FO
FG,C
Y6E
,H
HC
MD
FO
F0.
Ly6,
HF+
mD
FO
F0.
ty6F
HI+
mD
FO
F0
ty6F
DF
O
LY
6HL
Y6H
Ly6
hL
y6h
Lyg
h
ZF
P41
-4
*G
LI4
-7
Zfp
4l-
4*
Zfp
4l-
4*
GL
I4-
7*
ZN
F62
3-
13*
Ztp
623
-13
*L
0C55
0893
-7
*
ZN
F69
6-
9Z
NP7
O7
K•
Zfp
707
K7
*
ZN
FB
23-
13
ZN
F7O
7R
7-
F0.
BR
EA
2,M
4P
K5
FGB
RE
A2.
MA
PK
I5F
0.&
e42.
Mp
k1
5F
0B
rea2
KO
pkI5
F0
Br*
a2.
MpkI5
F%M
S3H
FAM
43H
Fa*1
834
Fam
83h
Fa,
r.83
h
123
Tad
epal
lyet
Page 142
DD
Hum
anC
k,s
t.r8
.4C
him
pan
zee
Mouse
Rat
Dog
hch
r8p
ch
r8m
chrl
5rc
hr7
cchrl
3
FO-R
EC
OL
4,L
RR
CI4
.D
FO
F0
REC
OL4
,CR
RC
I4,
QF
OF
0R
KFO
(4,L
1121
4D
FO
F0
ReF
qI4
,U,F
l3D
FO
F9w
cu
,r0cs2
,.s
DF
O
CR
RC
24IC
IM16
88L
AR
C24
,KIM
1688
L,r
c24
-,,c
24.0
5132
193
ZN
F251
-7
-L
0C74
2563
K14
Ztp
251
K12
-Z
nt25
1K
12-
L0C
4751
29K
15-
ZN
F34
K12
-Z
NF
3412
-Z
tp7
K16
*Z
fp64
7K
13-
L0C
4821
06K
15-
ZN
F51
7K
10*
ZN
F7K
14*
Ztp
647
K13
L0C
4821
07K
12
ZN
F7
K14
*Z
NF
250
K13
-Z
NF
347
K20
ZN
F64
7K
13Z
NF
16-
ZN
F16
-17
-L
0C74
7863
L0C
642914
K10
-
F0
TM
ED
IOP
.C80
,177
FGL
œ4
&4
47
8F
0-11
1003
8F14
R1k
F0
1110
038F
14R
1kFG
LO
CIO
OIO
LO
C.1
0211
0
C80
r133
LO
C46
4479
Trr
*d1O
PT
rrd
1Q
PL
0C48
2111
Hum
anC
lust
er9.
1C
him
pan
zee
pch
r9h
chr9
*011
5017
03,
DF
C
9LC
35D
2
ZN
F36
7Z
NF5
1ON
F7
82
ZN
F32
2B
Mou
se
DF
O
-2-
K10
-
K14
—11
F0H
SD
1783
,
SL
C35
02
ZN
F36
7-
2
ZN
F51O
K16
L0C
7425
64K
13-
ZN
F32
2B-
11
Rat
tch
tl7
Inchrl
3
*0.1
1*91
763
DF
O
Sf93
542
fp367
-2
G61
4115
20T
0007
59001.
NC
BP
I
Doq
596
1M
525
TDR
D7
TMO
O1.
NC
BFI
F0.
Hs9
l703
DF
C
Sf53
542
Ztp
367
Cco
n
5011
5017
03,
SL
C35
02
L0C
476295
-2
23
-
DF
O
3-0
Tdr
d7
Imo
di,
Ncb
pl
F0
Td,
07
TO
OdI
.N
cbpt
F0
Tdr
d7
T159
d1,N
bpi
Hum
anU
lust
et0.2
L.n
impa
nzee
Mo
use
uo
g
hch
rgp
chr9
mch
r4rc
hr5
Cchrl
5
F0
FCM
D,
TA
C2
DF
OF
0FC
MD
,T
AC
2D
FO
F0-
Fcr
,j,
Tac
2D
FO
F0
Fc,1
11,
Tac
2D
FO
FGFC
MD
,T
AC
2D
FO
TM
EM
350
TM
EM
380
Trr
,n3
8b
Trn
sos3
8bT
ME
M38
B
ZN
F46
2-
o*
ZN
F46
2-
o•
Ztp
462
-o
*Z
fp46
2-
o*
L0C
4816
55-
o
KL
F4-
3-
L0C
4646
40-
3-
K1t
4-
3-
K11
4-
3-
L0C
4816
57-
2-
F0
AC
TL
7B,
AC
TL
7AF
0A
CT
L7B
,A
CT
L7A
F0
Act
ieF
0A
ctf l
bF
0.A
CT
L7B
,A
OT
L7A
IKB
KA
PK
OK
AP
fkbk
apfk
bk,a
pIK
BK
AP
124
Tad
epal
fyet
.
Page 143
DD
Hu
man
Clu
ster
9.3
Ch
imp
anze
eM
ouse
Rat
Dog
hch
rgp
ch
r9m
chr4
rchr5
cchrl
l
F0
SNX
3Q,
TS
CO
TD
FO
F0
SNX
3OT
SC
OT
DF
OF
0S
,,430
Tsc
otD
FO
F0
54x3
0.T
scot
DF
OF
0SN
X3O
,T
SC
OT
DF
O
L0
C169834
-13
-
ZF
P3
74
12
-ZF
P3
7K
12
-Zfp
37
K12-Z
1p37
K12-
F0
SL
C3
lAS
.F
KB
PI5
F0
5LC
31A
2F
KS
PI5
FGS
4ç31
x2F
kbpl
5F
0.S
lc3la
2F
kbpl5
F0
SL
C3I
A2.
FK
BP
I5
SL
C3
lAI.
C0
02
6S
LC
3IA
I,C
DC
26C
dc26
Cdx
26S
LC
3IA
I,C
DC
26
Hum
anC
lust
ec94
Ch
imp
anze
eM
ou
seR
atD
og
hch
r9p
chr9
mch
,2r
chr3
Cch
r9
FG
POO
L,
RC
3H2
DF
OF
0PO
OL
,R
C3H
2D
FO
F0
P40
.R
c3h2
DF
OF
0Pd
cl,
Rc3
h2D
FO
F0
POO
L,
RC
3H2
DF
O
ZN
F48
264
•ZN
F4
82
B4
-Z1
p4
82
64
.Ztp
48
2B
4-Z
NF
482
84
-
ZB
TB
2G64
-ZB
TB
26
84
-Zbtb
26
63
-Zbtb
26
64
-ZB
TB
26
64
-
F0
RA
B0A
PI,
0PR
21F
0R
AB
OA
PI,
GP
R2I
F0
Rab
gap
l,G
pr2l
F0
Rab
gap
l,0p
r21
F0
RA
BO
AP
I,0P
R21
Hum
anC
lust
er70
.1C
him
pan
zee
Mouse
Rat
Dog
hchrl
ûp
ch
rlo
KG
L0C
6463
52,A
NK
RO
3QA
OF
OKG
LOC
6-IO
352A
NK
RQ
3OA
DF
O
L0
C219752
L0
02
19
75
2
ZN
F248
K8
-Z
NF
248
K8
-
BA
775A
3.1
-,
L0C
466277
K12
-
BA
393J1
6.4
*Z
NF
33A
K16
ZN
F25
612
-Z
NF
37A
K12
ZN
F33
AK
16*
ZN
F37
AK
12
F0
L0C
646419
F0L
0C
646419
L0C
646423,L
0C
646426
LOC
6464
23,L
00646426
125
Tad
epal
lyet
Page 144
DD
hch
rl0
pch
rfo
DF
0F0
.C
CN
FL2
MG
CI6
291
LO
C4O
164
2Z
NF
37B
ZN
11
B
DF
C
-16
-
K8
-
K16
-
F0.
CC
NY
U
MG
C16
291
L0C
4504
11-
--
L0C
7401
09-
8-
F0
DU
XA
P3
6IS
1.
RfT
Hum
anC
lust
er70
.2C
him
van
zee
Mouse
Rat
Doq
F0
DU
XA
P3
8115
1.R
OI
Hum
anC
lust
er70
.3
hch
rl0
D
Ch
imo
anze
e
pchrl
o
DF
0
Mo
use
FGG
AL
NA
CT
’PR
ASO
EFI
A
FX
YD
4,H
NR
PF
ZN
F48
7K
3
ZN
F23
9-
9
ZN
F48
5K
11*
ZN
F32
7-
mch
r6
DF
CF
0G
AL
NA
CT-
2.R
ASG
EFI
A
FX
YD
4,H
NR
PF
L0C
7453
36Z
NF
239
L0C
7454
99Z
NF
32
Rat
F0,
HN
RPA
3PI
CX
CL
I2
F0’R
AL
g1t1
A,F
.AA
DF
O
Hr,r
pf
Zfp
239
.7
Ztp
637
-7
K-*
-9
-
K11
*
-7
-
Doq
tchr4
0.
RA
SLI1
Ia.
RA
ySA
DF
O
lIlr
pI
Ztp
637
-7
F0.
HN
RPA
3PI
CX
CL
I2
CC
fl2
5
F0.0
%L
NA
CT
-2,
RA
5GE
F*
DF
C
FX
YD
4,H
NR
PF
F0.’
H,I
Ipa3
pI
Cxc
II2
GH
nrpa
3pl
xc11
2
F0.
HN
RPA
3PI
CX
CL
I2
Hum
ançl
usc
er1
7.1
cnim
pan
zee
Mouse
Hat
UO
g
hchrl
lp
ch
rll
mch
r7tc
hrl
ccfl
r2l
F0
OR
IOA
2O
R2D
2D
FD
F0
0R
10
A2
0R
2D
2D
FO
f0O
rIO
a2D
FO
F0
0r1
0a2
DF
OF
00
R1
0A
2.0
R2
02
DF
O
0R2D
3.O
RID
A4
0R
203
OR
IDA
40
r2d
20r2
52
0R2D
3O
RID
A4
ZN
F21
56K
4*
ZN
F21
55K
4L
0C48
5362
-4
*
ZN
F21
46
ii-
ZN
F21
4K
11.
L0C
4853
63K
11-
F0
NL
KP
I4,
HN
RN
PG
-TFG
NL
KP
I4,
HN
RN
PG
-TF
0N
IkpI
4FG
NIk
pI4
F0’N
LK
PI4
,H
NR
NP
O-T
SYT
9$Y
T9
SyI9
Syt
sSY
T9
126
Tad
epal
lyet
.
Page 145
DD
D
Hum
anC
lust
er17
.2C
him
pan
zee
Mo
use
Rat
Dog
hch
rlpchrl
mchrl
lrc
hrl
Occhrl
6
FG0R
5B12
,0R
5821
DF
OF
G0
R5
B1
2,
0R5B
21D
FO
F0
0r5
b12
DF
OF
G0r5
b12
DF
OF
Gr0
R58
12,
0R
5827
DF
O
LPX
NL
PXN
Lpx
nL
pL
PXN
ZF
P91
-4
•ZF
P91
-4
.Zfp
9l
-4
+Z
fp9l
-4
•L0C
475962
-4
*
ZF
P91
-CN
TF
--
*
F00
CY
AT
FG
0LY
AT
FG
0Iy
atF
0O
yal
FG
0LY
AT
CL
VATL
20L
YA
TL
2G
frat
/2G
fraf
l20L
YA
TL
2
Hu
man
Clu
ster
72.7
hch
rl2
Ch
imo
anze
e
pchrl
2
F0
L0
C4
02
39
l
Mouse
mch
r5
DF
CF
0C
HFR
ZN
F6O
5Z
NF
26Z
NFB
4Z
N11
40L
0C44
0122
ZN
F1O
ZN
F26
8F
0E
*d0
fC
h,om
som
e
DF
0
K17
-
K13
,
K19
+
K10
+
F6-
K11
-24
FC
Chf
r
ZN
F26
ZN
F84
ZN
F14O
ZN
F26B
Rat
rcf
lrl2
DF
0
K13
-
K19
*
K21
*
K24
*
GC
hfr
Doq
DF
0
FG
E*d
0fC
hro
no,e
Hu
man
Clu
ster
73.1
Ccf
lr5
F0C
HF
R
L0C
4862
20K
17-
L0C
4862
19K
13-
DF
C
ncnrl
3
F0
End
0f
Chrn
r,om
e
Ch
imp
anze
e
pcf
lcl3
GE
,d0fC
hro
,rnnn,e
Mo
use
F0
OA
CH
ID
FC
0/2
3
KL
F5
-3
KL
F12
-3
-
F0
OC
O-
172.
BIM
DK
TO
C1b
4
mcf
lrl4
F0
DA
CH
1
0/0
3
KL
F5K
LP1
2FG
GC
G-1
72.
BIM
D6
TOC
1b4
F0
End
0f
Ch,
onn,
omn
DF
C
.3
.
-3.
Rat
rchrl
5
Dog
FO
Dac
hi
DF
OO
is3
Kif
s-
3*
1<11
12-
3-
F0
Gcg
-/7
2.O
nrdG
Obc
lb4
Cch
r22
‘00*
011
DF
31,3
(1f5
-3
KI1
12-
3
‘0O
cg-
I72,
Bin
dK
TOc
1b4
CF
0D
AC
HI
0/0
3
KL
F5K
LF1
2F
0G
C0-
/72.
8/04
06
TB
CIb
4
DF
0
-3
*
-3-
Hu
man
çlu
ster
9J.
zunim
pan
zee
Mo
use
Har
tiog
hchrl
3p
ch
rl3
mch
rl4
rchrl
5cch,2
F0
TM
9SF2
,C
LV
OL
DF
OF
0T
M9S
F2.
CL
VO
LD
FO
F0:
T,n
95f2
,C
lybi
DF
OF
0T
,n9s
f2,
Cly
biD
FO
F0
TM
9$F2
,C
LVO
LD
FO
ZIC
5-
4-
ZiC
5-
4-
ZC
5-
4-
Zic
5-
4-
ZiC
5-
4-
ZIC
2-
4*
Z1C
2-
4*
Z1c
2-
4+
Zic
2-
4*
ZIC
2-
4*
F0
PCC
A,
RP
S26
LF
0PC
CA
,R
FS
26L
F0
Pcc
a,R
pn26
1F
0P
cca,
Rp4
261
F0:
PC
CA
RP
S26
L
TM
TC
4T
MT
C1
Tm
tc4
To/4
n4T
MT
C4
127
Tad
epau
yet
.
Page 146
DD
D
Hum
anC
lust
er74
.1C
him
pan
zee
Mo
use
Rat
Dog
hchrl
4pchrl
4m
ch
rl4
rch
rl5
cchr8
F0
MY
H6,
MY
H7,
N0D
ND
FO
F0
MY
HG
,MY
H7,
NG
DN
DF
OF
0M
yht,
Myh7
DF
OF
0M
yhF
Myh
7D
FO
F0.
MY
H6,
MY
H7N
0DN
DF
O
Ngd
Ngd
ZFH
X2
H4
-ZF
HX
2H
4.Z
fhx2
H4
-.Z
fhx2
H4
-L0C
490613
H4
-
ZN
F4O
9-
1Z
NF4
O9
-1
F0
TH
TPA
,A
PIG
2F
0T
HT
PAA
PIO
2FG
Thtp
a.%
,1g2
F0
Tht
pa,
Ap1
42F
0T
HT
PAA
P10
2
aPI-
14JP
I-14
Jph4
Jph4
JFH
4
Hum
anC
lust
er74
.2C
him
pan
zee
Mo
use
Rat
Dog
bchrl
4pchrl
4m
chrf
l2rc
hr6
cchr8
F0
ESR
2.M
TH
FD
ID
FO
F0
ESR
2.M
TH
FO
ID
FO
F0-
Es,
2,M
IhI-
diD
FD
F0
Esr
2,M
CI-
dlD
FO
F0
ES
R2,
MTH
FO
ID
FD
AK
AFS
AK
AP5
Ak4
p5A
kap5
AK
AP5
ZB
TB
25B
2-
ZB
TB
25e
2-
Zbtb
25
B2
-Z
btb2
5B
2L
0C
490730
B2
-
ZB
TB
1B
2•Z
BT
BJ
B2
,Zbtb
lB
2,Z
btb
lB
2•L
0C
490731
B2
F0
HS
PA
2.N
UP
5ÛP
IFG
HS
PA
2.N
UP
5OP
IF
0H
spa2
Nup
SO
pIFG
Hsp
a2N
upSO
pIF
0H
SP
A2,
NU
P5O
PI
PL
EK
H03
,S
PT
BP
LE
KH
03,
SPT
BSp
IbSp
ISPL
EK
HG
3.SP
TB
Hum
anC
lust
er75
.7C
him
pan
zee
Mo
use
Rat
Dog
hchrl
pch
rlm
ch
rll
rchrl
ûcchrl
6
F0
WH
DC
I.H
QM
ER
2D
FO
F0
WH
DC
I.H
QM
ER
2D
FO
F0
Whd
zI.
Hom
e,2
DF
OF
0-W
hdcl
.H
o,,l
e,2
DF
DF
0I-
W-1
001.
HO
ME
R2
DF
D
FAPA
IO3A
IFA
MIO
3AI
F4,,,
103A
1F
amIO
3AI
FAM
IO3A
I
BT
BD
1B
..
BT
BD
1B
--
Btb
dl
B-
-B
tbdl
B-
-
BN
C1
-3
-B
NC
1-
3B
ncl
-s
.B
ncl
-3
-
F0
SH
30L
3,A
OA
MT
SL3
F0
SH
30L
3.A
DA
MT
SC3
F0,
Sh3g
13,
Ada
F20
3F
0,S
h3pI
-3,
Ada
mIs
I3F
0SH
3OL
3.A
OA
MT
SC3
Hum
anC
lust
er15
.2C
him
pan
zee
Mo
use
Rat
00g
hchrl
5h
ch
rfS
mchr7
rchrl
cchr3
F0.
t0C
34
03
C2
L0
C3
84
16
3D
FO
F0
LÛ
C1-
1030
2,C
0C35
463
DF
OF
0C
OO
4IO
IC2C
OC
3&51
63D
FO
F0
C0C
Uc2
L0c3
88I5
3D
PD
DF
O
FL
]40113.E
2Q
2F
LJ4
0II
3F
202
F13
4011
3.F
202
FL
1401
l3,
5202
FL
J4O
II3E
202
ZF
P29
s14
•Z
SC
AN
2s
14Z
scan
2S
14L
0C68
6118
s-
-L
0C48
8749
S21
SC
AN
D2
s-
•S
CA
ND
2-
..
Zfp
592
-4
Ztp
592
-4
-L
0C48
8751
-4
-
ZN
F59
2.
4-
L0C
4536
08F
0A
LPK
3.SI
-020
41.
F0
AL
PF3.
0C02
04I-
,F
0A
LPK
3,dL
0204
1.F
0A
LPK
3,SL
C2O
A1.
F0
ALP
K3
SLC
2OA
I,
PDEO
AP
0524
FDEO
AP
0004
PO
E8A
128
Tad
epal
lyet
Page 147
D
Hu
man
Clu
ster
75.3
Ch
imp
anze
eM
ouse
Rat
Dog
hchrf
5pchrJ
5m
chr7
rch
rlcchr3
F0
ME
SP
2,
AN
PE
PD
FO
F01
1050
2.A
NPE
PD
FO
F0.M
ESP
2,A
1PE
PD
FO
F0
ME
SP
2,A
NP
EP
DF
OFG
ME
SP2.
AN
000
DF
O
AP3
S2A
P3S2
AP3
52A
P3S2
AP3
52
L0
C3
90
63
6-
2*
ZN
F71O
-11
*
ZN
F77
4-
12*
F0
ID0A
PI.
CR
TC
3F
0IQ
OA
PI
CR
TC
3F
0IQ
OA
PI.
C5T
C3
F0
IDG
AP1
,C
RTC
3F
OJD
GA
PI,
CR
TC
3
OLM
.FU
RIN
8L11
,FU
R1N
OU
FF
OR
tSB
LM.
PUR
iNO
UI
FU
FIS
Hu
man
Ciu
ster
16.1
D
Ch
imp
anze
e
pC
ht7
6
Mouse
mC
hr77
ncn
ri&
F0
TH
OC
SM
MP
25D
FO
MM
PL1.0
32
ZN
F2O
6s
14-
ZN
F2O
5K
8
ZN
F21
36K
5
ZN
F20
0-
5
ZN
F26
39K
9
ZN
P75
AK
5,
ZN
F43
4s
6-
ZN
F17
4S
3
ZN
F59
7-
7
F0
FL
JI.i
lStL
OC
UiC
I7.1
L0C
35
02
21
.CL
UA
P!
00
03
Rat
F0
TH
OC
6,M
MP
25D
FO
MM
PL
I,iL
32
L0C
7476
775
14
ZN
F2O
5K
8
ZN
F21
3so
5
ZN
F20
0.
5-
ZN
F26
356
9+
L0C
4538
615K
5
ZN
F43
45
6-
ZN
F17
45
3*
L0C
4678
89-
7
F0.
FU
l315
4.t0
C64
671
t0C
390671.C
LU
AP
I.N
OD
3
Doq
F0
iSO
iS,
Mm
p2S
DF
O
514
K8-
SIS
5
K19
-
K-+
K11
*
K13
t-
K10
*
rC
tJrl
U
*O.F
hoK
6,M
mp2
5D
FD
Ztp
206
514
*
Zfp
l3-
8
Znt
213
55
Zfp
263
s9
-
nh
17
4s
-
fp5
97
7*
Zip
206
Ztp
l3Z
tp21
3Z
fp4D
L0C
5451
91L
0C43
3078
1300
0038
13R
ik
Ztp
758
6330
4162
0F
0C
iuap
l
10*0
3
Cch
r6
F0
tOC
l900
*7
L0C
4900
46
L0C
4900
45L
0C47
9870
L0C
4900
40L
0C49
0042
L0C
4900
35
DP
O
s15
K13
-
SK9
ss.
06
-
‘GC
luap
i
5*03
F0
L0C
6094
56
L00175
5,L
OC
.t57
733
Hum
anC
lust
er76
.2C
him
pan
zee
Mouse
Rat
Dog
flchrl
6p
ch
rl6
mch
rl6
rchrl
OC
chr2
O
FG
AD
CF
9.SR
LD
FO
FGA
DC
YF
SAL
DF
DF
0A
dcy9
.5*
]D
FO
F0
Adc
y9,
54D
FO
FGA
DC
Y9.
SRL
DF
O
TFA
P4T
FAP4
lïap
4T
(ap4
TFA
P4
GL
IS2
-4
GL
IS2
-4
-G
11s2
GIi
s2_p
red
L0C
4900
28
ZN
P50
0s
5-
ZN
F50
0s
s-
F0
SE
PT
I2.R
OO
DI
FGS
EP
TI2
,R000i
F0
S6p1
12F
0S
eptl
2F
0S
EP
TI2
,R000i
Rag
d,R
ogdi
129
Tad
epal
lyet
Page 148
DD
D
Hum
anC
iust
er76
.3C
htm
pan
zee
Mouse
Rat
Dog
hchrl
6p
ch
rl6
mchr7
rchrl
cchr6
FO
CD
2BP
2,T
BC
IDIO
BD
FO
FG
CD
2SP
2,T
BC
1DIO
BD
F0
F0C
d2bp2,T
bcl
dlO
bD
FO
FG
Cd2bp2,l
bcl
dlO
bD
F0
FGC
D2B
P2,
TB
CID
IOB
DF
D
MY
LP
F.S
EP
TI
MY
LP
FS
EP
TI
MyI
pSep
t1M
ylpf
Sep
tlM
YL
PF,S
EPT
I
ZN
F55
3-
12L
0C74
0214
-12
*Z
tp55
3-
12R
GD
1561
639
L0C
4899
01-
12-
ZN
F76B
10-
L0C
4640
44-
8•
Zfp
771
-8
*R
GD
1305
903
L0C
489906
K17
*
ZN
F74
7K
--
L0C
4543
67K
14-
Zfp
768
-w
-L
0C69
1885
L0C
489908
K2
*
ZN
F76
4K
7.
C0C
4540
46K
7.
643O
6O4K
15R
1kK
-L
0C69
1887
L0C
6075
01K
-*
ZN
F68B
K2
L0C
4679
502K
991
3001
9022
Rik
K13
Zfp
6BO
L0C
489910
K11
ZN
F18
5K
7.
L0C
4679
51.
--
E43
0018
]23R
ik.
g.
Hit3
9L
0C48
9911
K9
*
ZN
F68
9K
11Z
NF
629
-19
.Zf
p764
K9
.Z
tp62
9L
0C48
9914
-19
ZN
F62
919
-L
0C46
7958
.15
.Z
tp68
8.
-Z
fp66
8L
0C48
9920
-16
ZN
F66
8.
w.
ZN
F64
6-
29-
Zfp
689
K2
.L
0C48
9922
-28
ZN
F64
629
*Z
tp62
9.
19-
Zfp
668
-16
6820
420M
01.
27
F0
VK
OR
CIL
0C
64
70
97
F0
L004679598C
K0K
F0
Vko,C
l8*kdk
F0
Vko
,C1B
ckdk
BC
KD
KO
IYS
TI.
FR
SS
8F
R5
28
My
sIl,
Prs
s8M
ystl
,Frs
sB
Hum
anC
lust
er76
.4
hch
rl6
Chim
anzee
pch
tl6
DF
0F
0P
PP
2C
BP
VN
IR3
L0C
342426
ZN
F26
7F
0L
0C64
7126
.
L0
C3
88
24
8
Mou
sem
cOrS
.12
K13
*
ES
FP
P2C
BP
VN
IR3
C0C
7432
74Z
NF
267
F0
L0C
6471
26,
L0C
3882
48
DF
O
.12
K14
—
Rat
rcO
n
DF
0F
0P
pp2c
bp
011*
3
Ztp
267
F0
L0C
6471
26.
L0C
3882
48
Doq
K14
*
F0
Fpp
2cbp
DF
O
V,1
1r3
Zfp
267
K14
F0
PF
P2C
BP
VN
1R3
DF
O
*GL
0C64
7126
,
L0C
3882
38
F0
C0C
647l2
6.
L0C
3882
48
Hum
anC
lust
er76
.5C
him
pan
zee
Mouse
Rat
Dog
hchrl
6p
ch
rl6
mch
r8rc
hrl
9cch
r5
F0A
BB
AI,
VA
C14
DF
OF
OA
BB
AI,
VA
CI4
DF
DF
0A
bbal
,V
2c14
DF
OF
0A
bbal
,V
aC14
DF
QF
OA
BB
AI,
VA
CI4
DF
O
HY
DIN
,C
AL
B2
HY
DIN
,C
AL
B2
Hyd
ftl,
CaI
b2H
ydrn
,C
aIb2
HY
DIN
,C
AL
B2
ZN
F23
K17
.L
0C46
8017
K17
-Z
tp61
2K
16*
Zfp
612_
pred
K16
*L
0C
489720
K17
-
ZN
F19
K10
.Z
NF
19K
10L
0C48
9721
K10
FG
-CH
ST
4.T
AT
FO
CH
ST
4,T
AT
F0-C
hst
4F
0C
hst4
FO
CH
ST
4.T
AT
MA
RV
EL
D3
MA
RV
EL
Q3
Tat,
Ma*
,&d3
Tat
,M
a,*&
d3M
AR
VE
LD
3
130
Tad
epal
lyet
Page 149
DD
DH
um
anC
lust
ar76
.6C
him
pan
zee
Mou
seR
atD
og
hchrl
6p
ch
rl6
mch
r8rc
hrl
9cchr5
F0
SL
C7A
5CA
5AD
F0
F0
SLC
7AS
CA
OA
DF
DF
0S
Ic7a
5.C
a5a
DF
0F
0S
Ic7a
5C
aSa
DF
0F
0C
A5A
DF
0
eN
P840F
Bae
p8
a,,
BA
NF
ZN
F64
9-
3Z
NF
649
.3
eG
m22
4-
L0C
6914
99-
4L
0C48
9666
-4
-
ZFP
M1
-2
.-Z
FP
M1
-2
*Z
fpM
l-
2eL
OC
6915
O4
-2
*
FG
NH
NI,
ILI7
CF
GN
HN
I,1L
17C
F0
Nhel,
t117
cF
0N
hel,
1117
*F
0N
HN
I,L
I7C
CY
64C
YB
AC
yba
Cyb
aC
YB
A
Hum
anC
lust
er77
.7
hch
rl7
Ch
imp
anze
e
F0K
?F1C
0P
R1728
D
pchrf
7
Fo
Mo
use
FG-K
IF1C
0P
R1
72
8D
F
mchrl
i
ZF
P3
ZN
F23
2Z
NF
594
F0
UN
0578
3,R
AB
EP
1
NU
P88
-3
,
S3-
-22
-
o
I’ta
t
L0C
4684
55L
0C45
5260
ZN
F59
4F
0U
N05
783R
AB
EP
I
NU
F8S
FG
KS
IcD
FC
Gpr
1728
Z1p
3-
13-
13
55
-32
-
Hum
anC
lust
er77
.2
Doo
hch
rl7
rch
riO
F0
641e
Gpr
l72
8
RG
D1
5658
81-
13e
DF
0
cch
r5
FG
KIF
IC.
0P
R1
72
8
F0
64q5
783
Nue
88
Ch
imn
anze
e
pchrl
7
DF
C
DF
O
U80
WP
V2
SN
OR
D49
B
ZN
F28
7Z
NF
624
FG
RN
AS
EH
IP2
L0C
4894
52-
13
F0
Ue4
578
3
64r,
88
Mouse
mch
rll
SK14
-
K21
-
U8B
TR
PV
2
5N
0R
0498
ZN
F28
7Z
NF
624
F0
RN
AS
EH
IP2
F0
UN
0578
3.R
AB
EP
I
NU
P88
OF
D
SK14
-
K21
-
Rat
D80
F0
LO
b
Trp
e2
Ztp
287
F0
Rea
s&al
p2
Doq
rchrl
O
F0
LOb
Tp
e2
Ztp
287
0R
nas
ehfp
2
DF
0
Cch
r5
U8B
TR
PV
2
SN
OR
D49
B
ZN
F28
7
F0
RN
AS
EH
IP2
DF
O
Hu
man
Clu
ster
78.1
Ch
lmp
anze
eM
ouse
Rat
Dog
hchrl
8p
ch
rl8
mchrl
8rc
hrl
8cchr7
F6
N0L
3.O
TN
AD
FO
FG
NO
L4O
TN
AD
FO
F6
1641
64*
DF
OF
6N
e*.
01*3
DF
OF6
.L
OC
ISO
I9I
DF
O
MA
PR
E2
MA
PRE
2M
apre
2M
apre
210
*190
13*
IOC
*901
89
ZN
F39
7s
-•
C0C
4553
70s
9*
Ztp
397
s9
Znt
397
S-
-L
0C49
0486
S7
*
ZN
27
1-
s.
L0C
4553
72s
7.
C23
0097
124R
1ks
--
2fp2
39-
18*
L0C
4801
62s
4*
ZN
F24
s-
L0C
4685
20-
20*
Zfp
35-
18*
Ztp
l9l
S4
-
ZN
F39
6S
2-
ZN
F24
S4
Zfp
l9l
S4
ZN
F39
6s
2-
Fo
GA
crIr
,.L
0C63
8532
F6
64
LNT1
,L
0C53
0532
F0
0*1,
11,
FGG
alet
IFG
L0C
48*1
6l
PI5
RS
P15
85P1
501
PI5
rst0
0480
160
131
Tad
epal
lyet
Page 150
DH
um
anC
lust
er78
.2C
him
pan
zee
Mouse
Rat
Dog
hchrl
8pchrl
8m
ch
rl8
mch
rl8
cch
rl
F0
TS
HZ
I.L
0C28
1274
DF
OF
0T
SH
ZI.
L0C
2842
74D
FO
F0
T552
1D
FO
F0T
shzl
DF
OF
0T
SH
ZI,
L0
02
84
27
4O
FO
LOC
7266
62L
0C72
8662
L0C
7286
62
ZN
F5I
6-
7-
ZN
F51
6-
7.
Zfp
516
.z
-R
GD
1306
817
7-
L0C
4839
30-
7-
ZN
F23
6-
25*
ZN
F23
625
*Z
fp23
6-
asZ
fp23
6-
36*
L0C
4839
29-
30
F0
516F
,G
AL
RI
F0-
518F
,0A
LR
IF
0-M
bp,
Gîr
1F
0M
bp.
Gaf
rlF
0.M
BP,
GA
CR
I
Hum
anC
Iust
er79
.1
bchrf
9
F0
AT
PD8B
3,R
EX
OI
Ch
imp
anze
e
pch
r
)
DF
DF
0A
7PD
883
RE
XO
I
KL
F16
BT
BD
2F
0M
KN
K2,
MO
DK
C2A
Mouse
.3
B-
-
OF
C
mdIr
lO
F0
65,6
853.
Rex
ol
KL
FI6
BT
BD
2F
0M
KN
K2.
MO
BK
UA
Ilat
-3
.
e-
-
DF
0
tch
r7
‘0A
tpdS
b3R
exo?
Kif
16B
tbd2
F0
MFr
,k2,
Mob
kl2a
Po
CC
htl
6
-3.
B--
L0C
6908
20G
D1
5660
94‘0
Mko
k2.
Mob
kI2.
,
DF
0
-3
B-
Hum
anC
luS
ter
79.2
chlm
pan
zee
Mou
seR
atD
09
hchrl
9p
ch
rl9
mch
rlO
rchr7
cch
r2û
F0
SCC
39A
3.80
TA
DF
OF0
.sL
czaA
3.sG
rAD
FO
F0
SOD
DF
OF
08g
OD
FO
F0SL
C39
A.3
S0T
AD
FO
TH
OP
IT
HO
PI
Tso
piT
hopI
TH
OP
I
ZN
F55
4K
7+
ZN
P55
4.
z*
ZN
F55
5K
15+
ZN
F55
5K
15
ZN
F55
6K
g+
ZN
F55
6g
ZN
F57
K13
+Z
NF
57
K13.
ZN
F77
K12
-Z
NP
77K
12
F0
TLEO
F0
TLEO
F0
TleO
F0
0x6
F0
TL
E6
TL
E2
TCE2
T1e2
0e2
TCE2
132
Tad
epal
lyet
Page 151
D
Hum
anC
lust
er79
.4
Hum
anC
lust
er79
.3C
him
pan
zee
Mo
use
Rat
Dog
hchrl
9p
ch
rl9
mchrf
lrc
hrl
Qcc
hr2
O
F0
M80
3L2
DF
OF
061
0031
2D
FO
F0-
Mbd
312
DF
O00
Mbd
3I2
DF
0F
0M
803L
2D
FO
ZN
F55
7K
10•
ZN
F55
7K
10•
ZN
F55
7K
10*
ZN
F35
8.
9•
L0C
455655
.g
*Z
NF
358
-9
*
F0
MQ
QL
NI,
PN
PL
AF
F0
MC
OL
N1P
NP
L4F
FGM
coIn
lF
0M
coI1
FG
MC
0CN
I,P
NP
LA
G
01M
153.
PC
P2
01
M1
53
PC
P2
Pcp
2P
cp2
01M
15
3P
CP
2
hch
rl9
Chim
pan
zee
pchrl
9
D
‘0V
AV
IEM
RI
DF
C
EM
R4
NF
557
ZN
F35
8‘0
1.1C
OtN
I,P
NF
MS
04
41
53
3.
XA
B2
Mouse
010
-
-9
*
DF
O‘0
VA
VI.
EM
RI
6MR
4
ZN
F55
7Z
NF3
5B‘0
.M
CO
LNI.
PN
PM
S
KIA
A15
43,
X482
InC
nn
?
FO
Vav
lD
F0
E,,,
4
Rat
K10
+
-9
,
Dog
rcnr
9
FG
Vav
ID
FC
E,1F
4
ccn
r2O
‘GM
coft
ll
Fab
2
F0
Mco
I,1
Xab
2
F0.
VA
VI,
EM
RI
DF
D
EM
R4
ZN
F55
7K
10*
F0.
MC
OLN
I,FN
FL4G
OA
A15
43.
X4B
2
Hu
man
Clu
ster
79.5
Chim
pan
zee
Mo
use
Rat
Dog
hchrl
9p
chrl
9m
chrl
7rc
hr8
Cch
r2û
F0
MA
RC
H2.
HN
RP
.4I
DF
OF
0M
4RC
02,H
NR
FMD
FO
F0.
F5r
pmD
FO
F0.
Hnr
pmD
FO
F0
514R
CH
2,H
1RP
MD
FD
PRA
M1
P00
611
P,4
,n1
Pm
ml
PRA
M1
ZN
F41
4i
.L
0C74
3829
-.
.Z
fp41
4-
2•
Zfp
lOl
K15
L0C
6116
36-
1-
ZN
FS
5BK
9-
L0C
4687
04K
9-
Zfp
lOl
K15
-Z
fp8l
K13
-Z
NF
558
K9
-
ZN
F31
7K
13+
ZN
F31
7K
13-
Zfp
Bl
K13
-
ZN
F69
9K
16-
L0C
744819
K5
-
2NF
559
K11
+Z
NF
177
K7
+
ZN
F17
7K
7+
ZN
F26
6K
14-
ZN
F26
6K
14Z
NF5
6OKO
14
ZN
F56O
KK
14Z
NF
426
K12
-
ZN
426
K12
ZN
F12
1-
10
ZN
F12
1-
10-
ZN
F56
1K
10-
ZN
P5G
1.
w-
L0C
455686
K11
-
ZN
F56
2-
o-
L0
C7
29
64
8-
--
LO
C1
62993
K12
-
F0.
UB
E2C
4F
OX
t:5814
F2
UB
E2L
3,FD
XC
.28
14
F0
Ube
214,
Fb*1
12F
0.U
be21
4Fb
x112
FG
5862
LK
FBX
LI,
58t5
P1
5J1
,OtF
612
PIN
IOL
F612
use,
F,s
lU
NS
,P
-ni
PIN
IOL
F612
133
Tad
epat
lyet
.
Page 152
DD
D
Hum
anC
lust
er79
.6C
him
pan
zee
Mou
seR
atD
og
flchrl
9p
ch
rl9
mchr9
rchr8
cchr2
O
FGR
GL
3,PA
KC
SHD
FO
FGR
GL
3,PR
KC
SHD
FO
FGP
,kcs
hD
FQ
FG
Prk
csh
DF
OF
Gt0
C48
4941
DF
D
EL
AV
L3
ELA
VC
3EI
aW3
EIa
43L
0C
484940
ZN
F65
3-
4-
ZN
F65
3-
4-
Zfp
653
-4
-Z
tp65
3-
--
L0C
4849
39-
4
ZN
F62
7K
11*
L0C
4687
23-
--
g5300l5
io7R
ikK
13*
L0C
611075
K15
*
L0C
401898
-6
L0C
4557
35.
--
Ztp
809
K7
*L
0C48
4934
K16
HS
ZF
P36
K16
-H
SZ
FP
36-
16.
BC
0500
92K
11-
L0C
4849
33K
17-
ZN
F441
-19
*L
0C46
8726
.3
-Z
fp8l
OK
12
ZN
F491
-13
*L
0C46
8727
--
ZN
F44O
K12
*Z
NF4
4OK
21*
ZN
F43
9K
Il*
L0C
7468
50-
3*
ZN
F69
K-
*L
0C46
8730
--
-
ZN
F70
0K
21L
0C46
8731
K19
-
ZN
F44
OL
KS
*Z
NF2
O8
K39
ZN
F43
3K
19-
ZN
F13
6-
1+
L0C
729747
K15
ZN
F44
K15
FL
J149
59K
6*
ZN
F44
3K
18-
ZN
F78
8-
16.
L0C
4687
33K
36-
ZN
F2O
K13
-Z
NF
564
K15
-
ZN
F62
5-
8-
ZN
F49O
K13
-
ZN
F13
6K
14+
L0C
4687
35-
16
ZN
F44
K16
-L
0C45
5740
--
-
ZN
F56
3K
8-
ZN
F44
2K
14-
ZN
F79
966
26-
ZN
F44
3K
19-
ZN
F7O
9K
19-
ZN
F56
4K
16-
ZN
F49O
K13
-
ZN
F791
K17
KL
F1.
3-
F0
MA
N2B
1,M
OR
GI
F0
MA
N2B
I.M
OR
GI
F0,
Man
2bl,
Mor
glFG
Ma,
2b1,
Mor
glFG
MA
N2B
I
DH
PS.
FGX
W9
DH
PS
FBX
W9
Dhp
sD
Sps
MO
RG
1
Hum
anC
lust
er79
.7C
him
pan
zee
Mouse
Rat
Dog
hch
rl9
F0
GM
IP
AT
PI3
AI
ZN
F1O
1Z
NF1
4OZ
NF5
O6
L0C
730008
ZN
F25
3Z
NF5
O5
ZN
F68
2Z
NF9
OZ
NF
486
FL
J448
94L
0C
163233
ZN
F62
6Z
NF
66Z
NF
85Z
NF4
3O
ivcn
re
F0
AIp
I3aI
D33
0038
006f
lik
9830
167
H1
8R1k
1200
0031
07R
IKE
G63
6741
A14
491
75010627
rchrl
6
F0
Atp
I31
Gm
,p
MG
C72
612
cch
r2O
DF
0
K10
.
K19
-
K8
-
9*
K3*
K17
*
K11
-
K15
+
K10
.
3.
K14
-
K-
-
-8*
K15
+
K12
*
pchrl
9
FGG
MIP
AT
PI3
AI
ZN
F1O
JZ
NF2
O8
ZN
F93
ZN
F91
ZN
F85
ZN
F43O
L0C
4558
91Z
NF4
31Z
NF
85Z
NF9
1Z
NF9
OZ
NF
429
ZN
F49
2Z
NF
100
ZN
F43
DF
0
K12
-
K7-
K12
-
K10
*
K9-
K13
+
DF
C
K9*
DF
O
K10
*
K36
-
K15
*
K37
*
K12
*
K14
*
K14
-
K12
-
K12
-
134
Tad
epal
lyet
,
Page 153
ZN
F71
4Z
NF
431
ZN
F7O
8Z
NF
493
ZN
F42
9Z
NF
100
ZN
F43
ZN
F2O
8Z
NF
257
ZN
F67
6L
0C
14
81
98
L0C
441
843
ZN
F49
2Z
NF
99L
0C
64
68
54
L0C
38
85
23
ZN
F72
4PZ
NF9
1Z
NF
725
ZN
F67
5Z
NF6
81L
0C
646895
L0
C7
30
08
4L
0C
73
00
87
ZN
F2S
4
F0
CT
P25
IU
QC
RF
SI
PO
P4.
PL
EK
HF
I
•12
K12
K15
K-
K17
K12
K22
K34
K12
K15
K13
K28
K30
K7
K35
K14
-16
-6
-6
K4
ZN
F2O
8L
0C46
8804
ZN
F43
2NF
93L
0C74
0583
L0C
4688
06Z
NF
675
L0C
4688
08L
0C74
0901
L0C
4559
07
F0
CT
P25I
,U
OC
RF
SI
PO
P4.
PLE
KH
FI
K36
-
K9
,
K16
-
K34
-
-5—
K14
-
K16
-
-5
*
F0
C1p
25,,
Uqc
rfsl
Pop
4
FGC
5,25
i,U
qcrf
sl
Pop
4
Hum
anC
lust
er79
.8C
him
pan
zee
Mouse
Rat
00
g
hchrl
9p
ch
rl9
mch
r7rc
hrl
achri
FG
PD
CD
2L
UB
A2
DF
OFG
FOC
D2L
,U8A
2O
FO
F0
U5a
2D
FO
F0
Uba
2D
FO
*G
tOC
G1
94
2L
0C
10
45
94
DF
O
WT
IPL
0C
28
44
02
WT
1PL
0C28
4402
Wtip
Wlip
LO
C4
7649
0
L0C
441
847
-9
-L
0C46
8824
--
-L
0C48
4590
K28
-
ZN
F3O
2K
7Z
NF3
O2
--
-L
0C48
4586
K13
ZN
FI8
1K
11Z
NF
181
K11
ZN
F59
9K
14-
L0C
4688
25-
14-
L0C
643825
S-
.Z
NF3
OK
16*
L0C
441
848
-1
-L
0C46
8828
K13
-
ZN
F3O
-18
ZN
F92
-13
-
F0
0RA
MD
1A,
SC
NJB
F0
GR
AM
DIA
,S
CN
IBF
0G
rrn
j1a,
Scn
lbF
0G
ra,r
41a,
Scn
lbF
0S
CN
NIB
HP
NF
XY
D3
HF
NF
X7
03
Hp,1
Hpr
lC
0C48
4585
D
135
Tad
epal
lyet
Page 154
DD
D
Hum
anC
lust
er19
.9C
him
pan
zee
Mouse
Rat
Oog
hchrl
9p
ch
rl9
mch
r7re
lui
celu
i
FG
CO
XB
BIU
PK
IAD
FD
F0L
0045
5974
L00
4559
70D
FO
FD
Co
tGb
l,U
kp
laC
kap
lD
FD
FD
Co
x6
bL
Uk
pla
,Ck
apl
DF
OF
DL
0C
012634L
0C
444579
DF
O
CK
AP
I,C
AP
NS
I,C
OX
7AI
LD
C46
8844
Cap
ttl.
Co
x7
a1
Cap
nsl
Cox7a
I
TZ
FP
B2
tL
0C45
5976
K12
-Z
btb3
2B
2-
Zn1
382
K10
+L
0C48
4577
B2
-
ZN
F56
5K
12-
L0C
4559
77K
11-
Z1p
146
-10
-zZ
fp26
O-
13+
L0C
4845
64-
3-
ZN
F14
6-
10t
L0C
4559
78K
23t
593O
415A
O9R
ikK
9R
0D15
6323
9K
33-
L0C
4845
61K
27+
ZF
P14
K13
L0C
4688
46K
23+
Ztp
260
-12
+R
GP
1560
682
K11
+L
0C48
4559
KID
t
ZN
F54
5K
13-
L0C
455979
2K34
-Z
fp56
6K
7-
Zfp
569
K7
L0C
4845
48K
12t
ZN
F5G
67
-L
0C46
8848
K10
+Z
fpB
2K
13-
Z1p
74K
18-
L0C
4845
47-
29-
ZF
P26
O-
13-
L0C
4559
83K
13Z
fpl4
K13
-L
0C49
9120
K39
-L
0C48
4545
K35
-
ZN
F52
9-
ii-
ZN
F2O
SK
39t
Z1p
568
2K11
Ztp
84K
23-
L0C
4845
42K
31t
ZN
F38
2K
10+
ZN
F56
7K
14-
L0C
6254
21K
-L
0C48
4541
K15
+
GIO
T-1
K12
-Z
NF4
61K
12Z
1p74
-1
L0C
4845
40K
22-
ZN
FS
67K
15t
ZN
F38
2K
10-
Zlp
383
-11
L0C
4845
38K
33-
L0
C3
42
89
2K
32-
2NF
529
K11
tZ
fp27
K22
-
MG
C62
IOO
K13
-Z
NF2
6O13
tB
23O
312I
18R
Ik-
+
ZN
F34
S-
15t
ZN
F56
6K
Bt
BC
0273
44K
10
ZN
F56
8K
12t
ZN
F54
5K
13+
6330
581L
23R
ikK
13t
L0
C6
53
28
4-
12+
ZF
P14
K13
+Z
lpSO
K13
t
ZN
F42O
K15
tH
KR
1K
13+
Z1p
84K
11t
ZN
F58
5AK
23-
ZN
F56
9K
39-
ZN
FS
85B
K23
-Z
NF5
71K
17-
ZN
F38
3K
11+
ZFP
3OK
13
HK
R1
K13
+L
0C46
8857
K15
-
ZN
F52
7K
12+
ZN
F6O
7K
20
ZN
F56
9K
18L
0C46
8859
K9
-
ZN
F57O
K11
+
L0C
390927
K5
-
ZN
F54D
K17
-
ZN
F57
IK
17-
ZFP
3OK
13
FJL
3754
9-
3-
ZN
F6O
7K
20-
ZN
FS
73-
19-
L0
C4
01
91
6-
B+
F0
NV
D-S
PI
lSIF
AIL
3F
0L
0c4
6s8
6o
,L0
04
68
86
1F
0493
04
32
EI
IR,k
F0
4930
432E
1IR
ISK
OL
00
46
45
37
DP
FL
PP
FIR
I4A
L0C
4688
62D
p13,
Ppp1
1149
DpO
,Ppplr
l4B
L0C
6128
00,L
00
48
45
36
136
Tad
epal
lyel
Page 156
DD
Hum
anC
lust
er19
.12
Chim
pan
zee
Mouse
Rat
Dog
mcn
n
F0
2210
4I2
EO5R
1O
L0
0434
161,
0433
BC
0433
01Z
fp71
949
3340
5K07
Rik
rch
nl
FQ22
104
l2E
050ik
L00
434
1010433
RG
D1
3095
64
DF
0
K22
+
K15
+
K11
+
D
DF
D
K11
+
pch
nl9
F0
SIG
LE
C12
,SIG
LE
CPI
1
SI0
LE
0F
SI0
LE
0P
I2
L0C
4562
51Z
NF
649
L0C
4689
81Z
NF
613
ZN
F61
5Z
NF
614
ZN
F43
2Z
NF
616
L0C
4562
61L
0C74
8970
ZN
F48O
L0C
4689
84L
0C46
8985
L0C
4689
88L
0C74
8568
L0C
4564
21L
0C46
8990
ZN
F7O
1L
0C74
8607
ZN
F60
0L
0C45
6267
ZN
F32O
L0C
4562
68Z
NF1
6OL
0C45
6426
L0C
4562
69L
0C45
6270
ZN
F46
8Z
NF3
31
hch
nl9
F0
510L
FC12
,OI0
LE
CF1
I
010L
E06
,SIG
LE
CP
I2
ZN
F17
5Z
NF
577
ZN
F64
9L
0C44
186
1Z
NF
613
ZN
F35O
ZN
F61
5Z
NF
614
ZN
F43
2L
0C28
4371
ZN
F61
6F
LJ1
6287
ZN
F76
6Z
NF4
8OZ
NF6
1OZ
NF
528
ZN
F53
4Z
NF
578
ZN
F8O
8Z
Nfl
O1
ZN
F13
7Z
NF
83L
0C
72
98
40
ZN
F61
1Z
NF
600
ZN
F28
ZN
F46
8Z
NF3
2OL
0C
38
85
59
ZN
F81
6Z
NF7
O2
ZN
F16O
ZN
F41
5Z
NF
347
ZN
F66
S
OF
O
K15
+
K10
-
K16
-
K2+
K17
-
Kil
-
K16
-
K21
-
K25
-
K10
+
K12
+
K17
+
-8
-
-11
+
K24
+
-4
+
•20
-
K12
-
K19
+
K12
+
DF
0
K15
+
K8
-
K10
-
K12
+
K6
’
K19
-
K11
-
K17
-
K21
-
K25
-
K10
+
K12
+
K9+
K15
+
K17
+
-12
+
-2
+
K9+
-5+
-15
-
K14
-
K17
-
-20
-
-15
-
K11
-
K12
-
-18
-
K15
-
-5
K20
-
K11
-
K20
-
K18
-
ech
ni
FG
L07
6II7
IFL
OC
47O
3S6
DF
C
L006I
l00
0,L
0C
61
1700
L0C
6116
92K
20+
L0C
4843
41K
27+
L0C
491
432
-9
-
L0C
6116
69K
12-
L0C
4843
38K
50-
L0C
4843
33K
12-
L0C
4843
31K
16+
L0C
4763
94K
17-
L0C
4822
73K
13-
L0C
6115
99K
10+
L0C
4807
82K
18+
L0C
6115
90K
4+
L0C
6115
83K
14-
L0C
4843
28K
14+
L0
04
84
32
6K
34+
L0C
4843
24K
17+
L0C
4843
23K
11+
ZN
F331
K12
-
138
Tad
epal
lyet
Page 158
DD
ZN
F17
5K
17+
ZN
F41
6K
12-
L0C
6659
130
4+
ZN
F74
9-
17÷
ZN
F211
K22
+Z
tp55
1K
13-
ZN
F77
2K
10-
ZN
F55
1-
--
Zfp
606
K15
+
ZN
F41
9K
11+
L0C
7415
20K
15-
Z1p
329
-12
-
ZN
F77
3K
9+
ZN
F25
6K
15-
Ztp
llO
SKK
5÷
ZN
F54
9K
15+
ZN
F6O
6K
16-
Zfp
128
K7
+
ZN
F55O
K8
-L
0C46
9048
s-
+Z
scan
22
58
+
ZN
F41
6K
12-
L0C
4563
40s
4-
Zfp
324
K9
+
ZIK
1K
9÷
ZN
FS
29-
12-
Ztp
446
s3
+
ZN
F53O
K13
ZN
F27
4s
+Z
btb4
58
4-
ZN
F13
4-
io+
L0C
4563
44-
--
Mzf
l5
13-
ZN
F21
1K
12+
ZN
F8
K7
+
ZS
CA
N4
54
+L
0C74
2219
ss
+
ZN
F55
1K
16+
ZN
F49
7-
14-
ZN
F1S
4K
10-
L0C
4563
46-
8-
ZN
FG71
K10
-Z
NF
584
K8
+
ZN
fl7
6-
10+
ZN
F13
2-
17-
ZN
F5S
6K
10+
L0C
7426
60K
-+
ZN
FS
52K
s-
ZN
F32
4-
17+
ZN
F5S
7K
13+
ZN
F44
6s
3+
ZN
F81
4-
--
L0C
4563
52-
--
ZN
F41
7K
13-
ZN
F42
s13
-
ZN
F41
8K
16-
ZN
F25
6K
15-
ZN
F6O
6K
16-
ZSC
AN
15
3+
ZN
F13
5K
16+
ZN
F44
75
2-
ZN
F32
9-
12-
ZN
F27
4+
5K0
+
ZN
FS
44K
13+
ZN
F8
K7
+
HK
R2
ss÷
ZN
F49
7-
14-
L0
C1
16
41
2-
6-
ZN
F5B
4K
8+
ZN
F13
2K
16-
ZN
F32
4BK
9+
ZN
F32
4K
9+
ZN
F44
6S
3+
ZN
F49
96
4-
ZN
F42
s13
-
ZN
F93
K17
-
FG
L0C
6537
69F
GL
0C65
3769
FGL
0C65
378
9F
GL
0C65
3759
F5
-L00
653
789
140
Tad
epal
lyet
Page 159
D
Hum
anC
lust
er20
.7C
him
pan
zee
Mouse
Rat
jbg
hch
r20
pch
r2o
mch
r2r
chr3
Jcch
r24
F0
SNX
5D
FD
F0
SNX
5D
FO
F0
Sex
5D
FO
F0
Sex
5D
FO
FGS
NX
5D
FO
PTM
AP3
PTM
AP3
Pt,
mp
3FS
nap3
PTM
AP3
ZN
F33
9.
4-
ZN
F33
9.
4-
Ztp
339
.4
Zfp
339
.4
.Z
NF
339
-4
-
ZN
F13
3K
15*
ZN
F13
3K
15*
Zfp
133
K15
*Z
fp13
3K
15
F0
POC
R3F
.R
BB
P9
F0
PDL
R3F
.R
BB
P9
F0
PoIr
3f.
Rbb
p9F
0Po
fr3f
.R
bbpg
F0
FQ
LR
3F,
RB
BP
9
Hum
anC
lust
er2
o.2
hch
r2o
FG
NE
VR
I2R
TS
4D
PL
TP
ZN
F33
5Z
NF
663
ZN
F33
4CG
SLC
I3A
3,T
P53R
K
SLC
2AO
,E
VA
2
FC
.13
-
K14
-
Mou
sem
chr2
G.N
ee42,R
S.
DF
PI
fp335
.13
fp334
K14
C0
Tp5
3rk
51c2
al0,
Eya
2
D
Mou
se
Inch
r2
OH
efaI
c2.
Atp
9aO
FD
3a11
4.
7
fp6
4.
13-
0E
rp2B
pMps
33p4
Rat
rchr3
F0N
evfl
2,R
i*.a
DF
C
PIF Zfp
335
-13
Zfp
334
K14
-
F0
Tp5
3ek
S1c2
a10,
Eya
2
Rat
rchr3
F0
Hef
atc2
,Ap
ga
DF
C
Z1p
64F
0E
,p2B
pMep
s33p
4
-13
Ch
imp
anze
e
pch
r2û
FGN
EV
RU
.R
TSA
PLT
P
ZN
F33
5L
0C74
2834
ZN
F33
4CG
.SL
CI3
A3,
TP53
RK
SLC
2410
,EY
A2
DF
C
.13
-
K14
-
Hum
anC
iust
er20
.3
Dog
hch
,20
cch,2
4
Ch
imp
anze
e
pch
r2o
F0
HN
FAT
C2.
AT
P9A
OF
C
C0N
El,
L2.
RT
SA
DF
D
PL
W
L0C
4859
04.
13
ZN
F33
4K
14
SAL
L4
ZP
P64
GE
RP
28P
,M
RP
S33
P4
F0
HN
FAT
C2.
AW
9AD
FD
.7.
-13
-
C3
SLC
I3A
3.TP
53R
K
SLC
2AIO
,E
YA
2
SAC
L4
ZF
P64
F0,
ER
P2B
F.M
RP
S33
P4
.7.
.13
-
Dog
cch
r24
F0.
HN
FA
TC
2AT
P9A
DF
D
-13
-
L0C
4859
31L
0C48
5932
0,
ER
P2S
P,
MR
PS
33P
4
num
ançI
usr
er1fl
.1n
Imp
an
zee
Mouse
Har
uog
hchr2
lpchr2
lm
ch
rl6
rchrl
lcchr3
l
F0
hMX
l.T
MP
RS
S2
DF
IF
0H
MX
I,T
MP
RS
S2
DF
DF
0H
,nel
,T
,rer
ss2
DF
OF
0H
,,eI,
irlp
rss2
DF
DF
0H
AIX
1,T
MP
RS
S2
DF
O
RIP
K4
RIP
K4
Rip
k4R
Ø3
PIPK
4
PR
DM
155e
14-
PR
DM
15S
e14
-P
rdm
l5S
e14
-P
rdm
l5S
e14
-L
0C61
0905
Se
14-
ZN
F29
5B
6-
ZN
F29
5B
6.
Zfp
295
B6
-Z
fp29
5B
6-
L0C
4877
75B
6-
F0
UM
DD
LI.
AB
C0I
F0
UM
OD
LI4
5C
01
F0
U,r
odll
.Abcg
lF
0U
evd
Il,A
beg
lF
0U
MO
DL
1,A
BC
GI
TFF
3,T
FF
2T
fF3
TFFG
Tff2
Tif
sT
FF3,
WF
2
141
Tad
epal
lyet
,
Page 160
DD
Hum
anC
luS
ter
22.1
Ch
imp
anze
eM
ouse
Rat
ag
hch
,22
pch
r22
mch
rlO
rch
rlO
Cchrl
6
F0
UM
OO
LIA
BO
GI
DF
OPC
UM
OD
LI,
AB
CG
ID
FO
F0
U,r
odIl
.Tf
f3D
FO
F0
U,d
i1,
Tff3
DF
OF
GU
MO
DL
1,A
BC
G1
DF
O
ÏFF
3T
FF3
TFF
3
SU
HW
2.
i-
SU
HW
2.
i-
Suhw
2.
1
SUH
W1
-1
.SU
HW
1.
i-
F0
fOL
V2-
34.
CL
V2-
33F
0fO
CV
2-34
.fO
LV
2.33
F0
Pc,m
1211
1F
0P
om
l2ll
lF
0fO
LV
2-34
,1C
CV
2-33
PQ
MI2
1Lf
BC
RL
3P
OM
I2IL
IBC
RL
4B
crl4
-Bc
H4
PO
MI2
1LI
BC
RC
4
Hum
anC
iust
erX
.7C
him
pan
zee
Mouse
Rat
Dog
hch
rXpchrX
mch
rXrc
hrX
cchrX
FOU
BE
1PC
TK
ID
FO
FO
UB
EIP
CT
KI
DF
OF0
.U
2X1.
ftfk
lD
FD
F0U
blx
,PC
lkl
DF
OF0
.UB
EI,
PCT
KI
DF
O
US
PI1
US
PII
Usp
llU
spll
US
PII
ZN
F15
7K
12*
ZN
F15
7K
12•
Ztp
lB2
L0C
6125
09K
12*
ZN
F41
K18
.Z
NF4
1K
18.
D93
0016
NO
4RIk
L0C
4808
99K
18-
ZN
F81
K13
*Z
NF8
1K
13*
L0C
491
863
K13
ZN
F18
26
15-
ZN
F18
2.
15-
L0C
4918
64K
14-
ZN
F63O
K13
-L
0C
473594
--
-L
0C49
1865
K13
F0
SSX
6os
iSSX
2F
032
X6
34,5
2X2
F0
Ssx
alF
05
3x
4I
F0
55X
6,p
s,S
SX
2
C0C
6533
1710
0653
317
Ssx
a2S
s,,a
2L
0OE
533I
7
Hum
anC
lust
erX
.2C
him
pan
zee
Mo
use
Rat
Dog
hch
rXpchrX
mch
rrc
hr
cchr
F0
SPIN
2A.
FAA
H2
DF
OF
0S
PIN
2A.
FA.4
H2
DF
OF
0S
px
2a
DF
OF
0S
pix2
aD
FD
F0
SPIN
2A,
FAA
H2
DF
D
Faa
h2F
aah2
ZX
DB
9Z
XD
B-
9L
0C49
191
2-
9
ZX
DA
-9-Z
XD
A-9
.
FO
KR
Y9P
I7.
FO
KR
T8P
I7,
PC
60
83
17
F0
1608
317
FG
KR
T8P
I7,
LO
C65
3568
L0C
6535
89L
0C65
3596
142
Tad
epal
lyet
Page 161
rchrX
F0
CX
rnI4
8D
F0
Hum
anC
k,s
ter
X.3
Ch
!mp
anze
eM
ouse
Rat
Doq
F0
0dx2
6b
hch
rX
F0
CX
044B
.L0C
7284
70
L0C
65
0024
ZN
F75
ZN
F44
9F
0,
00X
265
DF
0
9K5
-
S7*
pch
rX
FGC
X04
48,L
0C72
8470
L0C
6500
24
ZN
F75
ZN
F44
9F
G’0
0X26
B
DF
O
SK5
-
57*
mch
tX
FG
CX
S,6
48D
FC
Zfp
449
S7
*
F0
Ddx
26b
Hu
man
Clu
ster
X.4
hch
rX
F0
AN
MA
GA
,54A
0EA
1
ZN
F27
5L
OC
13
97
35
F0
TR
EX
2,U
CH
L5I
P
DF
0
-11
*
K8,
Ch
imp
aflz
ee
pch
rX
F0
AN
MA
OA
.51A
0E41
DF
C
ZN
F27
5ii
*
L0C
7399
68-
8
FGT
RE
X2,
UC
HC
5IP
Mo
use
mch
rX
F0
An
rga,M
ag
eal
DF
O
Zfp
275
-11
Zfp
92K
9*
F0
T,e*
2,U
chI5
ip
Flat
rchrX
0A
nnga.0
0g1
D
Ztp
275
*0T,
e*2.
Uch
î5ip
F0
11
Dog
Cch
rX
L0C
4922
33
F0.T
RE
X2,
UC
HL
6IP
Hu
man
Clu
ster
Y.1
Ch
fmp
anze
eM
ouse
Rat
Dog
hchrY
pchrY
F0
HT
FF
Y4B
,BP
Y2B
DF
OF
0H
TF
FY
488P
Y2B
DF
O
L0C
392603
--
--
--
L0C
442486
--
--
--
F0.
BP
Y2C
,fl
TY
4C
F0
BPY
2C,
Trr
Y4C
D
Cch
rX
FG
CX
or14
8LO
C72
8470
DF
L0C
6520
21
LO
CB
JI8S
Ose
sL
0C49
2160
sF
000
X26
0
o
DF
O
-19
*
143
Tad
epal
lyet
Page 162
Supplementary Table S4
Comprehensive catalogue of the C2H2-ZNF genes from the 81 human clusters and
their syntenically homologous clusters from other mammalian genomes (Chimpanzee,
Mouse, Rat and Dog).
For the 81 human (h) C2H2-ZNF clusters and their corresponding syntenically homologous
clusters in chimpanzee (p), mouse (m), rat (r) and dog (c), this table provides the cluster
number, the position on the chromosome, the flanking genes (FG), the names of the
C2H2-ZNF genes from the cluster (in bold), the domain associated (D) (K = KRAB, S = SCAN,
S-K = SCAN-KRAB, B = BTB, H = HOMEO, Se = SET and - = no domain associated), the
number of zinc finger motifs present (F) and the orientation (O).
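As a reading aid, the legend above can be sketched as a small record type. This is only an illustrative sketch: the class name `ZnfEntry`, the helper `count_by_subfamily`, the use of "-" as the code for "no domain associated", the "+"/"-" orientation values and the gene names in the usage note are assumptions made for the example, not entries taken from the table.

```python
from dataclasses import dataclass

# Domain codes from the Table S4 legend.
# Assumption: "-" stands for "no domain associated".
DOMAIN_CODES = {
    "K": "KRAB",
    "S": "SCAN",
    "S-K": "SCAN-KRAB",
    "B": "BTB",
    "H": "HOMEO",
    "Se": "SET",
    "-": None,
}


@dataclass(frozen=True)
class ZnfEntry:
    """One gene row of a cluster panel (hypothetical layout)."""
    species: str      # h, p, m, r or c, as in the legend
    gene: str         # gene name (hypothetical in the examples)
    domain_code: str  # D column
    fingers: int      # F column: number of zinc finger motifs
    orientation: str  # O column; "+"/"-" is an assumption

    @property
    def subfamily(self) -> str:
        # Raises KeyError for a code that is not in the legend.
        domain = DOMAIN_CODES[self.domain_code]
        if domain is None:
            return "C2H2-ZNF without N-terminal domain"
        return f"{domain}-C2H2-ZNF"


def count_by_subfamily(entries):
    """Count entries per subfamily label."""
    counts = {}
    for entry in entries:
        counts[entry.subfamily] = counts.get(entry.subfamily, 0) + 1
    return counts
```

For instance, two hypothetical entries `ZnfEntry("h", "GENE_A", "K", 12, "+")` and `ZnfEntry("m", "GENE_B", "-", 4, "-")` would be counted once under `KRAB-C2H2-ZNF` and once under `C2H2-ZNF without N-terminal domain`.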
Chapter 3. DISCUSSION
Many studies in biology focus on the extensive similarities between the genomes of
human and model organisms, to extract insights into the molecular mechanisms and
aetiology of human diseases. Our investigation of the C2H2-ZNF gene family in mammals
reveals extensive variation among species in C2H2-ZNF gene content and genomic
organization, as well as in the domain composition of orthologous genes. In
addition, our study is the first to provide a clear demonstration of the important contribution
of gene loss to the evolution of the C2H2-ZNF family and to demonstrate the rapid evolution
of C2H2-ZNF genes that occurs between related species. Our observations at the genomic
scale provide insights into C2H2-ZNF gene evolution that confirm conclusions drawn from
smaller-scale studies on individual genes, clusters and C2H2-ZNF subfamilies.
The major contributions of our study are:
i. The extensive analysis of all the C2H2-ZNF genes in the human genome.
ii. A comprehensive and systematic analysis of all the human C2H2-ZNF clusters
and the identification of their syntenically homologous counterparts in other
mammalian genomes.
iii. The distinction of species-specific expansion and loss in C2H2-ZNF clusters
and genes in all mammals.
iv. The identification of variation in the number of zinc finger motifs and the
presence or absence of the conserved N-terminal domains associated with
C2H2-ZNF mammalian orthologs.
v. The tracing back of different evolutionary patterns of the C2H2-ZNF gene
family within primates and rodents.
vi. The establishment of a model reconstructing the history and evolution of the
SCAN, SCAN-KRAB and KRAB subfamilies.
In brief, our study reveals that multiple and independent duplications and losses of
C2H2-ZNF genes and their effector domains within different lineages and species have
shaped and diversified the C2H2-ZNF repertoires in mammals.
3.1 The C2H2-ZNF genes in the human genome
Earlier studies of C2H2-ZNF genes focussed on human chromosome 19 (Eichler,
Hoffman et al. 1998; Dehal, Predki et al. 2001; Looman, Abrink et al. 2002; Shannon,
Hamilton et al. 2003) and on the KRAB C2H2-ZNF subfamily in human (Huntley, Baggott
et al. 2006). In contrast, our study provides a comprehensive and systematic analysis of
all the C2H2-ZNF genes in the human genome (Supplementary Table S1). We identified and
analyzed the organization of 718 C2H2-ZNF genes in the human genome and classified
them into different subfamilies (KRAB-C2H2-ZNF, SCAN-C2H2-ZNF, BTB-C2H2-ZNF and
those without a conserved N-terminal domain). We also discovered two new C2H2-ZNF
subfamilies, HOMEO and SET, which have a limited number of members (5 and 2,
respectively), possibly due to a more recent appearance or to a different rate of
duplication and loss.
Consistent with previous reports (Rousseau-Merck, Koczan et al. 2002; Huntley,
Baggott et al. 2006), we observed massive clustering of the C2H2-ZNF genes in the human
genome: more than 70% of the genes are organized into clusters. However, in addition to
the previously reported clusters on human chromosome 19 (Venter, Adams et al. 2001;
Huntley, Baggott et al. 2006), we also located a substantial proportion of the clusters
(83%) on the other chromosomes of the human genome (Supplementary Table S2).
Interestingly, the distribution of C2H2-ZNF genes is biased toward chromosome 19, which
harbours 40% of all C2H2-ZNF genes in humans. Most of the human C2H2-ZNF genes are
organized into clusters (500), with more than 60% of these clusters containing
intermixed sets of genes from different subfamilies (Supplementary Table S3).
The above observations were only possible through the study of all the C2H2-ZNF
sub-families at the whole genome level.
3.2 Variation in the numbers of C2H2-ZNF genes in
mammalian clusters
A systematic and comprehensive analysis of the human C2H2-ZNF clusters and their
syntenic counterparts in the chimpanzee, mouse, rat and dog genomes allowed us to gain
insights into the evolution of all the C2H2-ZNF gene subfamilies in mammals
(Supplementary Table S4). Homologous clusters in syntenic regions were identified on
the basis of the flanking genes identified for each human cluster. Interestingly,
this analysis revealed a high variation in the number of C2H2-ZNF genes within
syntenically homologous clusters of mammals. Considering primates, humans have 518
C2H2-ZNF genes forming 81 clusters, whereas chimpanzee has only 397 C2H2-ZNF genes
organized into 79 clusters. This suggests that almost all the C2H2-ZNF clusters in human
have a syntenic counterpart in chimpanzee. However, humans have 30% more C2H2-ZNF genes
within the identified clusters than chimpanzee, implying that C2H2-ZNF genes are evolving
differently within the primate lineage. A similar pattern was observed within the rodent
lineage, where mouse and rat have 232 and 172 C2H2-ZNF genes organized into 62 and 58
clusters, respectively. A differential expansion of C2H2-ZNF genes, particularly striking
in primates, was evident in mammals (human>chimpanzee>mouse>dog>rat for the number of
genes within clusters and human>chimpanzee>mouse>rat>dog for the number of clusters)
(Figure 2 and Supplementary Table S4). A closer look at the individual syntenic clusters
in mouse, rat and dog indicates many cases where dog has more genes than rodents,
and more specifically than rat (Supplementary Figure S2).
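The flanking-gene criterion used to delimit syntenic clusters can be illustrated with a minimal Python sketch. The gene orders, the gene names (FLANK_L, ZNF_1 and so on) and the helper genes_between_flanks are hypothetical and serve only to show the idea; they are not data or code from this study, and orthologs of the flanking genes are assumed to carry the same name in every species.

```python
def genes_between_flanks(gene_order, left_flank, right_flank):
    """Return the genes lying between two flanking genes on a chromosome.

    gene_order is a list of gene names in chromosomal order. The flanks may
    appear in either order, which allows for an inverted syntenic block.
    Returns None when either flank is missing (no syntenic region found).
    """
    if left_flank not in gene_order or right_flank not in gene_order:
        return None
    i, j = gene_order.index(left_flank), gene_order.index(right_flank)
    start, end = min(i, j), max(i, j)
    return gene_order[start + 1:end]


# Hypothetical toy gene orders (not taken from Supplementary Table S4).
human_chr = ["GENE_A", "FLANK_L", "ZNF_1", "ZNF_2", "ZNF_3", "FLANK_R", "GENE_B"]
mouse_chr = ["GENE_B", "FLANK_R", "ZNF_2", "FLANK_L", "GENE_A"]  # inverted block
rat_chr = ["GENE_A", "GENE_B"]  # flanking genes not found

human_cluster = genes_between_flanks(human_chr, "FLANK_L", "FLANK_R")
mouse_cluster = genes_between_flanks(mouse_chr, "FLANK_L", "FLANK_R")
rat_cluster = genes_between_flanks(rat_chr, "FLANK_L", "FLANK_R")

print(human_cluster, mouse_cluster, rat_cluster)
```

In this toy case the human region holds three genes, the mouse region one, and no syntenic region is found in rat, which is the kind of difference in gene number between syntenic clusters discussed above.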
Our study indicates that C2H2-ZNF genes are indeed rapidly evolving, as is evident,
for example, within the primate and rodent lineages. The differential expansion
observed in the various species may be accounted for by differential duplication,
differential loss, or both. Although the high number of C2H2-ZNF genes found in human as
compared to chimpanzee suggests a human-specific expansion through tandem duplication,
this difference does not seem to be due solely to duplications but also involves loss in
chimpanzee, as seen in many gene families (Fortna, Kim et al. 2004). In agreement with
this, there are more pseudogenes in chimpanzee clusters (as annotated in GenBank) than in
the corresponding human clusters. Thus, the variation in the numbers of C2H2-ZNF genes
observed within primates could be attributed to both gene duplication and loss due to
deletion or pseudogenization, gain being more predominant in human and loss in chimpanzee.
Interestingly, a variation in the number of C2H2-ZNF genes is also evident in rodents.
However, in this case, in almost all the clusters mouse has either a higher or an equal
number of C2H2-ZNF genes as compared to rat. Altogether, our study suggests that in
addition to lineage- or species-specific increases in the numbers of the C2H2-ZNF genes,
loss of these genes has also played a very important role in the evolution of this gene
family. The evaluation of the relative contribution of gene duplication and loss requires
detailed phylogenetic studies.
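The gene and cluster counts quoted in this section can be compared with a short calculation. The Python sketch below uses only the numbers given in the text (dog counts are not quoted here and are therefore omitted); the helper percent_excess is introduced for illustration only.

```python
# Counts of C2H2-ZNF genes within clusters and of clusters, as quoted in the text.
genes_in_clusters = {"human": 518, "chimpanzee": 397, "mouse": 232, "rat": 172}
clusters = {"human": 81, "chimpanzee": 79, "mouse": 62, "rat": 58}


def percent_excess(a, b):
    """Percentage by which count a exceeds count b."""
    return 100.0 * (a - b) / b


primate_excess = percent_excess(genes_in_clusters["human"],
                                genes_in_clusters["chimpanzee"])
rodent_excess = percent_excess(genes_in_clusters["mouse"],
                               genes_in_clusters["rat"])

print(f"human vs chimpanzee: {primate_excess:.1f}% more genes in clusters")
print(f"mouse vs rat: {rodent_excess:.1f}% more genes in clusters")

# Mean cluster size per species, to separate "more clusters" from "larger clusters".
for species, n_genes in genes_in_clusters.items():
    print(f"{species}: {n_genes / clusters[species]:.1f} genes per cluster on average")
```

The human excess over chimpanzee comes out at roughly 30%, in line with the figure given above, and the mouse excess over rat at roughly 35%, while the numbers of clusters differ far less within each lineage.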
3.3 Evolution of C2H2-ZNF genes in mammals through
differential expansion and loss
Phylogenetic analyses of human C2H2-ZNF clusters with their syntenic
counterparts from other mammals provided a better estimation of the relative contribution
of gene duplication and loss in the analyzed clusters.
For example, the phylogenetic analysis of the C2H2-ZNF genes from human cluster
19.12 and its syntenically homologous clusters in mammals, combined with the physical
maps of these clusters, gives us insights into the gene rearrangement mechanisms that
could have taken place during evolution. Consistent with previous individual reports of
lineage-specific expansion, more specifically of KRAB C2H2-ZNF genes (Dehal, Predki et al.
2001; Shannon, Hamilton et al. 2003; Huntley, Baggott et al. 2006), a primate
lineage-specific expansion, but also a dog-specific expansion and a mouse duplication of
C2H2-ZNF genes, were clearly identified. In all species, tandem duplication was found to be
responsible for the species-specific increase in the number of C2H2-ZNF genes, as
confirmed by the fact that the genes that group together in the tree are almost always
physically adjacent within the cluster on the chromosome. Furthermore, the
orientations of the genes belonging to the same clade in the phylogenetic tree and of
their orthologs were almost always the same. In a few instances, however, the orientations
of genes belonging to the same phylogenetic clade differed, and genes within syntenic
clusters were inverted, revealing many possible gene rearrangements.
Clear evidence of gene loss was also obtained by analyzing cluster 19.12 and other clusters.
Considering that rodents are evolutionarily more closely related to primates than dog is,
an absence in rodents of C2H2-ZNF clusters or genes that are present in primates and dog
suggests a loss in rodents. Several examples of gene loss in rodents were obtained. The
phylogenetic analysis also indicated loss of genes in chimpanzee by pseudogenization, as
suggested above.
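The reasoning used to infer a loss in rodents can be written as a simple presence/absence rule. The Python sketch below is an illustration only: the function infer_rodent_loss and the three example groups are hypothetical, and the rule ignores alternative explanations such as translocation or incomplete annotation.

```python
PRIMATES = {"human", "chimpanzee"}
RODENTS = {"mouse", "rat"}
OUTGROUP = {"dog"}  # more distantly related to primates than rodents are


def infer_rodent_loss(present_in):
    """Flag a likely loss in rodents for one orthologous group.

    present_in is the set of species in which an ortholog was found.
    A loss is inferred when the gene is present in at least one primate
    and in dog, but in no rodent.
    """
    in_primates = bool(present_in & PRIMATES)
    in_outgroup = bool(present_in & OUTGROUP)
    in_rodents = bool(present_in & RODENTS)
    return in_primates and in_outgroup and not in_rodents


# Hypothetical presence/absence patterns for three orthologous groups.
groups = {
    "group_1": {"human", "chimpanzee", "dog"},                  # absent from rodents
    "group_2": {"human", "chimpanzee", "mouse", "rat", "dog"},  # present everywhere
    "group_3": {"human", "chimpanzee"},                         # primates only
}

results = {name: infer_rodent_loss(species) for name, species in groups.items()}
print(results)
```

Only the first pattern is flagged: a gene found in primates alone (group_3) is equally compatible with a primate-specific gain, so it is not counted as a rodent loss.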
Altogether, our studies indicate a predominant role of gene gain by tandem duplication
over gene loss in the evolution of C2H2-ZNF genes in mammals. It should, however, be
pointed out that more definitive conclusions about the pseudogene status of the various
C2H2-ZNF genes and about the role of these genes require detailed functional
investigation of the individual genes.
3.4 Evolution of the C2H2-ZNF genes through duplication or
loss of zinc finger and N-terminal effector motifs
In accordance with a previous study on the average number of zinc finger motifs
in a few plant (1), yeast (1.5), nematode (2.5), insect (3.5) and human (8) C2H2-ZNF
(Venter, Adams et al. 2001), an in-depth analysis of the zinc finger motifs associated with
the C2H2-ZNF genes found in clusters in human and with their syntenic genes in chimpanzee,
mouse, rat and dog indicated that there is significant variation in the number of zinc
finger motifs associated with C2H2-ZNF genes in these mammalian species. Notably,
the C2H2-ZNF genes from dog were found to encode a higher average number of zinc finger
motifs than those of the other mammals studied. It is possible that an increase in the
number of fingers within genes confers advantageous additional functionality on the
C2H2-ZNF genes through a diversification of the possible nucleic acid and protein
interactions.
We also observed variation in the presence of N-terminal effector motifs, such as
SCAN or KRAB, among orthologs, which could be accounted for by either gain or loss of
these motifs. Loss of N-terminal effector domains by sequence degeneration was confirmed
in several cases in our study. A thorough analysis of the exon-intron structure of
C2H2-ZNF genes indicated a typical conserved exon-intron organization for C2H2-ZNF genes
associated with a SCAN-KRAB. Based on these observations, we propose a model of
evolution of the C2H2-ZNF subfamilies involving independent gain events of a SCAN and a
KRAB domain, each by an exon-shuffling mechanism, followed by gene duplications
and loss of effector motifs by deletion or degeneration of the SCAN and/or KRAB domain.
3.5 Birth and Death model of evolution
It was suggested from a study of a few chromosome 19 C2H2-ZNF clusters that
C2H2-ZNF genes evolve by positive selection (Schmidt and Durrett 2004). Our results and
analyses of the human C2H2-ZNF clusters and their syntenic counterparts in other
mammals suggest a “Birth and Death” model of evolution similar to that proposed by Nei
and colleagues (Nei, Gu et al. 1997; Nei and Kumar 2000) (see Figure 7B). According to this
model, new genes are created by duplication, including tandem duplication and block gene
duplication (birth). While some of them might acquire a new function and thus diverge
functionally, others may remain relatively unchanged in the genome for a long time. Still
others become pseudogenes following deleterious mutations or are deleted from the genome
(death through inactivation or elimination). Though functional information is known for
only a handful of C2H2-ZNF proteins, the variations in the numbers of C2H2-ZNF genes that
we found throughout the evolution of mammals and our phylogenetic analysis point to
duplication and loss as a guiding force in the evolution of these genes.
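The birth-and-death process described above can be made concrete with a toy simulation. The Python sketch below is a deliberately simplified illustration and not a model fitted to our data: the function simulate_birth_death, the starting family size and the per-step probabilities are all hypothetical, and death is modelled as pseudogenization only (outright deletion is not tracked).

```python
import random


def simulate_birth_death(n_genes, steps, p_dup, p_loss, seed=0):
    """Simulate a gene family under a simple birth-and-death process.

    At each step every functional gene is either duplicated (probability
    p_dup, a birth), turned into a pseudogene (probability p_loss, a death)
    or left unchanged. Returns (functional_genes, pseudogenes).
    """
    rng = random.Random(seed)
    functional, pseudogenes = n_genes, 0
    for _ in range(steps):
        births = deaths = 0
        for _ in range(functional):
            r = rng.random()
            if r < p_dup:
                births += 1
            elif r < p_dup + p_loss:
                deaths += 1
        functional += births - deaths
        pseudogenes += deaths
    return functional, pseudogenes


# Two hypothetical lineages descending from the same 20-gene ancestral
# family, one dominated by gain and the other by loss.
lineage_gain = simulate_birth_death(20, 50, p_dup=0.03, p_loss=0.01, seed=1)
lineage_loss = simulate_birth_death(20, 50, p_dup=0.01, p_loss=0.03, seed=1)
print("gain-dominated lineage (functional, pseudogenes):", lineage_gain)
print("loss-dominated lineage (functional, pseudogenes):", lineage_loss)
```

Even with identical starting repertoires, different balances between duplication and inactivation lead to different family sizes and pseudogene contents, which is the qualitative pattern the model predicts for related species.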
Figure 7: Birth-and-death model of evolution.
The figure shows the two models associated with the evolution of multigene families.
Open circles represent functional genes and closed circles represent pseudogenes.
(A) In concerted evolution, related genes belonging to the ancestral species evolve in a
concerted manner rather than independently in both Species 1 and Species 2.
(B) Birth-and-death model of evolution, where the genes evolve differently by duplication;
a few are maintained in the genome for longer, while the others are deleted or become
pseudogenes.
[Figure panels: (A) Concerted evolution; (B) Birth-and-death model of evolution. Each panel shows an ancestral species giving rise to Species 1 and Species 2 over time.]
3.6 C2H2-ZNF gene family: An analogy with the Olfactory Receptor
gene family
The olfactory receptor genes constitute the largest mammalian gene family, with
more than 1000 members in human. However, 60% of these genes are pseudogenes. In
contrast, the olfactory receptor gene family in mouse comprises roughly the same
number of genes as in human, though only 20% of them are pseudogenes (Glusman,
Yanai et al. 2001; Niimura and Nei 2003; Niimura and Nei 2005). Comparative analyses of
this gene family in human, mouse and non-human primates have revealed that differential
expansion and loss have guided its evolution (Sharon, Glusman et al. 1999; Lapidot,
Pilpel et al. 2001). However, the human genes have accumulated many mutations, leading
to numerous pseudogenes in comparison to mouse or any non-human primate. This is
associated with the reduced chemosensory capacity of humans.
The olfactory receptor and the C2H2-ZNF gene families show similar patterns of
evolution. Differential gene expansion and loss have played an important role in the
evolution of both gene families in mammals. However, in contrast to the olfactory receptor
genes, C2H2-ZNF genes apparently do not accumulate pseudogenes in humans irrespective
of their large number. Studies on human C2H2-ZNF clusters have indicated that these
genes are rapidly evolving through positive selection and may acquire new functions after
duplication (Schmidt and Durrett 2004; Huntley, Baggott et al. 2006).
By correlating the number of olfactory genes in human and mouse with their functions
in the respective organisms, we can understand that a reduced chemosensory dependence
in human and non-human primates as compared to mouse may be responsible for the large
number of pseudogenes in human. At present, the lack of large-scale analyses of the
expression profile and function of C2H2-ZNF genes precludes this type of conclusion.
The extremely high number of human C2H2-ZNF genes and the species-specific expansions
and losses in mammals leading to differential evolution within primates and rodents,
when put in perspective with functional information on these proteins, could give us
interesting insights into the evolution of this gene family.
3.7 A few concerns about the study
We must acknowledge here three possible sources of bias in our study.
First, errors in the reported number of genes due to inaccurate sequencing and
annotation in the available databases could be a primary source of bias in our study.
However, we considered only genomes that are >94% complete (human, chimpanzee, mouse,
rat and dog) to minimize this source of bias.
A second concern is that the loss of C2H2-ZNF clusters or genes that we see among
the species could be due to the dispersal of genes onto different chromosomes by
translocation. Though we accept this as a possibility, we conducted an extensive
analysis to rule it out for the clusters we studied in depth. For example, in group I of
the phylogenetic tree (Figure 5; Article) of the human cluster 19.12 and its syntenic
clusters in chimpanzee, mouse, rat and dog, we observe three orthologs (hZNF331, pZNF331
and cZNF331) from human, chimpanzee and dog. There is no rodent counterpart for these
genes, which suggests a loss in rodents. To rule out the possibility that these genes
were dispersed onto other regions of the genome by translocation, we conducted an
extensive TBLASTN homology search of the mouse and rat genomes using each of the
three orthologs from human, chimpanzee and dog as a query. For the three queries, the
top BLAST hit was Zfp14 from mouse and LOC97124 from rat. We included these two
sequences in the dataset used for the phylogenetic analysis (see Methods; Article) of
cluster 19.12 in human and its syntenic counterparts in chimpanzee, mouse, rat and dog.
The phylogenetic tree revealed that the mouse and rat sequences group with the three
orthologs from human, chimpanzee and dog in group I. However, a closer look at the
sequence similarity between these sequences (< 60%) suggests that they cannot be
orthologs and that the grouping we see could simply reflect the fact that they were the
closest matches to the query sequences used.
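The similarity check applied to the top hits can be sketched as a pairwise comparison over an alignment. The Python example below is an assumption-laden illustration: it computes percent identity rather than a similarity score based on a substitution matrix, the two aligned fragments are invented, and the function percent_identity is hypothetical. Only the 60% cutoff comes from the text.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Cutoff quoted in the text: sequences below it are not treated as orthologs.
ORTHOLOGY_CUTOFF = 60.0

# Invented aligned fragments, for illustration only.
query = "CKECGKAFSR-SHLT"
hit = "CEECGKSFNQKSNLI"

identity = percent_identity(query, hit)
print(f"{identity:.1f}% identity; below cutoff: {identity < ORTHOLOGY_CUTOFF}")
```

A top hit that falls below the cutoff, as in this toy case, is the best available match in the genome rather than evidence of orthology.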
The third and final concern is that, considering the extremely large number of
C2H2-ZNF genes in the human genome, we cannot rule out the possibility that some are
pseudogenes. Though we did not conduct an extensive search for possible pseudogenes, an
analysis of the open reading frames of the C2H2-ZNF genes considered in this study,
together with their translated sequences, suggests that most of them are most likely not
pseudogenes. A distribution curve (Figure 8) of the amino acid sequence lengths of the
various C2H2-ZNF genes shows that almost all of the sequences have large open reading
frames that are potentially translated and functional.
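A length distribution of this kind can be tabulated in a few lines. In the Python sketch below, the list of lengths, the bin width and the 200-residue threshold are hypothetical values chosen for illustration; they are not the data plotted in Figure 8.

```python
from collections import Counter


def length_histogram(lengths, bin_width=100):
    """Count sequences per length bin; keys are the lower edge of each bin."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(counts.items()))


def fraction_at_least(lengths, min_length):
    """Fraction of sequences whose length is at least min_length residues."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n >= min_length) / len(lengths)


# Hypothetical protein lengths in amino acids (not the thesis data).
lengths = [95, 310, 420, 455, 512, 530, 598, 640, 705, 1120]

histogram = length_histogram(lengths, bin_width=100)
long_fraction = fraction_at_least(lengths, min_length=200)

for lower_edge, count in histogram.items():
    print(f"{lower_edge}-{lower_edge + 99} aa: {count}")
print(f"fraction with at least 200 residues: {long_fraction:.2f}")
```

A short open reading frame is only a crude indicator of pseudogenization, so such a tabulation complements, but does not replace, a dedicated pseudogene search.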
Figure 8: Plot of the amino acid sequence lengths of all the C2H2-ZNF in the human
genome.
[Plot axes: x, length of the C2H2-ZNF amino acid sequence; y, number of C2H2-ZNF (0 to 160).]
3.8 Merits of the study
Our study provides comprehensive insight into the evolution of C2H2-ZNF genes
across several mammalian genomes. To summarize, the merits of our study are as
follows.
• A good range of species with completely sequenced genomes was considered to
analyze the evolution of C2H2-ZNF genes in mammals.
• A stringent phylogenetic analysis of the syntenic clusters in human, chimpanzee,
mouse, rat and dog was performed using both maximum likelihood (RAxML)
and Bayesian (MrBayes) methods. Notably, unlike other studies on C2H2-ZNF,
which use Xfin as an outgroup (Looman, Abrink et al. 2002; Shannon,
Hamilton et al. 2003; Huntley, Baggott et al. 2006), we conducted an extensive
search to include chicken (Gallus gallus) homologs in addition to Xfin as an
outgroup. The chicken sequences are considerably closer, as an outgroup, to the
species studied (human, chimpanzee, mouse, rat and dog).
• The phylogenetic relationships we observed between C2H2-ZNF genes in the
syntenic clusters from the different species were found to be consistent with the
overall picture of the number of genes in the species and with the physical maps
of the clusters.
• A model for the evolutionary relationship of the SCAN, SCAN-KRAB and KRAB
C2H2-ZNF subfamilies is proposed, providing a possible explanation for
previously unresolved questions in the field.
3.9 Perspectives
The following studies could be undertaken in the future to extend what is already
known.
• The compilation of comprehensive catalogues of the C2H2-ZNF gene
repertoires in chimpanzee, mouse, rat and dog, and the detailed comparison of
the organization and numbers of C2H2-ZNF in the complete repertoires.
• The stringent phylogenetic analysis of these repertoires to gain insights into
the detailed mechanisms that have operated during the evolution of this gene
family in mammals.
• A more detailed analysis of the physical mapping of genes within clusters
to gain insight into the molecular mechanisms involved in the expansion of
these genes. This could include the analysis of the possible repeated
sequences that border these C2H2-ZNF genes and that may be involved in
this phenomenon, and the analysis of the orientation of genes, the distances
between them and their exon-intron organisation.
• A more comprehensive study of pseudogenes.
Clearly, more detailed bioinformatics and functional studies are still required for a better
understanding of the driving force behind the expansion of C2H2-ZNF genes in mammals.
REFERENCES
Bellefroid, E. J., P. J. Lecocq, et al. (1989). “The human genome contains hundreds of
genes coding for finger proteins of the Kruppel type.” DNA 8(6): 377-87.
Bellefroid, E. J., J. C. Marine, et al. (1993). “Clustered organization of homologous KRAB
zinc-finger genes with enhanced expression in human T lymphoid cells.” Embo J
12(4): 1363-74.
Bellefroid, E. J., D. A. Poncelet, et al. (1991). “The evolutionarily conserved
Kruppel-associated box domain defines a subfamily of eukaryotic multifingered proteins.”
Proc Natl Acad Sci U S A 88(9): 3608-12.
Benn, A., M. Antoine, et al. (1991). “Primary structure and expression of a chicken cDNA
encoding a protein with zinc-finger motifs.” Gene 106(2): 207-12.
Bertrand, D. and O. Gascuel (2005). “Topological rearrangements and local search method
for tandem duplication trees.” IEEE/ACM Trans Comput Biol Bioinform 2(1): 15-28.
Birtle, Z. and C. P. Ponting (2006). “Meisetz and the birth of the KRAB motif.”
Bioinformatics 22(23): 2841-5.
Bouhouche, N., M. Syvanen, et al. (2000). “The origin of prokaryotic C2H2 zinc finger
regulators.” Trends Microbiol 8(2): 77-81.
Castresana, J. (2000). “Selection of conserved blocks from multiple alignments for their use
in phylogenetic analysis.” Mol Biol Evol 17(4): 540-52.
Chung, H. R., U. Schafer, et al. (2002). “Genomic expansion and clustering of
ZAD-containing C2H2 zinc-finger genes in Drosophila.” EMBO Rep 3(12): 1158-62.
Collins, T., J. R. Stone, et al. (2001). “All in the family: the BTB/POZ, KRAB, and SCAN
domains.” Mol Cell Biol 21(11): 3609-15.
Darwin, C. (1837). The First Notebook on Transmutation of Species.
Dehal, P., P. Predki, et al. (2001). “Human chromosome 19 and related regions in mouse:
conservative and lineage-specific evolution.” Science 293(5527): 104-11.
Demuth, J. P., T. D. Bie, et al. (2006). “The evolution of Mammalian gene families.”
PLoS ONE 1: e85.
Edelstein, L. C. and T. Collins (2005). “The SCAN domain family of zinc finger
transcription factors.” Gene 359: 1-17.
Edgar, R. C. (2004). “MUSCLE: multiple sequence alignment with high accuracy and high
throughput.” Nucleic Acids Res 32(5): 1792-7.
Eichler, E. E., S. M. Hoffman, et al. (1998). “Complex beta-satellite repeat structures and
the expansion of the zinc finger gene cluster in 19p12.” Genome Res 8(8): 791-808.
Elemento, O. and O. Gascuel (2002). “An efficient and accurate distance based algorithm to
reconstruct tandem duplication trees.” Bioinformatics 18 Suppl 2: S92-9.
Elemento, O., O. Gascuel, et al. (2002). “Reconstructing the duplication history of
tandemly repeated genes.” Mol Biol Evol 19(3): 278-88.
Fitch, W. M. (2000). “Homology a personal view on some of the problems.” Trends Genet
16(5): 227-31.
Fortna, A., Y. Kim, et al. (2004). “Lineage-specific gene duplication and loss in human and
great ape evolution.” PLoS Biol 2(7): E207.
Francis Darwin, A. C. S. (1903). More letters of Charles Darwin: A record of his work in a
series of hitherto unpublished letters.
Frankel, A. D., J. M. Berg, et al. (1987). “Metal-dependent folding of a single zinc finger
from transcription factor IIIA.” Proc Natl Acad Sci U S A 84(14): 4841-5.
Friedman, J. R., W. J. Fredericks, et al. (1996). “KAP-1, a novel corepressor for the highly
conserved KRAB repression domain.” Genes Dev 10(16): 2067-78.
Fuchs, T., G. Glusman, et al. (2001). “The human olfactory subgenome: from sequence to
structure and evolution.” Hum Genet 102(1): 1-13.
Germain-Desprez, D., M. Bazinet, et al. (2003). “Oligomerization of transcriptional
intermediary factor 1 regulators and interaction with ZNF74 nuclear matrix protein
revealed by bioluminescence resonance energy transfer in living cells.” J Biol Chem
278(25): 22367-73.
Gertz, E. M., Y. K. Yu, et al. (2006). “Composition-based statistics and translated
nucleotide searches: improving the TBLASTN module of BLAST.” BMC Biol 4: 41.
Gilad, Y., O. Man, et al. (2005). “A comparison of the human and chimpanzee olfactory
receptor gene repertoires.” Genome Res 15(2): 224-30.
Glusman, G., A. Bahar, et al. (2000). “The olfactory receptor gene superfamily: data
mining, classification, and nomenclature.” Mamm Genome 11(11): 1016-23.
Glusman, G., I. Yanai, et al. (2001). “The complete human olfactory subgenome.” Genome
Res 11(5): 685-702.
Grondin, B., M. Bazinet, et al. (1996). “The KRAB zinc finger gene ZNF74 encodes an
RNA-binding protein tightly associated with the nuclear matrix.” J Biol Chem
271(26): 15458-67.
Hamilton, A. T., S. Huntley, et al. (2003). “Lineage-specific expansion of KRAB zinc
finger transcription factor genes: implications for the evolution of vertebrate
regulatory networks.” Cold Spring Harb Symp Quant Biol 68: 131-40.
Hamilton, A. T., S. Huntley, et al. (2006). “Evolutionary expansion and divergence in the
ZNF91 subfamily of primate-specific zinc finger genes.” Genome Res 16(5): 584-94.
Huelsenbeck, J. P. and F. Ronquist (2001). “MRBAYES: Bayesian inference of
phylogenetic trees.” Bioinformatics 17(8): 754-5.
Huntley, S., D. M. Baggott, et al. (2006). “A comprehensive catalog of human
KRAB-associated zinc finger genes: insights into the evolutionary history of a large family
of transcriptional repressors.” Genome Res 16(5): 669-77.
Kim, S. S., Y. M. Chen, et al. (1996). “A novel member of the RING finger family, KRIP-1,
associates with the KRAB-A transcriptional repressor domain of zinc finger
proteins.” Proc Natl Acad Sci U S A 93(26): 15299-304.
Klug, A. and D. Rhodes (1987). “Zinc fingers: a novel protein fold for nucleic acid
recognition.” Cold Spring Harb Symp Quant Biol 52: 473-82.
Krebs, C. J., L. K. Larkins, et al. (2003). “Regulator of sex-limitation (Rsl) encodes a pair
of KRAB zinc-finger genes that control sexually dimorphic liver gene expression.”
Genes Dev 17(21): 2664-74.
Krishna, S. S., I. Majumdar, et al. (2003). “Structural classification of zinc fingers: survey
and summary.” Nucleic Acids Res 31(2): 532-50.
Lander, E. S., L. M. Linton, et al. (2001). “Initial sequencing and analysis of the human
genome.” Nature 409(6822): 860-921.
Lapidot, M., Y. Pilpel, et al. (2001). “Mouse-human orthology relationships in an olfactory
receptor gene cluster.” Genomics 71(3): 296-306.
Lee, M. S., G. P. Gippert, et al. (1989). “Three-dimensional solution structure of a single
zinc finger DNA-binding domain.” Science 245(4918): 635-7.
Li, W., L. Jaroszewski, et al. (2001). “Clustering of highly homologous sequences to reduce
the size of large protein databases.” Bioinformatics 17(3): 282-3.
Looman, C., M. Abrink, et al. (2002). “KRAB zinc finger proteins: an analysis of the
molecular mechanisms governing their increase in numbers and complexity during
evolution.” Mol Biol Evol 19(12): 2118-30.
Looman, C., L. Hellman, et al. (2004). “A novel Kruppel-Associated Box identified in a
panel of mammalian zinc finger proteins.” Mamm Genome 15(1): 35-40.
Margolin, J. F., J. R. Friedman, et al. (1994). “Kruppel-associated boxes are potent
transcriptional repression domains.” Proc Natl Acad Sci U S A 91(10): 4509-13.
Mark, C., M. Abrink, et al. (1999). “Comparative analysis of KRAB zinc finger proteins in
rodents and man: evidence for several evolutionarily distinct subfamilies of KRAB
zinc finger genes.” DNA Cell Biol 18(5): 381-96.
Melnick, A., G. Carlile, et al. (2002). “Critical residues within the BTB domain of PLZF
and Bcl-6 modulate interaction with corepressors.” Mol Cell Biol 22(6): 1804-18.
Messina, D. N., J. Glasscock, et al. (2004). “An ORFeome-based analysis of human
transcription factor genes and the construction of a microarray to interrogate their
expression.” Genome Res 14(10B): 2041-7.
Miller, J., A. D. McLachlan, et al. (1985). “Repetitive zinc-binding domains in the protein
transcription factor IIIA from Xenopus oocytes.” Embo J 4(6): 1609-14.
Moosmann, P., O. Georgiev, et al. (1996). “Transcriptional repression by RING finger
protein TIF1 beta that interacts with the KRAB repressor domain of KOX1.”
Nucleic Acids Res 24(24): 4859-67.
Moreira, D. and F. Rodriguez-Valera (2000). “A mitochondrial origin for eukaryotic C2H2
zinc finger regulators?” Trends Microbiol 8(10): 448-50.
Nei, M. and S. Kumar (2000). Molecular Evolution and Phylogenetics. New York, Oxford
University Press.
Nei, M., X. Gu, et al. (1997). “Evolution by the birth-and-death process in multigene
families of the vertebrate immune system.” Proc Natl Acad Sci U S A 94(15): 7799-806.
Niimura, Y. and M. Nei (2003). “Evolution of olfactory receptor genes in the human
genome.” Proc Natl Acad Sci U S A 100(21): 12235-40.
Niimura, Y. and M. Nei (2005). “Comparative evolutionary analysis of olfactory receptor
gene clusters between humans and mice.” Gene 346: 13-21.
Ohta, T. (2000). “Evolution of gene families.” Gene 259(1-2): 45-52.
Omichinski, J. G., G. M. Clore, et al. (1990). “High-resolution three-dimensional structure
of a single zinc finger from a human enhancer binding protein in solution.”
Biochemistry 29: 9324-34.
Omichinski, J. G., G. M. Clore, et al. (1992). “High-resolution solution structure of the
double Cys2His2 zinc finger from the human enhancer binding protein MBP-1.”
Biochemistry 31(16): 3907-17.
Pabo, C. O. and R. T. Sauer (1992). “Transcription factors: structural families and principles
of DNA recognition.” Annu Rev Biochem 61: 1053-95.
Page, R. D. and M. A. Charleston (1997). “From gene to organismal phylogeny: reconciled
trees and the gene tree/species tree problem.” Mol Phylogenet Evol 7(2): 231-40.
Parraga, G., S. J. Horvath, et al. (1988). “Zinc-dependent structure of a single-finger
domain of yeast ADR1.” Science 241(4872): 1489-92.
Pengue, G., V. Calabro, et al. (1994). “Repression of transcriptional activity at a distance
by the evolutionarily conserved KRAB domain present in a subfamily of zinc finger
proteins.” Nucleic Acids Res 22(15): 2908-14.
Pengue, G. and L. Lania (1996). “Kruppel-associated box-mediated repression of RNA
polymerase II promoters is influenced by the arrangement of basal promoter
elements.” Proc Natl Acad Sci U S A 93(3): 1015-20.
Quignon, P., E. Kirkness, et al. (2003). “Comparison of the canine and human olfactory
receptor gene repertoires.” Genome Biol 4(12): R80.
Rhodes, D. and A. Klug (1993). “Zinc fingers.” Sci Am 268(2): 56-9, 62-5.
Roberts, S. G. (2000). “Mechanisms of action of transcription activation and repression
domains.” Cell Mol Life Sci 57(8-9): 1149-60.
Rosati, M., M. Marino, et al. (1991). “Members of the zinc finger protein gene family
sharing a conserved N-terminal module.” Nucleic Acids Res 19(20): 5661-7.
Rousseau-Merck, M. F., D. Koczan, et al. (2002). “The KOX zinc finger genes:
genome-wide mapping of 368 ZNF PAC clones with zinc finger gene clusters predominantly
in 23 chromosomal loci are confirmed by human sequences annotated in
EnsEMBL.” Cytogenet Genome Res 98(2-3): 147-53.
Ruiz i Altaba, A., H. Perry-O’Keefe, et al. (1987). “Xfin: an embryonic gene encoding a
multifingered protein in Xenopus.” Embo J 6(10): 3065-70.
Sander, T. L., A. L. Haas, et al. (2000). “Identification of a novel SCAN box-related protein
that interacts with MZF1B. The leucine-rich SCAN box mediates hetero- and
homoprotein associations.” J Biol Chem 275(17): 12857-67.
Schmidt, D. and R. Durrett (2004). “Adaptive evolution drives the diversification of zinc
finger binding domains.” Mol Biol Evol 21(12): 2326-39.
Schuh, R., W. Aicher, et al. (1986). “A conserved family of nuclear proteins containing
structural elements of the finger protein encoded by Kruppel, a Drosophila
segmentation gene.” Cell 47(6): 1025-32.
Schumacher, C., H. Wang, et al. (2000). “The SCAN domain mediates selective
oligomerization.” J Biol Chem 275(22): 17173-9.
Shannon, M., A. T. Hamilton, et al. (2003). 11Differential expansion 0f zinc-finger
transcription factor loci in homologous human and mouse gene clusters.” Genome
Res 13(6A): 1097-110.
Shannon, M., J. Kim, et al. (199$). “Tandem zinc-finger gene farnilies in mammals:
insights and unanswered questions.” DNA Seg 8(5): 303-15.
Sharon, D., G. Glusman, et al. (1999). “Primate evolution of an olfactory receptor cluster:
diversification by gene conversion and recent emergence of pseudogenes.”
Genomics 61(1): 24-36.
Sitnikova, T. and C. Su (1998). “Coevolution of immunoglobulin heavy- and light-chain
variable-region gene families.” Mol Biol Evol 15(6): 617-25.
Stamatakis, A., T. Ludwig, et al. (2005). “RAxML-III: a fast program for maximum
likelihood-based inference of large phylogenetic trees.” Bioinformatics 21(4): 456-63.
Stone, J. R., J. L. Maki, et al. (2002). “The SCAN domain of ZNF174 is a dimer.” J Biol
Chem 277(7): 5448-52.
Tang, M., M. Waterman, et al. (2002). “Zinc finger gene clusters and tandem gene
duplication.” J Comput Biol 9(2): 429-46.
Theunissen, O., F. Rudt, et al. (1992). “RNA and DNA binding zinc fingers in Xenopus
TFIIIA.” Cell 71(4): 679-90.
Thornton, J. W. and R. DeSalle (2000). “Gene family evolution and homology: genomics
meets phylogenetics.” Annu Rev Genomics Hum Genet 1: 41-73.
Urrutia, R. (2003). “KRAB-containing zinc-finger repressor proteins.” Genome Biol 4(10):
231.
Venter, J. C., M. D. Adams, et al. (2001). “The sequence of the human genome.” Science
291(5507): 1304-51.
Waterston, R. H., K. Lindblad-Toh, et al. (2002). “Initial sequencing and comparative
analysis of the mouse genome.” Nature 420(6915): 520-62.
Williams, A. J., L. M. Khachigian, et al. (1995). “Isolation and characterization of a novel
zinc-finger protein with transcription repressor activity.” J Biol Chem 270(38):
22143-52.
Witzgall, R., E. O’Leary, et al. (1994). “The Kruppel-associated box-A (KRAB-A) domain
of zinc finger proteins mediates transcriptional repression.” Proc Natl Acad Sci U S
A 91(10): 4514-8.
Wolfe, S. A., L. Nekludova, et al. (2000). “DNA recognition by Cys2His2 zinc finger
proteins.” Annu Rev Biophys Biomol Struct 29: 183-212.
Zheng, L., H. Pan, et al. (2000). “Sequence-specific transcriptional corepressor function for
BRCA1 through a novel zinc finger protein, ZBRK1.” Mol Cell 6(4): 757-68.