
Université de Montréal

Evolution of C2H2-Zinc finger genes in mammalian genomes

by

Hamsa Dhwani Tadepally

Département de Biochimie

Faculté de Médecine

Thesis presented to the Faculté des études supérieures

with a view to obtaining the degree of Master's (Maîtrise)

in Biochemistry

July 2007

© Hamsa Dhwani Tadepally, 2007


Université de Montréal

Direction des bibliothèques

NOTICE

The author of this thesis or dissertation has granted a nonexclusive license allowing Université de Montréal to reproduce and publish the document, in part or in whole, and in any format, solely for noncommercial educational and research purposes.

The author and co-authors, if applicable, retain copyright ownership and moral rights in this document. Neither the whole thesis or dissertation, nor substantial extracts from it, may be printed or otherwise reproduced without the author's permission.

In compliance with the Canadian Privacy Act, some supporting forms, contact information or signatures may have been removed from the document. While this may affect the document page count, it does not represent any loss of content from the document.

Université de Montréal

Faculté des études supérieures

This thesis entitled

Evolution of C2H2-Zinc finger genes in mammalian genomes

Presented by:

Hamsa Dhwani Tadepally

has been evaluated by a jury composed of the following persons:

Martine Raymond

President-rapporteur

Muriel Aubry

Research supervisor

Gertraud Burger

Co-supervisor

Nicolas Lartillot

Jury member

Résumé

The C2H2/Krüppel zinc finger genes (C2H2-ZNF) encode the largest class of transcription factors in humans. These genes constitute one of the largest gene families in mammals and are often found as clusters of genes juxtaposed on the chromosomes. Through an extensive sequence-similarity search aimed at identifying all the C2H2-ZNF genes of the human genome, we assembled a complete repertoire of 718 human C2H2-ZNF genes. The C2H2-ZNF genes were classified into subfamilies according to the N-terminal effector domains with which they are associated. We found that the subfamily encoding a KRAB domain comprises 45% of all C2H2-ZNF genes and is therefore the largest subfamily of zinc finger genes. In addition, we identified 81 C2H2-ZNF gene clusters, which account for 70% of all C2H2-ZNF genes. Almost 90% of the C2H2-ZNF genes belonging to the KRAB and SCAN subfamilies are found in clusters. To better understand the evolution of the C2H2-ZNF genes, we then assembled a complete repertoire of all the human C2H2-ZNF gene clusters and of their counterparts in the syntenic regions of the chimpanzee, mouse, rat and dog genomes. A systematic analysis of this repertoire in these mammals revealed variation in the number of clusters, and of genes within these clusters, among primates, rodents and canines. This variation suggests that these genes evolved differentially in mammals. Phylogenetic studies of several selected C2H2-ZNF gene clusters indicate that, besides differential duplication, gene loss in certain species has led to different repertoires of C2H2-ZNF genes in mammals. In addition to species-specific variation in the number of genes, we also found variation among orthologs in the number of zinc finger motifs and in the presence of effector domains, the latter often being lost through degeneration. In conclusion, based mainly on these results and on the study of the exon-intron structure of the C2H2-ZNF genes, we propose a new model for the evolution of their subfamilies, according to which the most ancient subfamilies would be, in order, SCAN > SCAN-KRAB > KRAB.

Abstract

The C2H2/Krüppel zinc finger genes (C2H2-ZNF) encode the largest class of transcription factors in humans. These genes constitute one of the largest gene families in mammals and are often found in clusters. Using an extensive similarity search on the human genome to identify all C2H2-ZNF genes, we assembled a comprehensive repertoire of 718 human C2H2-ZNF genes. The genes were grouped into subfamilies based on the N-terminal effector domains they were associated with. We found that the KRAB-domain-encoding subfamily constitutes 45% of the total C2H2-ZNF genes and hence is the largest subfamily of zinc finger genes. In addition, we identified 81 C2H2-ZNF clusters, which constitute 70% of the total genes. Almost 90% of the C2H2-ZNF genes belonging to the KRAB and SCAN subfamilies were found in clusters. We then assembled a comprehensive repertoire of all the human C2H2-ZNF clusters and their syntenic counterparts in the chimpanzee, mouse, rat and dog genomes. A systematic analysis of all the syntenic clusters revealed variation in the numbers of clusters and of genes within clusters among primates, rodents and canines, indicating differential patterns of evolution in mammals. Evolutionary analysis of a few selected C2H2-ZNF syntenic clusters in the five mammals studied suggested that not only differential duplication but also gene loss has led to different repertoires in mammalian genomes. In addition to lineage- and species-specific variation in the number of genes, we also found variation among orthologs in the number of zinc finger motifs and in the presence of the effector domains, the latter often being lost by sequence degeneration. Finally, based on the above results and on the analysis of the exon-intron structure of the various C2H2-ZNF genes, we propose a model for the evolution of their subfamilies suggesting that the more ancient subfamilies are, in sequential order, SCAN > SCAN-KRAB > KRAB.

Keywords: C2H2/Krüppel, zinc finger, gene family, tandem repeats, gene duplication, gene loss, evolution.

List of abbreviations

DNA: Deoxyribonucleic Acid

RNA: Ribonucleic Acid

BTB: Broad-Complex, Tramtrack and Bric-a-brac

POZ: Pox virus and Zinc finger

KRAB: Krüppel-Associated Box

SCAN: SRE-ZBP, CTfin51, AW-1 and Number18 cDNA

KRI motif: KRAB Interior motif

Ig: Immunoglobulin

ZNF45: Zinc finger 45 (protein or gene)

ZNF91: Zinc finger 91 (protein or gene)

BLAST: Basic Local Alignment Search Tool

MUSCLE: Multiple Sequence Comparison by Log-Expectation

OR: Olfactory Receptor

VH and VL domains: Heavy and Light chains of the Variable domain of the Immunoglobulin molecule

KRAB C2H2-ZNF: C2H2-Zinc finger proteins associated with a KRAB domain

SCAN C2H2-ZNF: C2H2-Zinc finger proteins associated with a SCAN domain

BTB C2H2-ZNF: C2H2-Zinc finger proteins associated with a BTB domain

KAP-1: KRAB-associated protein 1

TIF1β: Transcription Intermediary Factor 1β

List of definitions

Homology: A concept that signifies common ancestry.

Orthologs: Genes in different species that originated from a common ancestral gene through a speciation event and are similar to each other, regardless of their functions.

Paralogs: Genes that are derived from a duplication event, in the same species or in different species. They may or may not have the same function.

Gene duplication: Duplication of a region of DNA that contains a gene; it may occur as an error in homologous recombination, through a retrotransposition event, or through duplication of an entire chromosome.

Phylogenetic tree: Also called an evolutionary tree; it shows the evolutionary interrelationships among various species or other entities that are believed to have a common ancestor.

Synteny: A common order of genes, especially between related species.

Acknowledgements

Questions and Answers are what life at the university seems to be about. While trying to answer the questions about zinc fingers during my thesis, I also seem to have learnt a lot about myself. These three years at UdeM have been a wonderful learning experience, both academically and personally.

First and foremost, I would like to thank my thesis supervisor, Muriel Aubry, for accepting me and offering me the chance to be her student and work on zinc fingers, and for all the guidance and encouragement. For teaching me that what you learn during the whole process of research is as important as the end result. For supporting me when I was struggling with my courses in French. For all the days and nights of constant guidance she gave me for the thesis and for all the weekends at North Hatley. For giving me the opportunity to go to the SMBE 07, which by far has been the most exciting experience of my life. Dr. Muriel, thank you very much for everything. This experience has made me the confident person I am today.

Gertraud Burger, my co-supervisor, for her valuable guidance and suggestions. For giving me the opportunity to interact with everyone from the Bioinformatics group and for supporting me to continue in the Masters program.

Franz B. Lang, Nicolas Lartillot, Herve Philippe, Henner Brinkmann and Amy Hauth for all the helpful guidance, discussions and constructive comments. Allan Sun for the assistance with the hardware and software problems I had.

My labmates, Patricia, Delphine, Xavier, Imene, Phuong and Hadrian, for all the help, for being so nice and for always making me feel welcome in the lab.

I would like to thank my friends Uma, Reena and Ekta for supporting me during the difficult times I had. Karthik for being my computer guru. My girls Lakshmi, Gayatri, Sujata, Shivani and Ramaa for taking care of me and putting up with me during the difficult times of my thesis. Siva for helping me out at the university every time I had a problem. Nagu, who always let me take out my frustrations and bad moods on him, for always being there to talk. Preethi and Kavitha for just being my friends.

Last but not least, this entire experience would be at most an unfulfilled dream were it not for my loving family. I would like to express my gratitude to my parents, Dr. Nagender Swamy and Vijaya Lakshmi, for supporting my dreams and aspirations and for letting me take my own decisions, make mistakes, learn and grow. My sister Vamsee Priya and my brother Charan for always being there for me and always taking care of me no matter what, and my brother-in-law Sanjay.

Table of contents

Identification of the Jury

Résumé

Abstract

Table of contents

List of Figures

List of Supplementary Figures

List of Tables

List of Supplementary Tables

List of abbreviations

Acknowledgements

Chapter 1. INTRODUCTION

1.1 Transcription factors

1.2 The C2H2 zinc finger gene family

1.2.2 The tandemly organized C2H2 zinc finger motif

1.2.3 The N-terminal regulatory domain of C2H2 zinc finger proteins

1.3 Gene families and Gene duplication

1.3.1 Gene Families

1.3.2 Gene Duplication and Gene Loss: Two important evolutionary mechanisms guiding the evolution of gene families in mammals

1.4 Inferring gene duplication and gene loss

1.5 Previous studies addressing zinc finger gene evolution

1.6 Hypothesis and Objective

Chapter 2. ARTICLE

Evolution of C2H2-zinc finger genes in mammals: Species-specific duplication and loss at the level of clusters, genes and their functional domains

Chapter 3. DISCUSSION

3.1 The C2H2-ZNF genes in the human genome

3.2 Variation in the numbers of C2H2-ZNF genes in mammalian clusters

3.3 Evolution of C2H2-ZNF genes in mammals through differential expansion and loss

3.4 Evolution of the C2H2-ZNF genes through duplication or loss of zinc finger and N-terminal effector motifs

3.5 Birth and Death model of evolution

3.7 A few concerns to the study

3.8 Merits of the study

3.9 Perspectives

REFERENCES

List of Figures

INTRODUCTION and DISCUSSION

Figure 1: The basic structural unit of a C2H2 zinc finger protein

Figure 2: The regulatory domains associated with C2H2 zinc finger proteins

Figure 3: Darwin's evolutionary tree

Figure 4: Schematic representation of speciation and duplication

Figure 5: Schematic representation of different evolutionary processes shaping the gene families in different species

Figure 6: Inferring gene duplication and loss events from a gene tree in comparison with the species tree

Figure 7: Birth-and-death model of evolution

Figure 8: Plot of the amino acid sequence lengths of all the C2H2-ZNF in the human genome

ARTICLE

Figure 1: Flowchart of the analysis procedure of C2H2-ZNF genes and clusters

Figure 2: Distribution of all the singletons and clustered genes from the various human C2H2-ZNF sub-families and gene composition of the C2H2-ZNF clusters

Figure 3: Differential expansion and loss of C2H2-ZNF clusters in five mammalian genomes

Figure 4: Evolutionary scenarios in the phylogenetic tree

Figure 5: Phylogenetic analysis of C2H2-ZNF genes in cluster 19.12 of human and its syntenic counterparts in other mammals

Figure 6: Physical maps showing the organization of the human C2H2-ZNF from cluster 19.12 localized on 19q13.4 and its syntenically homologous counterparts in other mammals

Figure 7: Variation in the numbers of zinc finger motifs in mammals and in the presence of conserved N-terminal domains in orthologs

Figure 8: Model for the evolution of the SCAN, SCAN-KRAB and KRAB C2H2-ZNF subfamilies

List of Supplementary Figures

ARTICLE

Supplementary Figure 1: Distribution of intergenic distances between the 718 C2H2-ZNF in the human genome

Supplementary Figure 2: Comparison of the number of C2H2-ZNF genes in the 40 human clusters containing at least 3 C2H2-ZNF and their syntenic counterparts in four other mammals

List of Tables

INTRODUCTION

Table 1: Different types of DNA-binding domains

List of Supplementary Tables

ARTICLE

Supplementary Table S1: Comprehensive catalogue of the 718 C2H2-ZNF genes in the human genome

Supplementary Table S2: Comprehensive summary of the organization of all C2H2-ZNF found as singletons or in clusters on each human chromosome and classified with respect to the various C2H2-ZNF sub-families

Supplementary Table S3: Gene organization of the 81 human C2H2-ZNF clusters

Supplementary Table S4: Comprehensive catalogue of the C2H2-ZNF genes from the 81 human clusters and their syntenic counterparts from other mammalian genomes (chimpanzee, mouse, rat and dog)

Chapter 1. INTRODUCTION

1.1 Transcription Factors

A very important problem in biology is understanding the mechanisms by which particular genes are expressed in a temporal or tissue-specific manner. Transcription is the process through which an RNA polymerase enzymatically copies a DNA sequence to produce a complementary RNA.

Transcription in prokaryotes and eukaryotes differs in that an RNA polymerase alone can initiate transcription in prokaryotes. In contrast, eukaryotes have a much more complex transcriptional regulatory mechanism: in addition to the RNA polymerase, eukaryotic genes need an initial assembly of transcription factors at the promoter (Pabo and Sauer 1992).

Transcription factors are proteins that regulate gene expression by binding to the promoter elements upstream of genes. They are composed mainly of two functional regions: 1) a DNA-binding domain and 2) an effector domain.

The DNA-binding domain consists of amino acids that recognize specific DNA bases, generally near the start of transcription. Based on its structure, the DNA-binding domain is classified into different types, as detailed in Table 1:

1. Zinc finger

2. Helix-turn-helix

3. Leucine zipper domain

4. Winged helix

5. ETS domain

6. Helix-loop-helix

7. Immunoglobulin fold

In addition to a DNA-binding domain, transcription factors also contain an effector domain. This domain often interacts with proteins to either inhibit or activate transcription. Transcription factors can thus act as transcriptional activators or repressors that control gene expression by acting directly on the RNA-polymerase-containing complex bound in proximity to the transcription initiation sites and/or on proteins involved in the assembly of chromatin, the complex of DNA and proteins that makes up chromosomes (Roberts 2000). Transcription factors bring about these changes either by themselves or indirectly, by recruiting co-factors that are called co-repressors or co-activators (Roberts 2000), depending on their effect on transcription. Co-repressors and co-activators do not bind DNA directly, but are recruited to the gene by the effector domain of transcription factors.

Table 1: Different types of DNA-binding domains

Zinc finger: A zinc finger has two antiparallel β-strands and an α-helix. Two cysteines and two histidines interact with a zinc ion to form a finger-like structure. Example: The two cysteines on the beta-strand (in green) can be seen interacting with the two histidines (in orange) on the alpha-helix. The interacting zinc ion is shown in red in the center.

Helix-turn-helix: This is a major structural unit capable of binding DNA. It has two alpha-helices which are joined by a short stretch of amino acids (turn). Example: Helix-turn-helix (green and yellow) of bacteriophage lambda, which binds to DNA (blue and cyan).

Leucine zipper domain: It consists of a short alpha helix with a leucine residue at every seventh position. Example: The AP-1 dimer formed by the Fos and Jun homologous proteins. The leucine zipper motif has two α-helices which look like a zipper, with the leucine residues (in green) lining the zipper.

Winged helix: This motif has 110 amino acids. Each domain has four alpha-helices and two beta-sheet strands. Example: Alpha helices are in purple and beta strands are in yellow.

ETS domain: This domain is 85-90 amino acids long. It was discovered in the ETS oncogene. Three alpha-helices and a 4-strand beta sheet fold into a domain. The third helix is the recognition helix. Example: The Elk-1-E74 DNA complex, where Elk-1 is a member of a large group of eukaryotic transcription factors with an ETS domain. Alpha helices are in blue and beta-strands in yellow.

Helix-loop-helix: This motif has two alpha helices connected by a loop. Generally, transcription factors with this loop are dimeric. A smaller helix allows dimerization while the other, larger helix facilitates DNA binding. Example: Two alpha helices (in red) connected by a loop (in green) to form a domain.

Immunoglobulin fold: This is also called an all-β protein fold, which has a 2-layer sandwich of 7 antiparallel β-strands arranged in two β-sheets. Example: Human Tenascin with its immunoglobulin fold, fibronectin type III, coloured from blue (N-terminus) to red (C-terminus).

1.2 The C2H2 zinc finger gene family

Of the many large families encoding transcription factors that have been identified, zinc finger genes of the C2H2 type constitute the largest (Schuh, Aicher et al. 1986; Bellefroid, Lecocq et al. 1989). The C2H2 motif encoded in these genes typically includes two cysteines and two histidines coordinating a zinc ion. This motif was first identified in TFIIIA of Xenopus laevis and later in the Krüppel segmentation gene of Drosophila (Miller, McLachlan et al. 1985; Schuh, Aicher et al. 1986). Thus, the C2H2 zinc finger genes are often referred to as TFIIIA/Krüppel-type zinc finger genes.

Known to constitute one of the ten largest gene families [Pfam database], these zinc finger genes are found not only in eukaryotes but also in prokaryotes. Members of the C2H2 zinc finger family have now been identified in all kingdoms of life, i.e. eubacteria, archaebacteria, protists, fungi, animals and plants (Bouhouche, Syvanen et al. 2000; Moreira and Rodriguez-Valera 2000). Throughout evolution, there has been a massive expansion in the number of C2H2 zinc finger genes (Lander, Linton et al. 2001; Venter, Adams et al. 2001). Notably, humans are predicted to have more than 700 zinc finger genes, often found in a clustered organization (Bellefroid, Lecocq et al. 1989; Looman, Abrink et al. 2002).

While most of the C2H2 zinc finger genes characterized so far encode transcription factors that bind to DNA, some are also known to encode RNA-binding proteins that may thus participate in RNA metabolism or maturation (Theunissen, Rudt et al. 1992; Grondin, Bazinet et al. 1996).

1.2.1 Structure of the proteins encoded by C2H2-ZNF genes

The C2H2 zinc finger transcription factors generally consist of two essential regions: 1) the C2H2 zinc finger region, containing in most instances several zinc finger motifs organized in tandem, and 2) the N-terminal regulatory domain.

1.2.2 The tandemly organized C2H2 zinc finger motif

The C2H2 zinc finger proteins are composed of zinc finger motifs, which form the zinc finger region of the protein. Each motif is a highly conserved sequence of 28 amino acids (C-X(2-4)-C-X(3)-F-X(5)-L-X(2)-H-X(3-4)-H-TGEKPYX, where X is any amino acid and the number in parentheses gives how many X residues occur). Each motif is separated from the following one by a highly conserved linker region (TGEKPYX, where X is any amino acid) (Miller, McLachlan et al. 1985; Wolfe, Nekludova et al. 2000; Looman, Abrink et al. 2002). The basic conserved C2H2 zinc finger structural unit includes two cysteines and two histidines, which interact with a zinc ion and are essential for the proper folding of the motif into a finger-like structure (see Figure 1) (Looman, Abrink et al. 2002). C2H2 zinc finger proteins are composed of one or more tandemly organized zinc finger motifs. The number of zinc finger motifs in a protein varies from one to more than 30 in a few cases (Ruiz i Altaba, Perry-O'Keefe et al. 1987).
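A consensus of this kind lends itself to a regular-expression search. The following is a minimal Python sketch, assuming the core consensus reads C-X(2-4)-C-X(3)-F-X(5)-L-X(2)-H-X(3-4)-H. The function name `find_c2h2_motifs`, the decision to leave the TGEKPYX linker out of the pattern (so that a finger without a trailing linker is still counted), and the toy sequence are illustrative assumptions, not the search procedure used in this thesis.

```python
import re

# Core C2H2 consensus: C-X(2-4)-C-X(3)-F-X(5)-L-X(2)-H-X(3-4)-H.
# The TGEKPYX linker is deliberately left out (an assumption of this sketch)
# so that the last finger of a tandem array is not missed.
C2H2_CORE = re.compile(r"C.{2,4}C.{3}F.{5}L.{2}H.{3,4}H")


def find_c2h2_motifs(protein_seq):
    """Return (start, end, matched_text) for each non-overlapping core motif."""
    seq = protein_seq.upper()
    return [(m.start(), m.end(), m.group()) for m in C2H2_CORE.finditer(seq)]


# Toy sequence (hypothetical): one consensus-like finger repeated three times.
FINGER = "YKCEECGKAFSRSSHLTQHQRIHTGEKP"
toy_protein = FINGER * 3

hits = find_c2h2_motifs(toy_protein)
print(len(hits))
for start, end, text in hits:
    print(start, end, text)
```

A strict pattern like this one only finds fingers that follow the consensus exactly; degenerate fingers would need a more tolerant approach, such as a profile-based search.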

Figure 1: The basic structural unit of a C2H2 zinc finger protein.

(A) The C2H2 zinc finger motif is present in tandem in the protein. Three zinc finger motifs are connected by a conserved linker region (TGEKPY). The two cysteines and two histidines which interact with a zinc ion, along with the other conserved residues, are shown with their single-letter codes. The residues involved in DNA binding are shown in grey.

(B) The three-dimensional structure of a zinc finger binding domain. Two anti-parallel β-strands and one α-helix interact with a zinc ion as shown in the figure.

Nuclear magnetic resonance (NMR) spectroscopy was used to determine the three-dimensional structure of the C2H2 zinc finger motif (Lee, Gippert et al. 1989; Omichinski, Clore et al. 1990). Two beta strands and one alpha helix form an independently folded domain with a compact globular structure (see Figure 1). The zinc ion, which is tetrahedrally coordinated between two invariant pairs of cysteines and histidines, connects the β-sheet and the α-helix. Four amino acids on the surface of the α-helix in the zinc finger motif make base-specific contacts with three to four bases in the major groove of the DNA helix (Frankel, Berg et al. 1987; Parraga, Horvath et al. 1988; Omichinski, Clore et al. 1992; Krishna, Majumdar et al. 2003). Although the zinc finger domain has been described as a nucleic-acid-binding domain, not all zinc finger motifs are involved in DNA or RNA binding. For example, in the ZBRK1 zinc finger protein, only the first few fingers are involved in DNA binding; all the others are involved in protein-protein interactions (Zheng, Pan et al. 2000).

1.2.3 The N-terminal regulatory domain of C2H2 zinc finger proteins

In addition to the zinc finger region, C2H2 zinc finger proteins are also associated with an N-terminal regulatory domain (Figure 2), which regulates subcellular localization and gene expression by acting as either a repressor or an activator, by itself or by associating with other factors (Collins, Stone et al. 2001).

The regulatory domains associated with C2H2 zinc finger proteins are:

i. BTB/POZ domain

ii. KRAB domain

iii. SCAN domain

i. The BTB domain

The BTB domain (Broad-Complex, Tramtrack and Bric-a-brac), also known as the POZ domain (Pox virus and Zinc finger), is a conserved domain of 120 amino acids found in association with both DNA-binding and actin-binding proteins. The BTB domain is involved in protein-protein interactions (Collins, Stone et al. 2001). As a part of DNA-binding proteins, the BTB/POZ domain is known to be a dimerization domain which, in a few cases, also recruits co-repressors (such as N-CoR, SIN3A or SMRT) and acts as a repression domain. When found in association with C2H2 zinc finger transcription factors, the BTB domain is generally located N-terminal to the zinc finger region. Thus, by mediating oligomerization and, in some instances, interaction with co-factors, the BTB domain can lead to chromatin remodeling and changes in gene expression (Melnick, Carlile et al. 2002).

ii. The Krüppel-Associated Box (KRAB) domain

Another well-known example of an N-terminal regulatory domain associated with C2H2 zinc finger proteins is the Krüppel-Associated Box, or KRAB domain (Bellefroid, Poncelet et al. 1991; Rosati, Marino et al. 1991). KRAB domains are almost always associated with C2H2 zinc finger proteins. An exception is the SSX family of proteins. These proteins carry an "SSX-KRAB domain", which is distantly related to the KRAB domain of C2H2 zinc finger proteins (49% similar), but they are not associated with zinc fingers (Collins, Stone et al. 2001; Urrutia 2003).

Unlike the C2H2 zinc finger proteins, which are present in organisms ranging from bacteria to humans, the KRAB domain as seen in C2H2 zinc finger proteins is present only in vertebrates, more specifically in tetrapods (Looman, Abrink et al. 2002). However, a recent study identified a sea urchin homolog of the mammalian Meisetz protein, which includes a tandem array of C2H2 zinc finger motifs, a SET domain and a sequence with some similarity to the "SSX-KRAB domain" (Birtle and Ponting 2006). This suggests that the KRAB domain was present in the common ancestor of echinoderms and vertebrates. A further study of these proteins in fungi and plants identified a 26-amino-acid motif, called the KRI motif, which was found to be similar to the alpha-helical regions of KRAB and present in all eukaryotes. This indicated that the KRI motif was present in the last common ancestor of animals, plants and fungi and is the progenitor of the KRAB domain.

The KRAB domain is most abundant in mammals (Lander, Linton et al. 2001; Venter, Adams et al. 2001; Waterston, Lindblad-Toh et al. 2002). For example, about one third of the mouse C2H2-ZNF are associated with KRAB (Benn, Antoine et al. 1991; Waterston, Lindblad-Toh et al. 2002). The KRAB domain is mostly associated with more than 5 C2H2 zinc finger motifs in a protein, justifying the name "multifingered protein" (Bellefroid, Poncelet et al. 1991). Many genes encoding KRAB-containing proteins are found in a clustered organization rather than as singletons (Bellefroid, Marine et al. 1993; Shannon, Kim et al. 1998; Chung, Schafer et al. 2002; Rousseau-Merck, Koczan et al. 2002; Hamilton, Huntley et al. 2003).

The KRAB domain is 75 amino acids long and is divided into two boxes, Box A (~38 amino acids) and Box B (32 amino acids) (Looman, Abrink et al. 2002; Urrutia 2003). A variant of the B box, called the b box, also exists. Some C2H2 zinc finger proteins have another box following the A box, called the C box (21 amino acids). Each of these boxes is encoded by a different exon, and the exons are separated by introns of varying lengths (Looman, Hellman et al. 2004). The KRAB domain functions as a potent repressor of transcription (Margolin, Friedman et al. 1994). The KRAB A box plays an important role in repression by binding to co-repressors, while the KRAB B box does not itself have transcriptional activity but is known to enhance the repression activity of the A box (Witzgall, O'Leary et al. 1994). Transcriptional repression is mediated by KAP-1, also called transcription intermediary factor 1β (TIF1β), a co-repressor that interacts with KRAB A (Friedman, Fredericks et al. 1996; Germain-Desprez, Bazinet et al. 2003). The KRAB domain of C2H2 zinc finger proteins recruits the KAP-1 co-repressor to DNA, which results in the formation of a heterochromatin-like complex and leads to gene silencing (Pengue, Calabro et al. 1994; Kim, Chen et al. 1996; Moosmann, Georgiev et al. 1996; Pengue and Lania 1996).

iii. The SCAN domain

The SCAN domain, like the KRAB domain, is another vertebrate-specific domain

only associated with C2H2 zinc finger proteins (Williams, Khachigian et al. 1995; Looman,

Abrink et al. 2002). The SCAN domain was estimated to be associated with 10% of the

total C2H2-ZNF present in the human genome (Collins, Stone et al. 2001; Edelstein and

Collins 2005). Also known as the LeR domain because of its leucine-rich primary

structure, the SCAN domain is named after the four proteins in which it was initially identified

(SRE-ZBP, CTfin51, AW-1 and Number 18 cDNA) (Urrutia 2003). In addition to being

associated with the C2H2 zinc finger proteins, the SCAN domain containing proteins are

sometimes associated with a KRAB domain having the organization SCAN-KRAB-

(C2H2) or in very few cases KRAB-SCAN-KRAB-(C2H2) (Edelstein and Collins 2005;

Huntley, Baggott et al. 2006).

Structural studies on the SCAN domain indicate that it has 84 amino acids and is

found to have three to five α-helices which are delineated by one or more proline residues.

Proline residues are also present before and after the SCAN domain (Stone, Maki et al.

2002). The SCAN domain is a homo- and hetero-dimerization domain mediating protein-

protein interactions by self-association and formation of heterodimers between SCAN

family members (Sander, Haas et al. 2000; Schumacher, Wang et al. 2000). The importance

of the dimerization for the transcriptional activity of SCAN-C2H2 zinc finger protein has

not been clearly established.

[Figure 2 image: panel A, domain organization (Regulatory domain, Spacer, Zinc finger region); panel B, consensus sequences of the KRAB, SCAN and BTB domains]

Figure 2: The regulatory domains associated with C2H2 zinc finger proteins.

(A) The different combinations of domains associated with zinc finger proteins are shown.

Zinc finger proteins generally have three main regions: The Regulatory domain, the Spacer

and the Zinc finger region. (B) The consensus sequence of the domains KRAB (A, B, b and

C boxes), SCAN and BTB. The residues essential for binding KAP-1 and thus for repression

are shown in KRAB A underlined in red.

1.3 Gene families and Gene duplication

1.3.1 Gene Families

A gene family corresponds to a set of genes that are grouped based on their shared

homology, biological or biochemical activity, sequence motifs or similarities in structure.

Because they consist of a large number of genes, gene families are the most informative

systems to study evolutionary dynamics of genes. Nuclear genomes have many multigene

families and their studies provide clues to the evolutionary forces that have shaped these

genomes (Ohta 2000; Thornton and DeSalle 2000). Mammalian genomes in particular have

large numbers of genes organized in gene families (Demuth, Bie et al. 2006). Some gene

families have uniform copy numbers of genes in all species (Thornton and DeSalle 2000),

while there are gene families like the Immunoglobulin gene family, the Olfactory receptor

gene family and the C2H2 zinc finger gene family which have a large variation in the

number of genes across different species. The variation in the gene numbers of these

families and diversity in function suggest that gene duplication and/or gene loss have

played an important role in shaping different mammalian genomes.

i. The Olfactory Receptor gene family

Olfactory receptor (OR) genes form the largest known multigene family in

mammalian genomes (Glusman, Bahar et al. 2000) and code for membrane receptors that

are responsible for olfaction, the sense of smell. OR genes are present in various vertebrates

ranging from lampreys to humans. The OR proteins belong to the G-protein coupled

receptor family, whose members have seven transmembrane domains. OR genes are divided into 2

classes based on their protein sequence similarity (Glusman, Bahar et al. 2000; Fuchs,

Glusman et al. 2001). Of the two classes, Class I genes, first identified in fish but also found

in mammals, are specialized in water-soluble odorants, and the Class II genes, specialized for

airborne odorants, are specific to tetrapods.

The number of the OR genes is quite varied in different genomes. Rodents have

nearly twice as many as the number present in human, chimpanzee or dog (Niimura and

Nei 2005). More than half of the ~900 OR genes in the human genome are pseudogenes.

In contrast, the mouse genome has ~1300 OR genes of which only one-fourth are

pseudogenes. Studies on the human, chimpanzee and mouse OR gene repertoires indicate

that there are species-specific expansion and pseudogenization signifying different

selection pressures in humans, chimpanzees and mouse owing to their different sensory

requirements (Sharon, Glusman et al. 1999; Glusman, Yanai et al. 2001; Lapidot, Pilpel et

al. 2001; Gilad, Man et al. 2005; Niimura and Nei 2005). Evolutionary analysis of the

human, mouse and chimpanzee datasets indicates the presence of clustered organization

which is generally well conserved in these genomes. Analyses of the clusters indicate that

there are tandem arrays of the OR genes which appear to have arisen by tandem duplication

and several chromosomal rearrangements. The difference in the numbers of OR genes in

human and mouse has been attributed to gene duplication and loss events (Sharon,

Glusman et al. 1999; Niimura and Nei 2005).

ii. The Immunoglobulin gene family

The immunoglobulin gene family represents an example where its two subfamilies,

the immunoglobulin heavy variable region sub-family and immunoglobulin light chain

variable region subfamily, have co-evolved by varying in gene number and extent of

diversity in different species (Sitnikova and Su 1998). An immunoglobulin molecule is a

tetramer with two identical heavy chains and two identical light chains which form a Y-

shaped structure. Each of these chains has a variable (V) and constant (C) domain. The VH

and VL domains have the complementarity determining regions, called the CDRs which

form the sites of interaction with antigens. Analyses of these two sub-families of genes

from various species of amniotes identified that these gene families have diversified

throughout the course of evolution (Sitnikova and Su 1998). Different coordinated loss and

duplication events have led to different species-specific gene repertoires.

iii. The C2H2 zinc finger gene family

In addition to the above mentioned gene families, the C2H2 zinc finger gene family

is another example of a large multigene family with varying number of genes in different

species. Over the course of evolution, this gene family has expanded drastically in

mammalian genomes (e.g. ~400 in mouse and ~700 in human) (Venter, Adams et al.

2001; Waterston, Lindblad-Toh et al. 2002). Several studies involving these genes in the

human genome have indicated that tandem duplication events are responsible for the

clustered organization of this family (Shannon, Kim et al. 1998; Elemento and Gascuel

2002; Elemento, Gascuel et al. 2002; Tang, Waterman et al. 2002; Bertrand and Gascuel

2005; Huntley, Baggott et al. 2006). A few instances of evolutionary studies of these genes,

within the human genome and among a few mammalian genomes document cases of

species-specific duplication (Dehal, Predki et al. 2001; Shannon, Hamilton et al. 2003;

Huntley, Baggott et al. 2006).

All these examples of gene families suggest variation in number among

different species involving different duplication and loss events. The gene family size could

vary based on the functional relevance of the gene family in the organism. These examples

also indicate the importance of studying the gene families to give clues on the evolutionary

mechanisms which led to different sizes of gene families.

1.3.2 Gene Duplication and Gene Loss: Two important evolutionary

mechanisms guiding the evolution of gene families in mammals

Considering the extremely large numbers of genes constituting gene families

(Demuth, Bie et al. 2006), it is interesting to study their organization and the evolutionary

mechanisms that created them. A study integrating the information from spatial

organization of the genes with the phylogenetic relationships between the genes combined

with evolutionary information of the species would help provide clues about the evolution

of the gene families.

In the context of using phylogenetic studies to analyze the evolutionary

relationships between genes in gene families, one significant term that features in all studies

is “Homology”. Homology forms the central and basic concept of comparative genomics

but is also a term that is often misrepresented and misinterpreted. The term homology was

introduced by Richard Owen in 1848, where he defined homology as “the same organ

under every variety of form and function” (Francis Darwin 1903). The importance of

structure and function is emphasized more in this definition. In an attempt to give an

evolutionary explanation to homologous structures, Darwin defined homology as “A

structure is similar among related organisms because those organisms have all descended

from a common ancestor that had an equivalent trait” (Darwin 1837) (Figure 3).

When put in the context of molecular sequence comparison, in today’s times,

homology refers in an abstract way to a relationship which implies a possible common

ancestry and should be differentiated from identity² or similarity³ of sequences. However,

to be substantiated, homology must be confirmed by appropriate phylogenetic studies. It is

important to note that homology does not say anything about functional similarity

(Thornton and DeSalle 2000; Fitch 2000).

¹ Homology: A hypothesis that signifies common ancestry between sequences (nucleotide or amino acid) which is primarily based on sequence similarity.

² Identity: The extent to which two (nucleotide or amino acid) sequences are invariant.

³ Similarity: The extent to which nucleotide or protein sequences are related. The extent of similarity between two sequences can be based on percent sequence identity and/or conservation.
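
The notion of identity defined above can be made concrete with a short calculation. The following is only a minimal sketch: it assumes the two sequences are already aligned to equal length and that "-" marks a gap, and it skips gapped columns, which is one of several conventions in use. The function name `percent_identity` is hypothetical and is not taken from this work.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of ungapped alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


# Synthetic example: six of seven aligned residues are identical.
print(round(percent_identity("HTGEKPY", "HTGEKPF"), 1))
```

A high value from such a calculation supports, but does not by itself establish, homology; that still requires a phylogenetic analysis.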

Figure 3: Darwin’s evolutionary tree.

The figure is Charles Darwin’s first ever sketch of an evolutionary tree from his book titled

“First Notebook on Transmutation of Species (1837)”.

There are three major types of homology in a phylogenetic context which are

orthology, paralogy and xenology. Orthology, as described by Fitch in 1970, is the

relationship between two genes in two different species which originated from a common

ancestor. Two homologous sequences are considered to be “orthologous” if a speciation

event separates them. In contrast, Paralogy signifies the relationship between two genes

which have been formed by a gene duplication event. Xenology, another type of homology

relationship, describes the relationship between two genes which have been transferred

between two species by horizontal gene transfer.

Studying the homologous relationships of genes within and between various

genomes and differentiating between orthologs and paralogs is a central aspect of

comparative genomics. Figure 4 shows a very simple explanation of the difference between

orthologs and paralogs. The genes A1, B1 and B2 have evolved from an ancestral gene by

speciation followed by a duplication event in species B. Gene A1 from species A is an

ortholog of gene B1 and gene B2 in species B illustrating that one gene in a particular

species may have more than one ortholog in the other. Gene B1 and gene B2 in species B,

which were formed by gene duplication, are paralogs to each other.

[Figure 4 image; node labels: Speciation, Duplication]

Figure 4: Schematic representation of speciation and duplication

Genes A1, B1 and B2 are formed from an ancestral gene by a speciation and duplication

event. Gene A1 from species A has two orthologs in species B, genes B1 and B2. B1 and

B2 are paralogs.

Figure 5 depicts different evolutionary scenarios that one encounters while

studying the evolution of genes in gene families, to explain the relationships

between genes in terms of orthologs, paralogs and gene loss. An ancestral gene undergoes

duplication in species 0 to give the genes A, B and C. This is followed by a speciation

event with genes A1, B1 and C1 in Species 1 and genes A2 and B2 in Species 2. The gene

A1 is an ortholog of A2 and B1 is an ortholog of B2. The gene C1 does not have a

corresponding ortholog, as Species 2 lost the gene after speciation. The genes A1, B1

and C1 are paralogs within species 1, and A2 and B2 are paralogs within species 2.

Furthermore, as explicitly pointed out recently by Fitch (Fitch 2000) and as often ignored,

gene A1 (species 1) is also a paralog of gene B2 (species 2), gene B1 (species 1) is a

paralog of gene A2 (species 2) and gene C1 is a paralog of genes A2 and B2 (species 2). From

these explanations, it is clear that orthologs are homologous genes residing in different

species, while paralogs may not only refer to the homology relationship between genes

from the same species but also from different species. It is essential to understand that both

orthologs and paralogs are free to diverge and do not necessarily always have the same

function (Thornton and DeSalle 2000).
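
The rule underlying these statements is that two genes are orthologs when their last common ancestor in the gene tree is a speciation node, and paralogs when it is a duplication node. The sketch below applies that rule to the scenario of Figure 5. It is an illustration only: the nested-tuple tree encoding, the function names and the order of the two duplications are assumptions made for the example, since the text does not specify in which order A, B and C arose.

```python
# Gene tree as nested tuples (event, left, right); leaves are gene names.
# Assumed topology: a duplication separates A from the B/C ancestor, and a
# second duplication separates B from C. C2 is absent (lost in species 2).
TREE = ("dup",
        ("spec", "A1", "A2"),
        ("dup",
         ("spec", "B1", "B2"),
         "C1"))


def leaves(node):
    """Return all gene names below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, out=None):
    """Label every gene pair by the event at its last common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, left, right = node
    label = "ortholog" if event == "spec" else "paralog"
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = label
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


relations = classify_pairs(TREE)
for pair in (("A1", "A2"), ("A1", "B2"), ("C1", "A2")):
    print(pair, relations[frozenset(pair)])
```

With this encoding, A1 and B2 come out as paralogs even though they reside in different species, which is the point made by Fitch.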

[Figure 5 image; labels: Duplication event, Speciation event, Gene loss]

Figure 5: Schematic representation of different evolutionary processes shaping the

gene families in different species.

Gene duplication, speciation and loss lead to the formation of genes A1, B1, C1 in species

1 and A2 and B2 in species 2.

1.4 Inferring gene duplication and gene loss

That two genes are homologous is a hypothesis that needs to be studied and

analyzed to be able to derive the relationships between the genes to be either orthologs or

paralogs. Studying and analyzing the relationships between gene families, i.e. evaluation of

orthology or paralogy requires a well formulated approach. In order to be able to postulate

theories on how related genes evolved from an ancestral gene i.e. by gene duplication, gene

loss or by speciation, one needs to assess homology relationships using a well founded

phylogeny.

The first step in assessing homology is a sequence alignment of the molecular

sequences be it nucleotides or amino acids. This gives a preliminary measure of possible

homology which can then be assessed using a phylogeny. A well supported phylogeny

gives the evolutionary relationships between the genes in relation to one another.

Comparison of a gene phylogeny with the taxonomic relationships between

species, allows gene duplication and loss events to be assessed and roughly dated. As an

example, Figure 6 shows the different scenarios of gene duplication and gene loss. In

Figure 6A, it can be seen that a duplication event prior to the speciation event that resulted in

species 2, 3 and 4 created the paralogous gene groups A, B and eventually C. In a

hypothetical situation, suppose the genes 2A, 3C and 4B are missing from the gene tree.

Assuming that the studied sequences were derived from completely sequenced genomes,

the missing genes could either be due to a loss of genes or their possible pseudogenization

in the respective species. That the gene duplication occurred prior to speciation can still be

resolved by superimposing the gene tree with the species tree. Reconciliation between the

species and the gene tree will help resolve the absence of genes 2A, 3C and 4B as can be

seen in Figure 6B. This kind of study can hence be used to infer the evolution of genes

belonging to gene families within and among species in a phylogenetic context.
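
One common way to carry out such a reconciliation is to map every gene-tree node to the last common ancestor (LCA) of its species in the species tree. The sketch below illustrates that idea on a reduced version of the example (gene groups A and B only, with 2A and 4B missing). It is not the procedure used in this work: the species-tree topology ((2,(3,4)),1), the internal node names `s234` and `s34`, and the convention that a leaf such as "3A" belongs to species "3" are all assumptions made for the illustration, and a binary species tree is assumed.

```python
# Hypothetical species tree ((2,(3,4)),1), stored as child -> parent.
PARENT = {"1": "root", "s234": "root", "2": "s234",
          "s34": "s234", "3": "s34", "4": "s34"}
CHILDREN = {}
for child, parent in PARENT.items():
    CHILDREN.setdefault(parent, []).append(child)


def path_to_root(node):
    """Return the node followed by all of its ancestors."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lca(a, b):
    """Last common ancestor of two species-tree nodes."""
    seen = set(path_to_root(a))
    return next(n for n in path_to_root(b) if n in seen)


def reconcile(gene_node, events):
    """Map a gene-tree node to the species tree and record inferred events."""
    if isinstance(gene_node, str):
        return gene_node[:-1]  # leaf "3A" belongs to species "3"
    mapped = [reconcile(child, events) for child in gene_node]
    here = lca(*mapped)
    # A node mapping to the same place as one of its children is a duplication.
    duplicated = here in mapped
    events.append(("duplication" if duplicated else "speciation", here))
    for m in mapped:
        path = path_to_root(m)
        stop = path.index(here)
        # Species-tree nodes skipped on the way down imply a loss in the
        # sibling lineage of each skipped node.
        climb = path[:stop] if duplicated else path[:stop - 1]
        for s in climb:
            sibling = next(c for c in CHILDREN[PARENT[s]] if c != s)
            events.append(("loss", sibling))
    return here


# Observed gene tree with 2A and 4B missing: ((3A,4A),(2B,3B)).
events = []
reconcile((("3A", "4A"), ("2B", "3B")), events)
print(events)
```

In this toy case the root is inferred to be a duplication predating the split of species 2, 3 and 4, and one loss is inferred in species 2 (group A) and one in species 4 (group B), even though the missing genes themselves never appear in the input.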

[Figure 6 image: gene trees with leaf labels 1A, 2A, 3A, 4A, 3C, 4C, 4D, 2B, 3B and 4B]

Figure 6: Inferring gene duplication and loss events from a gene tree in comparison

with the species tree.

(A) A gene tree showing the phylogeny between genes belonging to species 1, 2, 3 and 4. Genes are

represented as A, B, C and D. A gene duplication represented as x occurred prior to the speciation

event leading to Species 2, 3 and 4.

(B) A species tree showing the relation between species 1, 2, 3 and 4.

(C) A hypothetical situation where the genes 2A, 3C and 4B are missing from the tree as shown in

red. Reconciliation of the phylogenetic tree from (A) with the species tree from (B) helps identify

not only the duplication event but also the missing genes to be able to infer loss. Adapted from

(Thornton and DeSalle 2000)

1.5 Previous Studies addressing zinc finger gene evolution

About 2000 of the 30,000 genes in the human genome code for transcription

factors (Venter, Adams et al. 2001). C2H2-ZNF are the most common of all the eukaryotic

transcription factors present in the human genome (encoded by ~700 genes). Owing to

these facts, the C2H2 zinc finger gene family has been considered to be an evolutionary

playground for genes to develop and differentiate and hence is an interesting family to

study (Looman, Abrink et al. 2002).

The studies pertaining to C2H2-ZNF have mostly been restricted to those associated

with a KRAB domain and more specifically to the human genome. A recent study

identified 423 KRAB C2H2-ZNF loci organized into 65 clusters on the human genome

(Huntley, Baggott et al. 2006). Evolutionary studies involving these KRAB C2H2-ZNF

genes indicated that the evolutionary relatedness within and among clusters was not only

associated with physical proximity evolving through tandem duplications but also through

distributed duplication and postduplication rearrangement events, which have led to the

dramatic increase in the gene numbers of this family in humans (Hamilton, Huntley et al.

2003; Hamilton, Huntley et al. 2006; Huntley, Baggott et al. 2006). Though present in

clusters, the KRAB C2H2-ZNF are not co-regulated and they show different patterns of

expression (Huntley, Baggott et al. 2006).

A study of one KRAB C2H2-ZNF gene cluster on human chromosome 19

suggested an evolutionary model showing the presence of certain beta-satellite repeat

structures symmetrically ordered with the zinc finger genes in the cluster which have

coevolved with the cluster accommodating the expansion of the genes within this cluster

(Eichler, Hoffman et al. 1998).

A statistical analysis using phylogenetic models on four human C2H2-ZNF clusters

on chromosome 19 indicated that positive selection is the driving force involved in the

diversification of the KRAB C2H2 zinc finger genes (Schmidt and Durrett 2004).

Not much is known about the evolutionary histories of these genes in different

mammalian genomes and very few studies have been carried out to comparatively analyze

their evolution. A preliminary report on species-specific expansion of these genes resulted

from one study on a C2H2-ZNF cluster on human chromosome 19 and its syntenically

homologous cluster on mouse chromosome 7 (Shannon, Hamilton et al. 2003). A study on

the evolution of members of the primate-specific ZNF91 KRAB subfamily, which are

mainly found in a chromosome 19 cluster, revealed that this gene subfamily evolved before

the split of humans and apes. But after the split, these genes have continued to evolve

differentially be it through tandem duplications or segmental duplications, leading to

species-specific genes (Dehal, Predki et al. 2001; Hamilton, Huntley et al. 2006).

In spite of several studies dealing with these genes, there has never been a

comprehensive study on the C2H2-ZNF genes and their evolution or their functions. In

order to systematically define and analyze the extent of species-specific duplication and

the role of gene loss in the evolution of these genes, it is important to conduct a

comprehensive study of these gene clusters in mammalian genomes to obtain clues on their

evolution and their possible implications on functions specific to each species.

1.6 Hypothesis and Objective

Previous studies on zinc finger genes have provided evidence that zinc finger

genes have undergone a huge expansion in vertebrate genomes, with a specific increase in

humans. Studies have shown that these genes have been subjected to expansion through

tandem duplication and have also revealed the existence of species-specific duplication events

(Shannon, Kim et al. 1998; Shannon, Hamilton et al. 2003; Hamilton, Huntley et al. 2006;

Huntley, Baggott et al. 2006). A contribution of gene loss in the evolution of C2H2 zinc

finger genes has been suggested but never tested rigorously.

The main objective of this thesis is to systematically determine to what extent zinc

finger genes are subject to species-specific expansion and to assess the potential

contribution of gene loss in the evolution of this gene family in mammals. To this end, we

have:

1. Assembled a curated database of all C2H2-ZNF genes in the human genome and

identified all the C2H2-ZNF clusters in the human genome

2. Searched for syntenically homologous clusters in other completely sequenced

mammalian genomes, namely chimpanzee, mouse, rat and dog genomes.

3. Performed a phylogenetic analysis of C2H2-ZNF genes from the syntenically

homologous clusters.

4. Performed a reconciliation of both phylogenetic analyses and physical maps of the

clusters with the species tree accounting for the evolutionary history of the species

in order to infer gene loss and gain.

These studies should allow us to determine the nature of evolutionary events that

shaped this large gene family in mammals. In particular, this study will help us to better

infer orthology in the various mammals and better understand the evolution and

relationships between the different C2H2-ZNF subfamilies.

Chapter 2. ARTICLE

Evolution of C2H2-zinc finger genes in mammals:

Species-specific duplication and loss at the level of

clusters, genes and their functional domains.

Hamsa Dhwani Tadepally, Gertraud Burger and Muriel Aubry*

Department of Biochemistry, Université de Montréal, C.P. 6128,

Succ. Centre-Ville, Montréal, QC, H3C 3J7, Canada

*To whom correspondence and reprints should be addressed:

Muriel Aubry, Ph.D.

Department of Biochemistry

Université de Montréal

C.P. 6128, Succ. Centre-Ville

Montréal, H3C 3J7

Canada

Key words: C2H2/Kruppel, zinc finger, gene family, tandem repeats, gene duplication,

gene loss, evolution.

ABSTRACT

C2H2 zinc finger genes (C2H2-ZNF) constitute the largest class of transcription factors in

humans and one of the largest gene families in mammals. Often arranged in clusters in the

genome, these genes are thought to have undergone a massive expansion in vertebrates by a

process involving tandem duplication. However, this view is based on limited datasets

restricted to a single chromosome or a specific subfamily of C2H2-ZNF genes. Here, we

present the first comprehensive study of the dynamic evolution of the C2H2-ZNF family in

mammals. We assembled the complete repertoire of human C2H2-ZNF genes (718 in

total), about 70 % of which are organized into 81 clusters across all chromosomes. Based

on an analysis of their N-terminal effector domains, we identified

SET- and HOMEO domain-encoding C2H2-ZNF genes as members of two new C2H2-

ZNF subfamilies. We searched for the syntenic counterparts of human clusters in other

mammals for which complete gene data are available: chimpanzee, mouse, rat and dog.

Cross-species comparisons show a large variation in the numbers of C2H2-ZNF genes

within homologous mammalian clusters suggesting differential patterns of evolution.

Phylogenetic analysis of selected C2H2-ZNF clusters reveals that differences in C2H2-ZNF

gene repertoires across mammals not only originate from differential gene duplication but

also from gene loss. Further, we find variations among orthologs in the number of zinc finger

motifs and association of the effector domains, the latter often undergoing sequence

degeneration. Based on these results and an analysis of the exon-intron organization of

genes from the large SCAN and KRAB domain-containing subfamilies, we propose a new

model for the evolution of these subfamilies.

This manuscript includes two supplementary Figures and four supplementary Tables

INTRODUCTION

The human genome sequence uncovered a large number of gene families often

arranged in a clustered organization (Ohta 2000; Thornton and DeSalle 2000; Venter,

Adams et al. 2001). C2H2 zinc finger (C2H2-ZNF) genes make up 2 % of all the human

genes and represent the second largest gene family in humans after the odorant receptor

family (Lander, Linton et al. 2001; Schuh, Aicher et al. 1986; Bellefroid, Lecocq et al.

1989; Messina, Glasscock et al. 2004). The first identified members of the C2H2-ZNF

family are Xenopus TFIIIA and Drosophila Kruppel and thus genes of this family are often

called zinc finger genes of the TFIIIA or Kruppel type (Miller, McLachlan et al. 1985;

Schuh, Aicher et al. 1986).

Most of the characterized C2H2-ZNF genes code for transcription factors which

bind DNA through their zinc finger region; others bind RNA and their exact function is yet

unknown (Theunissen, Rudt et al. 1992; Grondin, Bazinet et al. 1996). The zinc finger

region is composed of a basic structural unit of 28 amino acids

(CX2-4CX3FX5LX2HX3-4HTGEKPYX, where X is any amino acid), called the zinc finger motif, that is often

repeated in tandem. The two cysteines and two histidines in this motif interact with a zinc

ion, stabilizing the proper folding of this motif (Klug and Rhodes 1987; Lee, Gippert et al.

1989; Rhodes and Klug 1993). C2H2-ZNF proteins often contain an effector domain

always located N-terminal to the zinc finger region, such as the KRAB (Kruppel-

Associated Box), SCAN (SRE-ZBP, CTfin51, AW-1 and Number 18 cDNA) and BTB

(Broad-Complex, Tramtrack and Bric-a-brac) domains. The first two domains are

vertebrate-specific (Bellefroid, Poncelet et al. 1991; Rosati, Marino et al. 1991; Collins,

Stone et al. 2001), while BTB is also present in insects. The KRAB domain includes the

box KRAB A (~38 amino acids) involved in transcriptional repression and often a second

box, usually KRAB B (~32 amino acids) or in few cases KRAB b or KRAB C (~21 amino

acids) box (Witzgall, O’Leary et al. 1994; Looman, Abrink et al. 2002; Urrutia 2003;

Looman, Hellman et al. 2004). The KRAB A box and the second KRAB B, b or C box are

encoded by separate exons, which are alternatively spliced. The SCAN, also called the

leucine-rich (LeR) domain (~84 amino acids) (Stone, Maki et al. 2002), mediates protein-

protein interactions through dimerization (Sander, Haas et al. 2000; Schumacher, Wang et

al. 2000). The BTB domain (~120 amino acids) is a dimerization domain that also acts as a

repression domain in some cases (Melnick, Carlile et al. 2002). In contrast to the SCAN

and KRAB domains which are only present in C2H2-ZNF proteins, the BTB domain is

also found as a part of actin-binding proteins (Collins, Stone et al. 2001). C2H2-ZNF

proteins are grouped into different subfamilies based on the type of N-terminal effector

domain present.
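
The zinc finger consensus described above lends itself to a simple pattern search. The sketch below is a rough illustration rather than the detection method used in this study (which relied on PROSITE): it assumes the consensus reads C-X(2-4)-C-X(3)-F-X(5)-L-X(2)-H-X(3-4)-H, translates only the zinc-coordinating core into a regular expression, and counts non-overlapping matches. The spacer ranges, the name `C2H2_FINGER` and the test sequence are assumptions; real fingers frequently deviate from the consensus at the F and L positions, so a strict pattern of this kind will undercount.

```python
import re

# Assumed reading of the consensus core: C-X(2-4)-C-X(3)-F-X(5)-L-X(2)-H-X(3-4)-H.
C2H2_FINGER = re.compile(r"C.{2,4}C.{3}F.{5}L.{2}H.{3,4}H")


def count_fingers(protein: str) -> int:
    """Count non-overlapping matches of the consensus core in a protein."""
    return len(C2H2_FINGER.findall(protein.upper()))


# Synthetic 28-residue unit built to fit the consensus (not a real protein),
# repeated three times to mimic fingers arranged in tandem.
unit = "YKCPECGKSFSQKSNLQKHQRTHTGEKP"
print(count_fingers(unit * 3))
```

Profile-based methods tolerate the substitutions that a fixed pattern rejects, which is one reason to prefer them for assembling a complete gene repertoire.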

Initial studies on the C2H2-ZNF gene family focused on human chromosome 19,

which is particularly enriched in clusters of these genes (Bellefroid, Marine et al. 1993;

Eichler, Hoffman et al. 1998). More recent studies dealt more specifically with the KRAB

subfamily (Mark, Abrink et al. 1999; Looman, Abrink et al. 2002; Shannon, Hamilton et al.

2003; Huntley, Baggott et al. 2006). The current view is that C2H2-ZNF genes have

undergone a massive expansion during vertebrate evolution by a process involving tandem

duplication (Dehal, Predki et al. 2001; Looman, Abrink et al. 2002; Hamilton, Huntley et

al. 2003; Shannon, Hamilton et al. 2003; Hamilton, Huntley et al. 2006; Huntley, Baggott

et al. 2006). Yet, this view may be biased because it is extrapolated from small subsets of

C2H2-ZNF genes.

In this report, we reconstructed a global picture of the evolution of the C2H2-ZNF

gene repertoires during mammalian speciation, based on a comprehensive catalogue of all

human C2H2-ZNF genes and their syntenic counterparts present in clusters in other

mammals. Our study clearly demonstrates that this gene family expanded and contracted

not only in human but across mammals and in a lineage-specific fashion. In addition, we

discovered evolutionary change of individual C2H2-ZNF orthologs involving both

differential duplication of zinc finger motifs and loss of N-terminal effector domains.

Speciation of mammals is characterized by divergent evolutionary trends at the level of

individual C2H2-ZNF genes as well as the entire family. This led us to propose a model for

the evolution of SCAN, SCAN-KRAB and KRAB subfamilies and points to the importance

of comparing complete repertoires rather than C2H2-ZNF genes from specific subfamilies

for gaining insights into the possible orthologous relationships between genes from various

genomes.

METHODS

Collection of human C2H2 zinc finger genes

We conducted an extensive similarity search to identify the complete repertoire of

C2H2-ZNF genes in the human genome (assembly NCBI 36). First, we identified all the

genes annotated as C2H2 and/or Kruppel zinc finger genes by performing an initial text

term search via Entrez (www.ncbi.nlm.nih.gov). Second, we used PROSITE

(http://www.expasy.com) to identify all the proteins which had a zinc finger motif of the

C2H2 type as well as the N-terminal effector domain, if present.

From these searches, the genomic coordinates, chromosome number, position on the

chromosome, number of fingers and identified domains were collected for each of the gene

and protein sequences (initial dataset). A TBLASTN (e-value cutoff 1e-3) (Gertz, Yu et al.

2006) search was done against the genome using each of the gene sequences from the

initial dataset as a query. The BLAST hits were used to generate the final dataset of all the

identified C2H2-ZNF genes (Supplementary Table S1).
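
The e-value filter applied to the BLAST hits can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: it assumes the TBLASTN hits were saved in BLAST's 12-column tabular output (where column 11 holds the e-value), and the function name `filter_hits` and the sample rows are hypothetical.

```python
# Sketch: keep TBLASTN hits at or below an e-value cutoff.
# Assumes BLAST tabular output with the 12 standard columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore

E_VALUE_CUTOFF = 1e-3  # cutoff quoted in the text for the human genome search


def filter_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, start, end, evalue) for hits with evalue <= cutoff."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[10])
        if evalue <= cutoff:
            # Reverse-strand hits report sstart > send, so normalise the order.
            start, end = sorted((int(fields[8]), int(fields[9])))
            kept.append((fields[0], fields[1], start, end, evalue))
    return kept


# Hypothetical rows, for illustration only.
sample = [
    "queryA\tchr19\t85.0\t200\t30\t0\t1\t200\t5000\t5599\t2e-30\t150",
    "queryA\tchr7\t40.0\t60\t36\t0\t10\t69\t9180\t9001\t0.5\t25",
    "queryB\tchr19\t70.0\t120\t36\t0\t1\t120\t7360\t7001\t1e-3\t60",
]
hits = filter_hits(sample)
```

Overlapping hits from different queries would still need to be merged into single gene loci before counting genes; that step is not shown here.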

Identification of C2H2-ZNF gene clusters in the human genome

We analyzed the relative positions of C2H2-ZNF genes in the human genome in

order to identify the C2H2-ZNF clusters. A distribution of the distances between

neighboring C2H2-ZNF genes in the human genome is presented in Supplementary Figure

S1. Two consecutive C2H2-ZNF genes are said to belong to a cluster if the distance

between them is ≤500 kb, regardless of the presence of other genes within the cluster, a

threshold classically used in gene family studies (Niimura and Nei 2003). Clusters were

determined for each human chromosome.
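
The cluster definition can be sketched as a single pass over genes sorted by position. This is a minimal illustration under stated assumptions: the distance is taken as the gap between the end of one C2H2-ZNF gene and the start of the next, a gap of exactly 500 kb is treated as within the threshold, and the gene names and coordinates are hypothetical.

```python
from collections import defaultdict

MAX_GAP_BP = 500_000  # 500 kb threshold quoted in the text


def find_clusters(genes, max_gap=MAX_GAP_BP):
    """Group consecutive genes on each chromosome whose gap is <= max_gap.

    genes: iterable of (name, chromosome, start, end).
    Returns a list of (chromosome, [gene names]) with at least two genes each;
    genes left alone are singletons and are not reported.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))

    clusters = []
    for chrom in sorted(by_chrom):
        current, prev_end = [], None
        for start, end, name in sorted(by_chrom[chrom]):
            if prev_end is not None and start - prev_end > max_gap:
                # Gap too large: close the running group and start a new one.
                if len(current) >= 2:
                    clusters.append((chrom, current))
                current = []
            current.append(name)
            prev_end = end if prev_end is None else max(prev_end, end)
        if len(current) >= 2:
            clusters.append((chrom, current))
    return clusters


# Hypothetical coordinates, for illustration only.
example = [
    ("geneA", "chr19", 1_000_000, 1_010_000),
    ("geneB", "chr19", 1_400_000, 1_420_000),  # 390 kb after geneA
    ("geneC", "chr19", 1_920_000, 1_930_000),  # exactly 500 kb after geneB
    ("geneD", "chr19", 2_600_000, 2_610_000),  # 670 kb after geneC: singleton
    ("geneE", "chr1", 5_000_000, 5_020_000),   # only gene on chr1: singleton
]
clusters = find_clusters(example)
```

Because membership is decided pairwise between neighbours, a cluster can span far more than 500 kb in total, and non-C2H2-ZNF genes lying between members do not break it.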

Identification of mammalian C2H2-ZNF clusters syntenically homologous to human

clusters

We searched for clusters homologous to the human C2H2-ZNF clusters (i.e.

syntenically homologous clusters) in other mammals for which complete genome

sequences are available. The assemblies used for Pan troglodytes, Mus musculus, Rattus

norvegicus and Canis familiaris were chimpanzee PanTro 2.1, mouse NCBI m36, rat

RGSC 3.4 and dog CanFam 2.0. We used the linkage maps of Ensembl

(http://www.ensembl.org); assignment of syntenic clusters is based on the genes flanking

each human cluster and which were mapped in all the species. Four flanking genes at each

extremity were mapped in most instances. Then, we conducted TBLASTN analysis of the

syntenic regions comprised between the flanking genes, using as queries the amino acid

sequence of the zinc finger region from all the known human C2H2-ZNF genes from the

corresponding region. A hit with e-value ≤1e-4 confirmed the respective homologous

clusters in the five mammalian genomes. A comprehensive catalogue of the human C2H2-

ZNF clusters and their syntenic counterparts in other mammals is compiled in

Supplementary Table S4.

Phylogenetic analysis

Phylogenetic analysis was conducted using the amino acid sequences of the zinc

finger region (identified using PROSITE) of C2H2-ZNF genes from selected human

clusters and their syntenically homologous clusters in chimpanzee, mouse, rat and dog. A

multiple sequence alignment of the zinc finger regions of the C2H2-ZNF genes was

generated using the program MUSCLE (Edgar 2004). The alignments were edited to

remove gaps using the program GBLOCKS (Castresana 2000). Maximum Likelihood (ML)

and Bayesian Inference (BI) methods were used to infer the phylogenetic trees and estimate

the clade support. For ML analysis, the program RAxML (RAxML-VI-HPC Version 2.2.1)

(Stamatakis, Ludwig et al. 2005) employing the WAG model of amino acid substitution

was used to reconstruct the best tree. Bootstrapping of 100 datasets was implemented. The

posterior probabilities were determined by a Bayesian MCMC method implemented in the

program MrBayes v3.1 (Huelsenbeck and Ronquist 2001) to test the robustness of the

topology of the tree inferred through ML. One million generations were run and the trees

were sampled after every 10 generations.

To determine appropriate outgroups for our analysis, we searched the nr database to look

for close homologs in non-mammals using TBLASTN (e-value cutoff 1e-4). In addition to

the Xfin sequence from Xenopus laevis, we obtained a set of zinc finger genes from

chicken (Gallus gallus, assembly WASHUC2) specifically selected for each human

C2H2-ZNF cluster based on an extensive similarity search. To select the chicken outgroup,

a TBLASTN (e-value cutoff 1e-4) search was done against the chicken genome using each

of the human C2H2-ZNF sequences derived from the selected cluster of interest as a query.

The top 10 hits for each query sequence were all analysed using CD-HIT

(identity thresholds = 100%, 95% and 90%) (Li, Jaroszewski et al. 2001) to produce a final

set of non-redundant representative chicken sequences, all used as part of the outgroups.

Sequence analysis to confirm the loss of domains

In the case where loss of a domain was suspected, we conducted an extensive

sequence analysis to rule out the possibility that these domains would have been missed

either due to a frame-shift or inadequate exon-intron splicing of the gene and thus

inappropriate amino acid translation, preventing recognition by PROSITE

(http://www.expasy.com). Firstly, for each C2H2-ZNF gene where loss of an

N-terminal domain was suggested, we systematically collected the nucleotide sequence of

the region ranging from the stop of translation of the previous gene to the start of

translation of the next gene. We conducted a TBLASTN search of this region using the

amino acid sequence of the domain of interest (present in the corresponding orthologs and

the consensus of the domains selected from randomly selected sequences) as a query to

confirm the absence of the domain in the C2H2-ZNF gene of interest. Secondly, we

obtained the exon-intron structure of these genes using the Ensembl Genome Browser

(http://www.ensembl.org), in order to search for exonic or intronic sequences which may

exhibit significant identity with the nucleotide sequence of the domain of interest. For this

purpose we conducted a BLAST analysis of the individual exon and intron sequences with

the nucleotide sequence of the various domains that are present in the corresponding

orthologs.

Flowchart of the study

Figure 1 summarizes the flowchart of our analysis procedure of C2H2-ZNF genes and

clusters in mammals.

RESULTS

Compilation of a comprehensive catalogue of human C2H2-ZNF genes

Previous studies reported the existence of at least 564 C2H2-ZNF genes in the

human genome and suggested that this family may include approximately 700-800 genes

(Bellefroid, Lecocq et al. 1989; Bellefroid, Poncelet et al. 1991). As a first step to study the

evolution of C2H2-ZNF genes, we established a comprehensive catalogue of the C2H2-

ZNF genes in the human genome. By conducting an extensive similarity search (see

Methods), we identified 718 C2H2-ZNF genes (compiled in Supplementary Table S1). Of

the 718 genes, 66 are annotated as pseudogenes in GenBank. For all genes, we determined

their exact position on the chromosomes, their orientation, the number of finger motifs and

the effector domains.

These genes are distributed across all chromosomes of the human genome

(Supplementary Table S2). As reported earlier, chromosome 19 has the highest number

(Venter, Adams et al. 2001) and density of C2H2-ZNF genes, including 40% (289) of the

718 human C2H2-ZNF genes, whereas this chromosome corresponds to only 2.1% of the

human genome. More than half (58%) of the C2H2-ZNF genes encode conserved N-

terminal domains, the KRAB, SCAN and BTB domains (Figure 2A), typically involved in

transcriptional regulation (Kim, Chen et al. 1996; Collins, Stone et al. 2001), and form

different C2H2-ZNF subfamilies. Further, we discovered two additional domains typical of

transcription regulators, the SET and HOMEO domains, that are also encoded by C2H2-

ZNF genes. While the KRAB subfamily represents almost half of the C2H2-ZNF genes

(45%), SET and HOMEO C2H2-ZNF genes together with members of all the other

subfamilies account for only a small percentage (~12%) of the C2H2-ZNF genes (Figure

2A).
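
The over-representation of C2H2-ZNF genes on chromosome 19 can be made explicit with a short calculation. The sketch below uses only the figures quoted in the text (289 of 718 genes, 2.1% of the genome); the fold-enrichment ratio is an illustrative summary and is not a statistic reported by the study.

```python
# Figures quoted in the text.
TOTAL_GENES = 718               # human C2H2-ZNF genes in the catalogue
CHR19_GENES = 289               # of which located on chromosome 19
CHR19_GENOME_FRACTION = 0.021   # chromosome 19 as a share of the genome


def fold_enrichment(genes_on_chrom, total_genes, genome_fraction):
    """Observed share of genes divided by the share expected from size alone."""
    return (genes_on_chrom / total_genes) / genome_fraction


gene_share = CHR19_GENES / TOTAL_GENES
enrichment = fold_enrichment(CHR19_GENES, TOTAL_GENES, CHR19_GENOME_FRACTION)

print(f"share of C2H2-ZNF genes on chromosome 19: {gene_share:.1%}")
print(f"fold enrichment relative to chromosome size: {enrichment:.1f}")
```

Under a uniform distribution, chromosome 19 would be expected to carry about 15 of the 718 genes (718 × 0.021), so the observed 289 is roughly a 19-fold excess.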

Clustered organization of human C2H2-ZNF genes

It was reported earlier that chromosome 19 is particularly rich in tandemly

duplicated C2H2-ZNF genes and that KRAB C2H2-ZNF genes are clustered on several

other chromosomes (Dehal, Predki et al. 2001; Rousseau-Merck, Koczan et al. 2002). In

order to trace the duplication history of the entire C2H2-ZNF repertoire, we studied the

distribution of these genes across the whole human genome. Two consecutive C2H2-ZNF

genes were considered to belong to a cluster if the distance between them is ≤500 kb,

regardless of the presence of other genes or pseudogenes within the cluster (see Methods).

Using this definition, we identified 81 human C2H2-ZNF clusters accounting for 72 % of

the total number of C2H2-ZNF genes (518 of the 718) (Supplementary Tables S2 and S3).

The remaining genes are dispersed as singletons. Among these clusters, 31% include

exclusively tandemly organized C2H2-ZNF genes with no other intervening genes (Figure

2B, Supplementary Table S3). The number of C2H2-ZNF genes per cluster ranges from 2

to 76 with an average of 6. As illustrated in Figure 2B, about 75% of the total number

of C2H2-ZNF clusters have between two and six genes. Consistent with previous reports,

chromosome 19 not only has the largest number of C2H2-ZNF clusters (Supplementary

Table S2) but also hosts the largest clusters (>12 genes) (see Figure 2B and

Supplementary Table S3).

We find that the large majority of KRAB (89 %) and SCAN (90 %) types of C2H2-ZNF

genes are arranged in clusters (Figure 2A and Supplementary Table S2). This contrasts with

the BTB subfamily of C2H2-ZNF genes or those lacking regulatory domains, which occur

more often as singletons in the genome. An analysis of the composition of individual

clusters revealed that two-thirds of the clusters contain a mixture of various C2H2-ZNF

subfamilies (‘mixed clusters’, Supplementary Table S3). The few clusters made up of a

single C2H2-ZNF gene subfamily (‘pure clusters’) are of small size (<4 genes).

Identification and comparison of syntenic C2H2-ZNF clusters across mammals

With the ultimate goal to study the evolution of zinc finger genes, we identified and

compiled clusters in completely sequenced mammalian genomes (i.e. chimpanzee, mouse,

rat and dog) that are syntenically homologous to those of human. Syntenically homologous

clusters were identified by the genes flanking each cluster. Then, all the C2H2-ZNF genes

found within the delimited syntenic regions were identified using a TBLASTN search (see

Methods). The 81 human C2H2-ZNF clusters and their syntenic counterparts in other

mammals are listed in Supplementary Table S4, which also includes information on the

orientation of the genes in the clusters, their associated domains, the number of zinc finger

motifs and the flanking genes.

Primates (Homo sapiens and Pan troglodytes) stood out for their large number of

both C2H2-ZNF clusters and genes within them, as compared to rodents (Mus musculus

and Rattus norvegicus) and Canis familiaris (Figure 3A). The most parsimonious

explanation is that a large expansion of C2H2-ZNF genes occurred in primates, and more

particularly in human (518 genes in human versus 397 in chimpanzee) after divergence

from rodents and canines. Rat has slightly fewer C2H2-ZNF genes than dog (7%), but 25%

fewer than mouse. Considering the evolutionary relationship of the species (Figure 3A), these

data suggest that not only species-specific duplication events, as reported earlier (Dehal,

Predki et al. 2001; Hamilton, Huntley et al. 2003; Shannon, Hamilton et al. 2003), but also

loss of family members (suggested here in rodents) may have occurred during the evolution

of mammals. Differential species-specific expansion was reported previously for a subset of

genes from the human ZNF45 subfamily on chromosome 19 compared with its mouse

counterpart (Shannon, Hamilton et al. 2003). Furthermore, expansion of the human KRAB

C2H2-ZNF subfamily was also shown earlier based on draft versions of the genomes of

chimpanzee, mouse and dog (Huntley, Baggott et al. 2006). However, evidence of C2H2-

ZNF gene or cluster loss could not be definitively obtained in these studies as it required

detailed analysis of more than two completely sequenced genomes.

Comparing individual syntenic clusters in the mammalian genomes

To distinguish whether differences in the number of C2H2-ZNF clusters are due to

species-specific gene gain or loss, we systematically compared individual syntenic clusters

in the five mammalian genomes studied. The results of this analysis point to a differential

evolutionary history in mammals. About 60% of the human clusters (49) have syntenically

homologous counterparts in all the species studied, indicating that these C2H2-ZNF clusters

predate the divergence of dog, rodents and primates (Supplementary Table S4 and

Supplementary Figure S2). In addition, we found (i) primate-specific clusters (14 including

2 human-specific clusters), (ii) clusters, present in primates and dog, that were lost in

rodents (8 clusters including 3 present in mouse but absent in rat) and (iii) clusters present

in primates and rodents but absent in dog (10 clusters) (examples in Figure 3B). Essentially

all the primate clusters have a larger number of genes than rodent or dog clusters, which

reflects a global primate-specific expansion of C2H2-ZNF (Supplementary Figure S2).

Further, in 40% of all primate clusters, those from human contain more C2H2-ZNF genes

than those from chimpanzee. This indicates that most of the evolutionary changes

(duplication and/or loss) occurred late in the primate branch. A similar pattern was seen in

rodents, where almost all mouse C2H2-ZNF clusters exhibit more genes than their syntenic

rat clusters. While these results illustrate that the C2H2-ZNF gene family is rapidly and

independently evolving within different lineages, insights into the role of gene duplication

and loss in the history of this gene family required rigorous phylogenetic analysis.
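
The presence and absence categories above can be expressed as a small classifier over the set of species in which a syntenic counterpart of a human cluster was found. This is a sketch only: the decision rules and the function name `classify` are assumptions inferred from the wording of the text (for instance, a cluster missing from only one rodent is counted as lost in rodents), while the counts in `REPORTED_COUNTS` are those quoted in the paragraph.

```python
ALL_SPECIES = {"human", "chimpanzee", "mouse", "rat", "dog"}
RODENTS = {"mouse", "rat"}


def classify(present):
    """Assign a human cluster to a category from the species where it is found."""
    present = set(present)
    has_dog = "dog" in present
    rodents_found = present & RODENTS
    if present == ALL_SPECIES:
        return "conserved in all five species"
    if not has_dog and not rodents_found:
        return "primate-specific"
    if has_dog and rodents_found != RODENTS:
        return "lost in rodents"
    if not has_dog and rodents_found:
        return "absent in dog"
    return "other"


# Counts quoted in the text for the 81 human clusters.
REPORTED_COUNTS = {
    "conserved in all five species": 49,
    "primate-specific": 14,
    "lost in rodents": 8,
    "absent in dog": 10,
}
total_clusters = sum(REPORTED_COUNTS.values())
conserved_share = REPORTED_COUNTS["conserved in all five species"] / total_clusters
```

The four reported categories add up to the 81 human clusters, and 49 of 81 corresponds to the "about 60%" of clusters conserved in all species.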

Phylogeny of C2H2-ZNF clusters in mammalian genomes

For addressing the relative contribution of gene duplication and loss in the evolution

of C2H2-ZNF genes in mammals, we focused our study on selected large human C2H2-

ZNF clusters and their syntenic counterparts in four other mammals. We expected that

larger clusters would be more informative and possibly more representative of the whole

genomes. Because of the clarity of evolutionary scenarios observed in the tree, we present

here a detailed phylogenetic analysis of the second largest human C2H2-ZNF cluster (43

genes) located on chromosome 19q13.4, that we named cluster 19.12, and of its syntenic

clusters (Supplementary Tables S3 and S4) in other species. For phylogenetic analysis, we

used the predicted amino acid sequences of the zinc finger regions. Genes annotated as

pseudogenes in GenBank or genes containing fewer than three zinc finger motifs were not

considered in the phylogenetic analysis (noise is expected to be too high if sequences of

only 56 amino acids corresponding to 2 finger motifs or fewer were included). Our total

data set of C2H2-ZNF sequences from the human cluster 19.12 and their syntenic

homologs in chimpanzee, mouse, rat and dog consists of 101 protein sequences, including

the outgroup sequences from Xenopus and chicken. We constructed a phylogenetic tree

using Maximum Likelihood and Bayesian methods. We subdivided the tree into three

groups (Figure 5) based on the kind of evolutionary scenarios observed, i.e. one-to-one and

one-to-many orthologous relationships between genes as well as gene loss as defined in

Figure 4. The number of C2H2-ZNF sequences from each species is highlighted for each

group. Two of these groups are monophyletic with significant (≥95%) support in both the

Maximum Likelihood and Bayesian analysis (Group I and III).
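
The inclusion criteria for the phylogenetic data set can be sketched as a simple filter. This is an illustration only: the record layout and gene names are hypothetical, and the 28-residue finger length is derived from the statement that two finger motifs correspond to 56 amino acids.

```python
MIN_FINGERS = 3                 # genes with fewer finger motifs are excluded
RESIDUES_PER_FINGER = 56 // 2   # the text equates 2 finger motifs with 56 amino acids


def eligible_for_phylogeny(genes):
    """Names of genes that are not pseudogenes and have at least MIN_FINGERS fingers."""
    return [
        gene["name"]
        for gene in genes
        if not gene["pseudogene"] and gene["fingers"] >= MIN_FINGERS
    ]


# Hypothetical records, for illustration only.
genes = [
    {"name": "geneA", "fingers": 12, "pseudogene": False},
    {"name": "geneB", "fingers": 2, "pseudogene": False},   # too few fingers
    {"name": "geneC", "fingers": 9, "pseudogene": True},    # annotated pseudogene
    {"name": "geneD", "fingers": 3, "pseudogene": False},
]
selected = eligible_for_phylogeny(genes)
shortest_region = MIN_FINGERS * RESIDUES_PER_FINGER  # residues in the shortest retained region
```

With these criteria the shortest zinc finger region entering the alignment spans three motifs, about 84 residues before any gap removal.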

A detailed analysis of the tree revealed four clades that underwent species-specific

expansion, and two clades, with gene loss in some species. For example, a dog-specific

expansion is seen in the monophyletic Group I, which includes three clustering genes from

human (hZNF331), chimpanzee (pZNF331) and dog (cZNF331), which in turn grouped

within a larger clade containing nine additional C2H2-ZNF genes from dog. In addition,

this clade indicates a loss in rodents, due to the absence of mouse or rat genes. Group I

alone illustrates how both species-specific duplication in dog and loss in rodents can

account for the higher number of genes seen in dog C2H2-ZNF clusters as compared to

rodents.

Group II shows more pronounced expansion in human as seen in several clusters.

For example, one of the primate-specific clades includes 17 human genes and 7 chimpanzee

C2H2-ZNF genes (Figure 5). Of the 17 human genes present in the clade, only 6 genes

show a one-to-one orthologous pairing with chimpanzee genes. Another well supported

clade includes a single human gene (hZNF677) clustered with two dog genes (LOC484331

and LOC476394). In this clade, the absence of chimpanzee or rodent counterparts to

these three genes suggests a loss in these species. For chimpanzee, however, loss by

pseudogenization is possibly involved (see physical maps described below); note that the

percentage of C2H2-ZNF genes annotated as pseudogenes was higher in chimpanzee than in

human C2H2-ZNF clusters (62, Supplementary Table S4).

In group III, the relationship of the four rodent genes with the dog and primate genes

could not be resolved (bootstrap values < 95%). However, a rodent-specific clade revealed

a mouse-specific duplication exhibiting a higher number of C2H2-ZNF genes in mouse

than rat, as seen in several other cases in our study.

Superimposition of the phylogenetic trees with the physical maps of clusters

Comparison of gene trees, species tree and physical map information of cluster

19.12 genes and its syntenic homologs provides better insights into the processes underlying

the evolution of the C2H2-ZNF clusters. The phylogenetic tree obtained for cluster 19.12

(Figure 5) suggests a simultaneous differential expansion and loss of C2H2-ZNF genes

throughout evolution. In perfect agreement with the phylogenetic tree, genes of the

monophyletic groups I and III were found to be physically clustered together on the

chromosomes across mammals (Figure 6). Evidence for a tandem duplication event is

provided by the comparison of the relationship within C2H2-ZNF genes of Group I on the

tree with their spatial relationships in the physical maps that showed that the sequences of

the dog clade form a tandem array on the chromosome (Figure 5). In addition to tandem

duplication of individual genes within this group, e.g. cLOC484324 and cLOC484323

(Figure 5) which are next to each other on the chromosome and exhibit the same orientation

(Figure 6), we also discovered tandem duplication of multiple genes. For instance, three

genes (LOC482273, LOC611599, LOC480782; orientation -, +, +) appear as a tandem

repeat of three other genes (LOC611583, LOC484328, LOC484326; orientation -, +, +) in

this group (Figures 5 and 6).
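
The physical-map side of this argument can be sketched as two checks on genes listed in chromosomal order: neighbouring genes that share an orientation, and a block of genes whose orientation pattern is repeated by the block that follows it. This is an illustration and not the procedure used in the study. The gene identifiers and orientations are those quoted above, but the relative order of the two three-gene blocks on the chromosome is assumed here, and the function names are hypothetical.

```python
def adjacent_same_strand(genes):
    """Pairs of neighbouring genes with the same orientation.

    genes: list of (name, strand) in chromosomal order.
    """
    return [(a[0], b[0]) for a, b in zip(genes, genes[1:]) if a[1] == b[1]]


def repeated_orientation_blocks(genes, size):
    """Consecutive blocks of `size` genes that share the same orientation pattern."""
    strands = [strand for _, strand in genes]
    names = [name for name, _ in genes]
    matches = []
    for i in range(len(genes) - 2 * size + 1):
        if strands[i:i + size] == strands[i + size:i + 2 * size]:
            matches.append((names[i:i + size], names[i + size:i + 2 * size]))
    return matches


# Orientations as quoted in the text; the order of the two blocks is assumed.
dog_genes = [
    ("LOC482273", "-"), ("LOC611599", "+"), ("LOC480782", "+"),
    ("LOC611583", "-"), ("LOC484328", "+"), ("LOC484326", "+"),
]
pairs = adjacent_same_strand(dog_genes)
blocks = repeated_orientation_blocks(dog_genes, size=3)
```

A matching orientation pattern is only suggestive: calling a duplication still depends on the duplicated genes pairing up in the phylogenetic tree, as described in the text.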

Group II mainly contains primate-specific C2H2-ZNF genes that cluster on the

phylogenetic tree in two well supported clades (≥97% bootstrap) and a sub-group of

weaker support (93% bootstrap). Almost all these genes also cluster physically together on

the chromosome. Human orthology assignments for ten of the twelve chimpanzee genes

from group II (underlined in Figure 6) were corroborated by two lines of evidence, i.e. from

the phylogeny, which was supported by the topology on the chromosome. Furthermore,

genes from 7 out of 10 of the C2H2-ZNF ortholog pairs from this primate-specific cluster

exhibit the same number of zinc finger motifs and the same type of N-terminal motif.

Species-specific variation in the number of finger motifs and the presence of N-

terminal conserved domains

When analysing the C2H2-ZNF genes from the 81 human clusters and their

syntenic homologs in mammals, we noticed that the average number of zinc finger motifs

varied depending on the C2H2-ZNF gene subfamilies. Noticeably, in all the mammalian

species studied, genes with KRAB and SCAN-KRAB motifs have a higher number of zinc

finger motifs than those from the other subfamilies (Figure 7A). For example, members of

the KRAB subfamily have an average of 10 to 17 zinc finger motifs, while members of the

BTB subfamily have only 2 to 3 (Figure 7A). We also noted species-specific variation in

the number of zinc finger motifs within mammalian C2H2-ZNF genes. In particular, dog

tends to have a much higher number of zinc finger motifs in most C2H2-ZNF gene sub-

families (Figure 7). Strikingly, LOC484264, a dog KRAB C2H2-ZNF gene, exhibits 70 zinc

finger motifs, which is to our knowledge the highest number of zinc finger motifs to be

reported for a zinc finger gene. Study of cluster 19.12 (Figure 5) illustrates more

specifically the trend of dog genes to exhibit more zinc finger motifs; the dog LOC484338

gene (group III), for example, has six times more zinc finger motifs than its human

ortholog. Furthermore, the dog gene LOC424326 has nearly twice as many motifs as its

closest paralog LOC480782 (group I) (Figure 5). This indicates a quite recent and drastic

expansion of zinc finger motifs within dog C2H2-ZNF genes, after the separation of dog

from rodents and primates. In several cases, the C2H2-ZNF mammalian orthologs revealed

differences in their numbers of finger motifs even within primate or within rodent lineages

(Figure 7B and Supplementary Table S4).

In addition to the difference in the number of finger motifs in C2H2-ZNF

orthologs and paralogs, we also found a variation in the presence of the N-terminal effector

domains. As an example, orthologs of the C2H2-ZNF genes in the human cluster 6.2 show

a variation in the presence of the KRAB or SCAN domains (Figure 7B), suggesting

frequent and multiple losses and/or gains of KRAB and SCAN domains during evolution.

To reconstruct these events, we analyzed in detail the exon-intron structure and sequences

of these genes (see Methods). Serendipitously, this analysis led us to the observation that a

large majority of the C2H2-ZNF containing a SCAN-KRAB or SCAN domain had each a

typical exon-intron organization (Figure 7C). For example, both human genes, ZNF192

(SCAN-KRAB) and ZNF187 (SCAN) and their respective orthologs in other species

(Figure 63) share the predominant exon-intron organization most typical of SCAN-KRAB

and SCAN C2H2-ZNF, respectively. While the dog LOC488318 has only a SCAN domain,

its corresponding orthologs in human, mouse and rat have a SCAN-KRAB. When the

nucleotide sequence of the exon which would have been predicted to encode a KRAB

domain in dog (third exon after the SCAN) was compared with those of human, mouse and

rat, the dog sequence exhibits a high conservation at the nucleotide level (>82%) but no

significant similarity at the amino acid level. This indicates that the loss of the KRAB

domain in dog was due to sequence degeneration. Similarly, while the chimpanzee

ZNF187 and its rat ortholog encode a SCAN domain, a degenerate SCAN domain was

identified in the corresponding exon of their human and mouse orthologs. For the human

SCAN-KRAB ZNF307 gene, we noticed that it exhibits an exon-intron organization typical

of SCAN-KRAB C2H2-ZNF (Figure 7C) whereas its orthologs in the chimpanzee, mouse

and rat encode solely a SCAN domain and present an exon-intron structure more typical

of SCAN C2H2-ZNF. However, it was found that, in chimpanzee, a sequence similar to the

KRAB sequence (99% at the nucleotide level) was embedded in the intron preceding the

exon encoding the zinc finger domain. No KRAB related sequence could be detected in the

rodent orthologs even with a detailed analysis of their sequences. Thus, either the KRAB

sequence was gained in the primate lineage or lost in the rodent lineage. For reasons

explained in the discussion, we believe that loss, rather than gain, is a more likely

hypothesis.

DISCUSSION

Comparative studies in genome research focused on the extensive similarities

existing between the human genome and the genomes from various other model organisms

which provide valuable insights into biological function and aetiology of human diseases.

However, differences existing among genomes have received less attention in spite of the

importance they may have in the physiological, morphological and behavioural distinctive

traits observed among species. A few studies on various gene families, such as the odorant

receptor family, pointed to some differences existing between genes of closely related

species (Sitnikova and Su 1998; Lapidot, Pilpel et al. 2001; Niimura and Nei 2003; Gilad,

Man et al. 2005; Niimura and Nei 2005). Our study of the C2H2-ZNF gene family reveals

that there is an extensive variation of the C2H2-ZNF gene content and organization in the

genomes from various mammals as well as in the domain composition of orthologous genes

among species. It also provides the first clear demonstration of the contribution of gene

loss in the C2H2-ZNF family during evolution which occurs at the level of clusters, genes

and their functional domains. We provide the first genome scale confirmation of the rapid

evolution of C2H2-ZNF gene clusters that occurs independently within related species

which also supports conclusions drawn from smaller-scale studies on individual genes,

clusters and C2H2-ZNF subfamilies (Dehal, Predki et al. 2001; Shannon, Hamilton et al.

2003; Hamilton, Huntley et al. 2006; Huntley, Baggott et al. 2006).

Substantial variation in the C2H2-ZNF gene family size and clustering across

mammals

We report here the first complete catalogue of all human C2H2-ZNF gene clusters and their

syntenic homologs in chimpanzee, mouse, rat and dog. This catalogue reveals that in

human, a large proportion of the genes from the C2H2-ZNF family (>70%) are organized

in clusters. Comparative studies of the five mammalian genomes indicated that the total

number of genes found in clusters varied considerably from 172 in rat to 518 in human

(number of genes found in clusters in human > chimpanzee > mouse > dog > rat).

Significantly, human and mouse have a larger number of clustered C2H2-ZNF genes

(>30%) as compared to chimpanzee and rat, respectively, indicating that independent

evolutionary events occurred after the divergence of the two primates (within the last 6-

10 million years) and two rodents (within 30-46 million years). We distinguish two kinds

of events: first, a variation in the number of C2H2-ZNF genes in syntenically homologous

clusters and second, the existence of lineage- and species-specific clusters in primates,

rodents and canines. This can be accounted for by independent evolution of C2H2-ZNF

genes in these closely related species. Previous studies focusing on KRAB C2H2-ZNF

from chromosome 19 had identified and analyzed a primate-specific cluster (Mohrenweiser

1998) including members of the primate-specific ZNF91 subfamily of C2H2-ZNF

(Hamilton, Huntley et al. 2006). Other studies on the KRAB C2H2-ZNF subfamily

identified a differential expansion between a human KRAB C2H2-ZNF cluster and its

syntenic counterpart in mouse (Shannon, Hamilton et al. 2003) and more recently other

species-specific expansions based on drafts of various mammalian genomes (Huntley,

Baggott et al. 2006). We illustrate and confirm at a larger scale the existence of an on-

going process of genome dynamics with several lineage- and species-specific

rearrangements and continuous repertoire expansion taking place independently in all

evolutionary branches, particularly in primates. This finding was only possible through the

analysis of a complete catalogue of all the subfamilies of C2H2-ZNF clusters and their

syntenic counterparts in mammals.

Gene duplication and loss: Two counteracting forces in the evolution of C2H2-ZNF

genes

An overview of 81 human C2H2-ZNF clusters identified here revealed that a third

of them are pure clusters (with 2 to 24 C2H2-ZNF genes), i.e. they are not interspersed with

other genes. Earlier observations of pure C2H2-ZNF gene clusters have led to the

hypothesis that C2H2-ZNF genes in primates have expanded massively by tandem

duplication (Bellefroid, Marine et al. 1993; Eichler, Hoffman et al. 1998; Elemento and

Gascuel 2002; Schmidt and Durrett 2004; Bertrand and Gascuel 2005; Huntley, Baggott et

al. 2006). We revisited this question based on our catalogue of human C2H2-ZNF clusters

and their syntenic counterparts in chimpanzee, mouse, rat and dog. Here, we confirmed

gene duplication and loss based on a reconciliation of both physical maps and the

superimposition of gene trees onto the known species tree (Page and Charleston 1997).

Our results clearly show that both gene gain and gene loss events have occurred multiple

times and independently in all the mammals studied. Combined with physical map data, our

phylogenetic studies indicate that the expansion of C2H2-ZNF genes evidenced during the

evolution of the five species studied results from the combined action of single-gene

duplication and multiple gene duplication (for instance, duplication of all or part of the

genes within a cluster). These duplication events were however counteracted by the loss of

individual genes or clusters as exemplified in several cases where related genes or clusters

found in primates and canine were absent in both or in either of the two rodents studied. This

study represents the first clear demonstration of the involvement of gene and cluster loss in

the evolution of C2H2-ZNF genes and suggests that during mammalian evolution the

duplication events outnumbered the loss events. Our results provide convincing support

that the C2H2-ZNF gene family evolved according to the “Birth and Death” model as

proposed by Nei and colleagues (Nei, Gu et al. 1997; Nei 2000). According to this model,

new genes are created by duplication including tandem duplication and block gene

duplication (birth). While certain copies remain relatively unchanged in the genome for a

long time, others diverge functionally by acquiring a new function. Some get deleted from

the genome or become pseudogenes following deleterious mutations (death through

elimination or inactivation). In the case of C2H2-ZNF genes pseudogenization seems to be

limited, as suggested by expression studies and statistical analysis showing positive

selection based on the analysis of specific clusters (Schmidt and Durrett 2004; Huntley,

Baggott et al. 2006). This makes the C2H2-ZNF family different from other gene

families such as the olfactory receptor gene family (Glusman, Yanai et al. 2001; Niimura

and Nei 2003). Noticeably, gene loss by pseudogenization was prominent for the

olfactory receptors with humans accumulating a higher number of olfactory receptor

pseudogenes as compared to other primates and mouse (Sitnikova and Su 1998; Lapidot,


Pilpel et al. 2001; Niimura and Nei 2005). These variations in the numbers of pseudogenes
and functional genes have been associated with the differential chemosensory dependence
in these species (Sharon, Glusman et al. 1999; Quignon, Kirkness et al. 2003). In
comparison, besides the fact that they are known to function as regulators of transcription,
the functions of only a few C2H2-ZNF proteins are known (Krebs, Larkins et al. 2003).
Further studies of C2H2-ZNF genes in mammals could shed light on the functional
consequences of different repertoires of these genes in different species. Until now, the
clustered organization of these genes has made knock-out studies in animal models
inefficient, possibly due to redundancy. However, based on a better knowledge of the
organization/content of C2H2-ZNF genes in the various genomes, large chromosomal
deletions of pure C2H2-ZNF clusters or other types of gene disruption or targeting
approaches could provide insights into the functions of these genes in different animal
models.
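To make the birth-and-death model discussed above more concrete, the following is a minimal simulation sketch, not the analysis performed in this study. It assumes a discrete-time process in which every gene copy independently duplicates (birth) or is lost or pseudogenized (death) with fixed per-step probabilities. The function name `simulate_birth_death` and all rate values are hypothetical and chosen only for illustration.

```python
import random


def simulate_birth_death(n_start, birth_rate, death_rate, steps, seed=0):
    """Track gene-family size under a simple discrete-time birth-and-death process.

    Assumption (not from the thesis): each gene copy independently duplicates
    with probability `birth_rate` and is lost with probability `death_rate`
    at every step.
    """
    rng = random.Random(seed)
    size = n_start
    history = [size]
    for _ in range(steps):
        births = sum(1 for _ in range(size) if rng.random() < birth_rate)
        deaths = sum(1 for _ in range(size) if rng.random() < death_rate)
        size = size + births - deaths  # deaths never exceed the current size
        history.append(size)
    return history


if __name__ == "__main__":
    # Hypothetical rates: duplication slightly more frequent than loss.
    print(simulate_birth_death(n_start=10, birth_rate=0.05, death_rate=0.03, steps=20))
```

With a birth rate higher than the death rate the family tends to expand, which mirrors the observation that duplication events outnumbered loss events; the actual rates for C2H2-ZNF genes are not estimated here.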

Evolution of C2H2-ZNF genes through gain and loss of finger motifs and N-terminal

effector domains

Evidence of the variation in the number of zinc finger motifs among orthologs was
previously reported for a subset of human chromosome 19 C2H2-ZNF genes and their
mouse homologs (Looman, Abrink et al. 2002). It was shown that this variation is due to
both differential duplication of finger motifs and loss due to degeneration. In our study,
such variation in the number of zinc finger motifs among orthologs was observed
recurrently among all mammals. Since the zinc finger motif appears as a flexible motif


with the ability to bind DNA, RNA and/or proteins, changes in the zinc finger motif
sequences and number within C2H2-ZNF genes could differentially alter binding
specificities and thus protein function. Both changes in the number of C2H2-ZNF genes
and in the number of finger motifs encoded by orthologous genes may be determinant in
species-specific functions.
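As an illustration of how the number of zinc finger motifs in two orthologs could be compared, the sketch below counts matches to the classical C2H2 consensus C-X(2,4)-C-X(12)-H-X(3,5)-H with a regular expression. This is an assumption made for illustration: this section does not state how motifs were detected in the study, and degenerate fingers that deviate from the consensus would be missed. The function name and the example sequences are hypothetical.

```python
import re

# Classical C2H2 zinc finger consensus: C-X(2,4)-C-X(12)-H-X(3,5)-H.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def count_c2h2_motifs(protein_sequence):
    """Count non-overlapping consensus C2H2 motifs in a protein sequence."""
    return sum(1 for _ in C2H2_PATTERN.finditer(protein_sequence.upper()))


if __name__ == "__main__":
    # Hypothetical finger and linker sequences, for illustration only.
    finger = "YKCPECGKSFSQKSNLQKHQRTH"
    linker = "TGEKP"
    ortholog_a = linker.join([finger] * 3)
    ortholog_b = linker.join([finger] * 2)
    print(count_c2h2_motifs(ortholog_a), count_c2h2_motifs(ortholog_b))
```

A difference in the two counts would flag a pair of orthologs for closer inspection of finger duplication or degeneration.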

The rapid evolution of the C2H2-ZNF genes observed in the mammalian lineage
was not limited to the variation in the number of genes and zinc finger motifs. Variation in
the presence of N-terminal effector domains, such as SCAN or KRAB, was observed in
orthologs and could be accounted for by either gain or loss of these motifs in the various
species. Loss by sequence degeneration of both the SCAN and the KRAB sequences was
confirmed in several cases in our study. In some cases, neither loss nor gain could be
resolved. A puzzling question remains, however, if one assumes that gain of KRAB and
SCAN sequences can occur recurrently within C2H2-ZNF genes. It is indeed difficult to
explain that these effector domains are always found in association with and N-terminal to
the zinc finger motifs of C2H2-ZNF proteins and that the SCAN domain is always
positioned N-terminal to the KRAB domain of SCAN-KRAB C2H2-ZNF proteins.
Interestingly, by analyzing the exon-intron structure of C2H2-ZNF genes from the clusters,
we found that most SCAN C2H2-ZNF and SCAN-KRAB C2H2-ZNF genes each have a
typical exon-intron structure (Figure 7C and Figure 8A). This suggests that the acquisition
of SCAN and KRAB sequences by C2H2-ZNF genes most likely corresponds to singular
events. This led us to propose the model described in Figure 8. Considering that the SCAN
domain is found in all vertebrates and is more ancient than the KRAB domain only found


in tetrapods, we suggest that first a SCAN C2H2-ZNF gene was formed in an ancestor of
vertebrates through the gain of a SCAN sequence and that later, after the emergence of the
tetrapods, the gain of a KRAB sequence by a SCAN C2H2-ZNF gene gave rise to a
SCAN-KRAB C2H2-ZNF gene (Figure 8B). These two gain events possibly occurred through an
exon-shuffling mechanism. Diversification of the C2H2-ZNF repertoires from each
subfamily then occurred dynamically through ongoing gene duplications and loss by
deletion or degeneration of the SCAN and/or KRAB sequences. As implied by this model,
the birth of the SCAN-KRAB C2H2-ZNF subfamily occurred earlier than that of the
KRAB C2H2-ZNF subfamily. This is consistent with previous data showing that
SCAN-KRAB-ZNF genes do not group together in one evolutionary clade but intermix with
KRAB-ZNF genes in phylogenetic trees of the KRAB sequence (Looman, Abrink et al.
2002; Huntley, Baggott et al. 2006). On the whole, our
model is in agreement with the fact that C2H2-ZNF orthologs often belong to different
C2H2-ZNF subfamilies and that we observed intermingling of C2H2-ZNF genes from the
SCAN, KRAB and SCAN-KRAB subfamilies in many C2H2-ZNF clusters. Our study
clearly indicates that the evolution of C2H2-ZNF subfamilies is tightly linked and stresses
that the assignment of proper orthology requires comprehensive analysis of all C2H2-ZNF
genes rather than the individual analysis of specific C2H2-ZNF subfamilies. It also points
to the importance of loss/contraction and secondary simplification, whose role in the
dynamics of evolution is often underestimated. The underlying mechanisms in the
expansion of C2H2-ZNF genes and the functional consequences of the important changes
(gain and loss) occurring in the repertoires of various mammals are unclear. These


variations, for example, may be to the advantage of complex organisms by providing more
subtle and species-specific control of gene expression for morphogenesis or cognitive
functions.

ACKNOWLEDGEMENTS

The authors thank Franz B. Lang for suggestions and help in phylogenetic analysis and
Nicolas Lartillot (Université de Montréal) for constructive comments on tree building and
datasets. We also acknowledge Hervé Philippe, Henner Brinkmann and Amy Hauth for
critical discussions and Allan Sun for assistance in hardware and software installations.
This work was supported by a grant from the Natural Sciences and Engineering Research
Council of Canada (to M.A.). M.A. is a Chercheur National from the Fonds de la recherche
en santé du Québec (FRSQ) and G.B. is an associate of CIFAR (Canadian Institute for
Advanced Research).


REFERENCES

Bellefroid, E. J., P. J. Lecocq, et al. (1989). “The human genome contains hundreds of genes coding for finger proteins of the Kruppel type.” DNA 8(6): 377-87.

Bellefroid, E. J., J. C. Marine, et al. (1993). “Clustered organization of homologous KRAB zinc-finger genes with enhanced expression in human T lymphoid cells.” Embo J 12(4): 1363-74.

Bellefroid, E. J., D. A. Poncelet, et al. (1991). “The evolutionarily conserved Kruppel-associated box domain defines a subfamily of eukaryotic multifingered proteins.” Proc Natl Acad Sci U S A 88(9): 3608-12.

Bertrand, D. and O. Gascuel (2005). “Topological rearrangements and local search method for tandem duplication trees.” IEEE/ACM Trans Comput Biol Bioinform 2(1): 15-28.

Castresana, J. (2000). “Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.” Mol Biol Evol 17(4): 540-52.

Collins, T., J. R. Stone, et al. (2001). “All in the family: the BTB/POZ, KRAB, and SCAN domains.” Mol Cell Biol 21(11): 3609-15.

Dehal, P., P. Predki, et al. (2001). “Human chromosome 19 and related regions in mouse: conservative and lineage-specific evolution.” Science 293(5527): 104-11.

Edgar, R. C. (2004). “MUSCLE: multiple sequence alignment with high accuracy and high throughput.” Nucleic Acids Res 32(5): 1792-7.

Eichler, E. E., S. M. Hoffman, et al. (1998). “Complex beta-satellite repeat structures and the expansion of the zinc finger gene cluster in 19p12.” Genome Res 8(8): 791-808.


Elemento, O. and O. Gascuel (2002). “An efficient and accurate distance based algorithm to reconstruct tandem duplication trees.” Bioinformatics 18 Suppl 2: S92-9.

Gertz, E. M., Y. K. Yu, et al. (2006). “Composition-based statistics and translated nucleotide searches: improving the TBLASTN module of BLAST.” BMC Biol 4: 41.

Gilad, Y., O. Man, et al. (2005). “A comparison of the human and chimpanzee olfactory receptor gene repertoires.” Genome Res 15(2): 224-30.

Glusman, G., I. Yanai, et al. (2001). “The complete human olfactory subgenome.” Genome Res 11(5): 685-702.

Grondin, B., M. Bazinet, et al. (1996). “The KRAB zinc finger gene ZNF74 encodes an RNA-binding protein tightly associated with the nuclear matrix.” J Biol Chem 271(26): 15458-67.

Hamilton, A. T., S. Huntley, et al. (2003). “Lineage-specific expansion of KRAB zinc finger transcription factor genes: implications for the evolution of vertebrate regulatory networks.” Cold Spring Harb Symp Quant Biol 68: 131-40.

Hamilton, A. T., S. Huntley, et al. (2006). “Evolutionary expansion and divergence in the ZNF91 subfamily of primate-specific zinc finger genes.” Genome Res 16(5): 584-94.

Huelsenbeck, J. P. and F. Ronquist (2001). “MRBAYES: Bayesian inference of phylogenetic trees.” Bioinformatics 17(8): 754-5.


Huntley, S., D. M. Baggott, et al. (2006). “A comprehensive catalog of human KRAB-associated zinc finger genes: insights into the evolutionary history of a large family of transcriptional repressors.” Genome Res 16(5): 669-77.

Kim, S. S., Y. M. Chen, et al. (1996). “A novel member of the RING finger family, KRIP-1, associates with the KRAB-A transcriptional repressor domain of zinc finger proteins.” Proc Natl Acad Sci U S A 93(26): 15299-304.

Klug, A. and D. Rhodes (1987). “Zinc fingers: a novel protein fold for nucleic acid recognition.” Cold Spring Harb Symp Quant Biol 52: 473-82.

Krebs, C. J., L. K. Larkins, et al. (2003). “Regulator of sex-limitation (Rsl) encodes a pair of KRAB zinc-finger genes that control sexually dimorphic liver gene expression.” Genes Dev 17(21): 2664-74.

Lander, E. S., L. M. Linton, et al. (2001). “Initial sequencing and analysis of the human genome.” Nature 409(6822): 860-921.

Lapidot, M., Y. Pilpel, et al. (2001). “Mouse-human orthology relationships in an olfactory receptor gene cluster.” Genomics 71(3): 296-306.

Lee, M. S., G. P. Gippert, et al. (1989). “Three-dimensional solution structure of a single zinc finger DNA-binding domain.” Science 245(4918): 635-7.

Li, W., L. Jaroszewski, et al. (2001). “Clustering of highly homologous sequences to reduce the size of large protein databases.” Bioinformatics 17(3): 282-3.

Looman, C., M. Abrink, et al. (2002). “KRAB zinc finger proteins: an analysis of the molecular mechanisms governing their increase in numbers and complexity during evolution.” Mol Biol Evol 19(12): 2118-30.


Looman, C., L. Hellman, et al. (2004). “A novel Kruppel-Associated Box identified in a panel of mammalian zinc finger proteins.” Mamm Genome 15(1): 35-40.

Mark, C., M. Abrink, et al. (1999). “Comparative analysis of KRAB zinc finger proteins in rodents and man: evidence for several evolutionarily distinct subfamilies of KRAB zinc finger genes.” DNA Cell Biol 18(5): 381-96.

Melnick, A., G. Carlile, et al. (2002). “Critical residues within the BTB domain of PLZF and Bcl-6 modulate interaction with corepressors.” Mol Cell Biol 22(6): 1804-18.

Messina, D. N., J. Glasscock, et al. (2004). “An ORFeome-based analysis of human transcription factor genes and the construction of a microarray to interrogate their expression.” Genome Res 14(10B): 2041-7.

Miller, J., A. D. McLachlan, et al. (1985). “Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes.” Embo J 4(6): 1609-14.

Nei, M. and S. Kumar (2000). Molecular Evolution and Phylogenetics. New York: Oxford University Press.

Nei, M., X. Gu, et al. (1997). “Evolution by the birth-and-death process in multigene families of the vertebrate immune system.” Proc Natl Acad Sci U S A 94(15): 7799-806.

Niimura, Y. and M. Nei (2003). “Evolution of olfactory receptor genes in the human genome.” Proc Natl Acad Sci U S A 100(21): 12235-40.

Niimura, Y. and M. Nei (2005). “Comparative evolutionary analysis of olfactory receptor gene clusters between humans and mice.” Gene 346: 13-21.

Ohta, T. (2000). “Evolution of gene families.” Gene 259(1-2): 45-52.


Page, R. D. and M. A. Charleston (1997). “From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem.” Mol Phylogenet Evol 7(2): 231-40.

Quignon, P., E. Kirkness, et al. (2003). “Comparison of the canine and human olfactory receptor gene repertoires.” Genome Biol 4(12): R80.

Rhodes, D. and A. Klug (1993). “Zinc fingers.” Sci Am 268(2): 56-9, 62-5.

Rosati, M., M. Marino, et al. (1991). “Members of the zinc finger protein gene family sharing a conserved N-terminal module.” Nucleic Acids Res 19(20): 5661-7.

Rousseau-Merck, M. F., D. Koczan, et al. (2002). “The KOX zinc finger genes: genome-wide mapping of 368 ZNF PAC clones with zinc finger gene clusters predominantly in 23 chromosomal loci are confirmed by human sequences annotated in EnsEMBL.” Cytogenet Genome Res 98(2-3): 147-53.

Sander, T. L., A. L. Haas, et al. (2000). “Identification of a novel SCAN box-related protein that interacts with MZF1B. The leucine-rich SCAN box mediates hetero- and homoprotein associations.” J Biol Chem 275(17): 12857-67.

Schmidt, D. and R. Durrett (2004). “Adaptive evolution drives the diversification of zinc finger binding domains.” Mol Biol Evol 21(12): 2326-39.

Schuh, R., W. Aicher, et al. (1986). “A conserved family of nuclear proteins containing structural elements of the finger protein encoded by Kruppel, a Drosophila segmentation gene.” Cell 47(6): 1025-32.

Schumacher, C., H. Wang, et al. (2000). “The SCAN domain mediates selective oligomerization.” J Biol Chem 275(22): 17173-9.


Shannon, M., A. T. Hamilton, et al. (2003). “Differential expansion of zinc-finger transcription factor loci in homologous human and mouse gene clusters.” Genome Res 13(6A): 1097-110.

Sharon, D., G. Glusman, et al. (1999). “Primate evolution of an olfactory receptor cluster: diversification by gene conversion and recent emergence of pseudogenes.” Genomics 61(1): 24-36.

Sitnikova, T. and C. Su (1998). “Coevolution of immunoglobulin heavy- and light-chain variable-region gene families.” Mol Biol Evol 15(6): 617-25.

Stamatakis, A., T. Ludwig, et al. (2005). “RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees.” Bioinformatics 21(4): 456-63.

Stone, J. R., J. L. Maki, et al. (2002). “The SCAN domain of ZNF174 is a dimer.” J Biol Chem 277(7): 5448-52.

Theunissen, O., F. Rudt, et al. (1992). “RNA and DNA binding zinc fingers in Xenopus TFIIIA.” Cell 71(4): 679-90.

Thornton, J. W. and R. DeSalle (2000). “Gene family evolution and homology: genomics meets phylogenetics.” Annu Rev Genomics Hum Genet 1: 41-73.

Urrutia, R. (2003). “KRAB-containing zinc-finger repressor proteins.” Genome Biol 4(10): 231.

Venter, J. C., M. D. Adams, et al. (2001). “The sequence of the human genome.” Science 291(5507): 1304-51.


Witzgall, R., E. O’Leary, et al. (1994). “The Kruppel-associated box-A (KRAB-A) domain of zinc finger proteins mediates transcriptional repression.” Proc Natl Acad Sci U S A 91(10): 4514-8.


Figure 1: Flowchart of the analysis procedure of C2H2-ZNF genes and clusters

[Flowchart not reproduced. Recoverable box labels: Identify C2H2-ZNF clusters; Compare with syntenically homologous clusters from completed genomes; Extract complete dataset of C2H2-ZNF genes; Investigate the phylogenetic relationships between the homologous C2H2-ZNF clusters; Reconciliation of phylogenetic relationships + spatial relationships + species tree.]

Figure 2: Distribution of all the singletons and clustered genes from the various human C2H2-ZNF sub-families and gene composition of the C2H2-ZNF clusters

[Bar charts not reproduced. Panel A: type of domains associated with C2H2-ZNF genes, with bars split into singleton genes and genes in clusters. Panel B: number of C2H2-ZNF clusters by number of C2H2-ZNF genes per cluster, split into clusters with intervening non-C2H2-ZNF genes and clusters composed solely of C2H2-ZNF genes.]

Figure 2

Distribution of all the singletons and clustered genes from the various human C2H2-ZNF
sub-families and gene composition of the C2H2-ZNF clusters.

A) The number of genes belonging to the various C2H2-ZNF subfamilies is shown, as
well as the proportion of genes found as singletons or as part of clusters. C2H2-ZNF genes
associated with KRAB and SCAN domains are more often found to be clustered. S-K =
C2H2-ZNF containing both a SCAN and a KRAB domain. NONE = C2H2-ZNF without
any associated conserved domain. The percentage distribution is indicated on top of each
bar for each sub-family.

B) The number of C2H2-ZNF clusters is shown with respect to the number of genes present
in each cluster. The proportion of clusters composed solely of C2H2-ZNF genes without any
intervening gene, or with intervening genes other than C2H2-ZNF (non-C2H2-ZNF), is also
represented. A star (*) identifies large clusters present on chromosome 19.


Figure 3: Differential expansion and loss of C2H2-ZNF clusters in five mammalian genomes

[Figure not reproduced. Panel A: species tree (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, Canis familiaris, with Gallus gallus and Xenopus laevis as outgroups) annotated with the number of clusters and the number of C2H2-ZNF genes per species. Panel B: bar chart of syntenically homologous C2H2-ZNF clusters in human, chimpanzee, mouse, rat and dog, grouped as: in all species, primate-specific, loss in rodents, absence in dog.]

Figure 3

Differential expansion and loss of C2H2-ZNF clusters in five mammalian genomes.

A) Evolution of the C2H2-ZNF repertoires in primates, rodents and dog. The number of
C2H2-ZNF clusters and the total number of C2H2-ZNF found in these clusters are
indicated on the species tree. Since Xenopus laevis and Gallus gallus C2H2-ZNF are used
as an outgroup in phylogenetic studies, these species are also positioned on the tree.
The figure indicates the primate-specific increase in the number of C2H2-ZNF as compared
to rodents and dog.

B) A graphical representation of different scenarios seen in the evolution of human
C2H2-ZNF clusters and their syntenically homologous C2H2-ZNF clusters in chimpanzee, mouse,
rat and dog. The human clusters selected and named on the graph, as well as their syntenic
counterparts, were 1) present in all species, 2) primate-specific, 3) lost in rodents or 4)
absent in dog. For each human C2H2-ZNF cluster named on the graph, the first number
indicates the chromosome number and the second is the number attributed to that cluster on
the chromosome. Supplementary Figure S2 provides a more comprehensive graphical
representation including the 40 human clusters that contain at least 3 C2H2-ZNF and their
syntenic counterparts in the four other mammals.
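The four scenarios listed in panel B can be expressed as a small presence/absence classifier. The sketch below is only an illustration under stated assumptions: the input is a hypothetical dictionary of C2H2-ZNF gene counts per species for one syntenic cluster, a cluster counts as present when it holds at least one gene, and "lost in rodents" is taken to mean absent from both mouse and rat. None of these thresholds, names or counts come from the study.

```python
PRIMATES = ("human", "chimpanzee")
RODENTS = ("mouse", "rat")
ALL_SPECIES = PRIMATES + RODENTS + ("dog",)


def classify_cluster(counts):
    """Classify one syntenic cluster from per-species C2H2-ZNF gene counts.

    Assumptions (hypothetical): presence means a count above zero, and
    rodent loss requires absence in both mouse and rat.
    """
    present = {sp: counts.get(sp, 0) > 0 for sp in ALL_SPECIES}
    in_primates = all(present[sp] for sp in PRIMATES)
    in_rodents = any(present[sp] for sp in RODENTS)
    in_dog = present["dog"]
    if all(present.values()):
        return "present in all species"
    if in_primates and not in_rodents and not in_dog:
        return "primate-specific"
    if in_primates and in_dog and not in_rodents:
        return "lost in rodents"
    if in_primates and in_rodents and not in_dog:
        return "absent in dog"
    return "other"


if __name__ == "__main__":
    # Hypothetical gene counts, for illustration only.
    example = {"human": 12, "chimpanzee": 11, "mouse": 0, "rat": 0, "dog": 4}
    print(classify_cluster(example))
```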


Figure 4: Evolutionary scenarios in the phylogenetic tree

[Schematic trees not reproduced. Panel A: species tree (species 1 to 4 and outgroup). Panel B: gene gain in species. Panel C: gene loss in species.]

Figure 4

Evolutionary scenarios in the phylogenetic tree

The different kinds of evolutionary scenarios seen in the phylogenetic tree are shown.

A) Species tree showing the evolutionary relationship between species 1, 2, 3 and 4.

B) A species-specific gain of genes appears as a clade including a single homolog from
one species and multiple homologs from the other. Phylogeny between genes S1 gene, S2
gene, S3 gene and S4 gene from species 1, 2, 3 and 4, respectively. Gene gain in species 4 is
observed.

C) Species-specific gene loss appears as the absence of a corresponding ortholog
for one species on the tree and is deduced from the evolutionary relationships of the species
considered with the other species. Loss of the corresponding gene (S3 gene) in species 3.
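The gain and loss scenarios of panels B and C can be illustrated with a small reconciliation sketch. It maps each gene-tree node to the smallest species-tree clade containing its species (the usual last-common-ancestor mapping), counts a duplication whenever a node maps to the same clade as one of its children, and lists species with no gene in the tree as loss candidates. This is a simplified illustration, not the procedure or software used in this study. The nested-tuple tree encoding, the `species_gene` leaf naming and the four-species topology are assumptions made for the example.

```python
def collect_clades(tree, clades):
    """Append the leaf set of every species-tree node to `clades`; return this node's leaf set."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset()
        for child in tree:
            leaves |= collect_clades(child, clades)
    clades.append(leaves)
    return leaves


def lca(species_set, clades):
    """Smallest species-tree clade containing every species in `species_set`."""
    return min((c for c in clades if species_set <= c), key=len)


def count_duplications(gene_tree, clades):
    """Count gene-tree nodes mapping to the same species clade as one of their children."""
    def visit(node):
        if isinstance(node, str):
            # Assumed leaf naming: "<species>_<gene>".
            species = frozenset([node.split("_")[0]])
            return species, lca(species, clades), 0
        results = [visit(child) for child in node]
        species = frozenset().union(*(r[0] for r in results))
        mapped = lca(species, clades)
        duplications = sum(r[2] for r in results)
        if any(r[1] == mapped for r in results):
            duplications += 1
        return species, mapped, duplications
    return visit(gene_tree)[2]


def missing_species(gene_tree, clades):
    """Species of the species tree with no gene in the gene tree (loss candidates)."""
    def species_of(node):
        if isinstance(node, str):
            return {node.split("_")[0]}
        return set().union(*(species_of(child) for child in node))
    return set(max(clades, key=len)) - species_of(gene_tree)


if __name__ == "__main__":
    # Hypothetical species tree and gene trees, for illustration only.
    clades = []
    collect_clades((("S1", "S2"), ("S3", "S4")), clades)
    gain_tree = (("S1_g", "S2_g"), ("S3_g", ("S4_a", "S4_b")))
    loss_tree = (("S1_g", "S2_g"), "S4_g")
    print(count_duplications(gain_tree, clades), missing_species(loss_tree, clades))
```

A species missing from the gene tree is only a candidate loss: as the legend notes, the loss is deduced from the relationships with the other species, and an incomplete genome assembly could produce the same pattern.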


Figure 5: Phylogenetic analysis of C2H2-ZNF genes in cluster 19.12 of human and its

syntenic counterparts in other mammals

[Phylogenetic tree not reproduced. Recoverable labels: Outgroup; Group I (dog-specific gain); Group II (primate-specific gain); Group III; a table of gene counts per group for human, chimpanzee, mouse, rat and dog.]

Figure 5

Phylogenetic analysis of C2H2-ZNF genes in cluster 19.12 of human and its syntenic
counterparts in other mammals.

A phylogenetic tree was built using the amino acid sequences corresponding to the zinc
finger regions of the various human C2H2-ZNF from cluster 19.12 and their syntenic
counterparts in chimpanzee, mouse, rat and dog. The tree was generated using a maximum
likelihood method (RAxML) and verified using a Bayesian method (MrBayes). 346 sites
from 101 sequences (including the 20 outgroup sequences from chicken and Xenopus) were
used in the analysis. The tree is divided into three major groups (I-III). A tabulation of the
number of genes present in each group is indicated for each species (h: human, p:
chimpanzee, m: mouse, r: rat, c: dog). The bootstrap values are indicated for each node on
the tree. A small black circle is also shown at each node where the posterior
probability value is equal to 1.00. This cluster contains only C2H2-ZNF genes that are
either from the KRAB subfamily or that do not encode any conserved N-terminal domain.
Next to the name of each C2H2-ZNF gene, the presence of an N-terminal KRAB domain is
indicated by a K and the number of zinc finger motifs is given. Clear evidence of
differential expansion is seen in primates and dog. Loss of C2H2-ZNF in the rodent lineage
is also observed.


Figure 6

Physical maps showing the organization of the human C2H2-ZNF from cluster 19.12
localized on 19q13.4 and its syntenically homologous counterparts in other mammals.

For the large C2H2-ZNF cluster 19.12 and its syntenically homologous counterparts in
chimpanzee, mouse, rat and dog, each C2H2-ZNF gene is represented by an open arrow
which indicates its orientation on the chromosome strands; this excludes the pseudogenes,
whose names appear in parentheses. For these clusters, which contain only C2H2-ZNF that
are from the KRAB subfamily or that do not encode any conserved N-terminal domain, the
presence of a conserved N-terminal KRAB domain is indicated by a square positioned in
front of the open arrow representing the gene. Genes identified as orthologs, based on the
phylogenetic tree and physical maps, are underlined and are aligned vertically on their
respective chromosomes. Dotted lines separate the genes belonging to Group I, Group II
and Group III defined in the phylogenetic tree (Figure 5). The two species-specific groups
from dog and primates are seen in Group I and Group II, respectively.


Figure 7: Variation in the numbers of zinc finger motifs in mammals and in the

presence of conserved N-terminal domains in orthologs

[Figure not reproduced. Panel A: bar chart by C2H2-ZNF subfamily for human, chimpanzee, mouse, rat and dog. Panel B: physical maps for human, chimpanzee, mouse, rat and dog. Panel C: exon-intron structures of human C2H2-ZNF from clusters: SCAN-KRAB (11/14) and SCAN (16/29).]

Figure 7

Variation in the numbers of zinc finger motifs in mammals and in the presence of
conserved N-terminal domains in orthologs.

A) The average number of zinc finger motifs was calculated for all the C2H2-ZNF from the
81 human clusters identified and their corresponding syntenically homologous clusters in
the other mammals; for each species, the average number for the total C2H2-ZNF (All) and
for members of the various C2H2-ZNF sub-families (KRAB, SCAN, SCAN-KRAB, BTB,
HOMEO, SET and NONE: no conserved domain associated) is presented. For each
category, the number of genes in each species is listed above the bars in the following order:
human, chimpanzee, mouse, rat and dog.

B) For the human C2H2-ZNF cluster 6.2 (chromosome 6p22.1) and its syntenically
homologous counterparts in chimpanzee, mouse, rat and dog, each C2H2-ZNF gene is
represented by an open arrow which indicates its orientation on the chromosome strands;
this excludes the pseudogenes, whose names appear in parentheses. For these clusters, which
contain C2H2-ZNF that are from the KRAB or SCAN subfamily or that do not encode any
conserved N-terminal domain, the presence of a conserved N-terminal domain is indicated by a
square for a KRAB domain or an open circle for a SCAN domain, both being positioned in
front of the open arrow representing the gene. Genes identified as orthologs, based on the
phylogenetic tree and physical maps, are aligned vertically on their respective
chromosomes. Cases where domain shuffling was observed among orthologs from the
different mammals are marked by a grey box.


C) Exon-intron organization of most human C2H2-ZNF from the SCAN-KRAB and
SCAN subfamilies. About 80% of the human SCAN-KRAB C2H2-ZNF (11/14) and 55% of the
SCAN C2H2-ZNF (16/29) found in clusters have the exon-intron structures
shown. The exons encoding the SCAN, KRAB (A box) and ZNF domains are indicated.


Figure 8: Model for the evolution of the SCAN, SCAN-KRAB and KRAB C2H2-ZNF

subfamilies

[Schematic not reproduced. Panel A: exon structures of C2H2-ZNF, SCAN C2H2-ZNF, SCAN-KRAB C2H2-ZNF and KRAB C2H2-ZNF genes, linked by a gain of SCAN event, a gain of KRAB event and loss of SCAN. Panel B: vertebrate-specific and tetrapod-specific singular gain events (gain of SCAN, gain of KRAB) followed by duplications and multiple loss events (loss of SCAN, loss of KRAB, loss of SCAN-KRAB).]

Figure 8

Model for the evolution of the SCAN, SCAN-KRAB and KRAB C2H2-ZNF
subfamilies

A) Sequential events of exon shuffling leading to the birth of the SCAN C2H2-ZNF and
SCAN-KRAB C2H2-ZNF subfamilies. Most of the SCAN C2H2-ZNF and the SCAN-KRAB
C2H2-ZNF have the exon-intron structure shown (boxes represent exons). Birth of
new families may have occurred by an exon shuffling mechanism leading presumably first
to the acquisition of a SCAN domain by a C2H2-ZNF and later of a KRAB domain by a
SCAN C2H2-ZNF. Most SCAN-KRAB C2H2-ZNF have a single exon placed in between
the exon encoding the KRAB A box (identified as KRAB) and the exon encoding the zinc
finger domain (ZNF). This exon encodes in most instances the so-called KRAB B, b, or C
boxes.

B) Dynamic evolution of C2H2-ZNF after birth of the SCAN and SCAN-KRAB
subfamilies through gene duplication and recurrent loss of effector domains. A first SCAN
C2H2-ZNF appeared in an ancestor of vertebrates following the gain of a SCAN domain by
a C2H2-ZNF (in grey box); duplication then led to the establishment of the SCAN
C2H2-ZNF subfamily. The gain of a KRAB domain at the emergence of tetrapods by a SCAN
C2H2-ZNF gave rise to a SCAN-KRAB C2H2-ZNF (in grey box). This was followed by
duplication and establishment of the SCAN-KRAB subfamily. Loss of the SCAN domain by
deletion or degeneration from some SCAN-KRAB C2H2-ZNF genes, followed in many
instances by duplication, led to the expansion of the KRAB C2H2-ZNF. Duplication and
loss of SCAN or KRAB domains by deletion or degeneration from the SCAN, SCAN-KRAB
and KRAB C2H2-ZNF subfamilies are seen as a recurrent theme shaping the repertoires
of the C2H2-ZNF subfamilies.

85
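The gain and loss events in this legend can be read as operations on the set of effector domains a gene carries. The following is a minimal sketch of that reading, not the author's method: the function names and the event encoding are assumptions introduced here for illustration, and the sketch does not enforce the order of events proposed in the model (SCAN gained before KRAB).

```python
# Illustrative sketch only: names and the event encoding are assumptions,
# not taken from the thesis.

def classify(effector_domains):
    """Name the subfamily implied by a set of effector domains."""
    has_scan = "SCAN" in effector_domains
    has_krab = "KRAB" in effector_domains
    if has_scan and has_krab:
        return "SCAN-KRAB C2H2-ZNF"
    if has_scan:
        return "SCAN-C2H2-ZNF"
    if has_krab:
        return "KRAB C2H2-ZNF"
    return "C2H2-ZNF"


def apply_event(effector_domains, action, domain):
    """Return the domain set after a single gain or loss event."""
    domains = frozenset(effector_domains)
    if action == "gain":
        return domains | {domain}
    if action == "loss":
        return domains - {domain}
    raise ValueError(f"unknown action: {action!r}")


def trace(events):
    """Follow one lineage, starting from a plain C2H2-ZNF, through a list of events."""
    domains = frozenset()
    history = [classify(domains)]
    for action, domain in events:
        domains = apply_event(domains, action, domain)
        history.append(classify(domains))
    return history


if __name__ == "__main__":
    # Gain of SCAN, then gain of KRAB, then loss of SCAN (panel B of the model).
    lineage = [("gain", "SCAN"), ("gain", "KRAB"), ("loss", "SCAN")]
    print(" -> ".join(trace(lineage)))
```

Run as a script, the example lineage passes through C2H2-ZNF, SCAN-C2H2-ZNF, SCAN-KRAB C2H2-ZNF and ends as a KRAB C2H2-ZNF, which is the route to KRAB C2H2-ZNF expansion described in panel B.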

[Supplementary Figure 1 plot. y axis: total no. of human C2H2-ZNF genes; x axis: ranges of intergenic distance, with tick labels running from 10 to 35000.]

Supplementary Figure 1

Distribution of intergenic distances between 718 C2H2-ZNF in the human genome.

The intergenic distances between consecutive C2H2-ZNF on each chromosome were calculated for each C2H2-ZNF of the human genome. For the 718 C2H2-ZNF, the number of C2H2-ZNF found within each range of intergenic distances indicated on the x axis is plotted on the y axis. For example, there are 108 C2H2-ZNF within 10 to 20 Kb of a consecutive C2H2-ZNF.
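The calculation described in this legend can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the thesis: the function names, the bin edges and the gene coordinates are hypothetical, and the sketch counts gaps between neighbouring genes rather than genes per distance range.

```python
from collections import defaultdict

# Illustrative sketch only: names, coordinates and bin edges are hypothetical.

def intergenic_distances(genes):
    """Gaps (in bp) between consecutive genes on the same chromosome.

    genes: iterable of (chromosome, start, stop) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, start, stop in genes:
        # Store each gene as (left end, right end) regardless of strand.
        by_chrom[chrom].append((min(start, stop), max(start, stop)))

    distances = []
    for intervals in by_chrom.values():
        intervals.sort()
        for (_, prev_stop), (next_start, _) in zip(intervals, intervals[1:]):
            # Overlapping genes are given a distance of zero.
            distances.append(max(0, next_start - prev_stop))
    return distances


def bin_counts(distances, edges):
    """Count the distances falling in each half-open range [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for d in distances:
        for i in range(len(edges) - 1):
            if edges[i] <= d < edges[i + 1]:
                counts[i] += 1
                break
    return counts


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from Supplementary Table S1.
    toy_genes = [("1", 100, 200), ("1", 1200, 1500), ("1", 300, 400), ("2", 50, 60)]
    gaps = intergenic_distances(toy_genes)
    print(gaps)
    print(bin_counts(gaps, [0, 500, 1000, 2000]))
```

A chromosome carrying a single gene contributes no distance, so the number of gaps is always smaller than the number of genes.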

Supplementary Figure 2: Comparison of the number of C2H2-ZNF genes in the 40 human clusters containing at least 3 C2H2-ZNF and their syntenic counterparts in four other mammals

[Supplementary Figure 2A plot. Bars for Human, Chimpanzee, Mouse, Rat and Dog. x axis: syntenically homologous C2H2-ZNF clusters with 3-5 C2H2-ZNF in Human (1,2 2,2 3,1 3,2 4,1 7,2 7,3 7,6 8,3 9,1 10,2 10,3 15,2 15,3 17,1 18,1 19,2 19,10 20,2 X,1).]

[Supplementary Figure 2B plot. Bars for Human, Chimpanzee, Mouse, Rat and Dog. x axis: syntenically homologous C2H2-ZNF clusters with at least 6 C2H2-ZNF in Human (1,5 3,3 5,1 6,2 7,4 7,5 7,7 8,4 10,1 12,1 16,1 16,3 19,5 19,6 19,7 19,8 19,9 19,11 19,12 19,13).]

Supplementary Figure 2

Comparison of the number of C2H2-ZNF genes in the 40 human clusters containing at least 3 C2H2-ZNF and their syntenic counterparts in four other mammals.

For each human C2H2-ZNF cluster named on the graph, the first number indicates the chromosome and the second is the number attributed to that cluster on the chromosome. C2H2-ZNF clusters with three to five (A) and six or more (B) genes in human are shown with their syntenic counterparts in chimpanzee, mouse, rat and dog. This figure provides evidence of differential species-specific expansion of C2H2-ZNF and of gene loss in rodents.
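The cluster naming convention in this legend (chromosome, then cluster number on that chromosome) and the per-species comparison can be sketched in a few lines. This is an illustration under stated assumptions: the function names are introduced here, and the gene counts in the example are hypothetical rather than read from the figure.

```python
# Illustrative sketch only: function names and example counts are hypothetical.

def parse_cluster_label(label):
    """Split a label such as '19,10' into (chromosome, cluster number)."""
    chrom, index = label.split(",")
    return chrom.strip(), int(index)


def difference_from_human(counts):
    """Gene count in each species minus the human count, per cluster.

    counts: {cluster_label: {species: number_of_genes}}; every inner
    dictionary must contain a 'Human' entry.
    """
    result = {}
    for label, per_species in counts.items():
        human = per_species["Human"]
        result[label] = {
            species: n - human
            for species, n in per_species.items()
            if species != "Human"
        }
    return result


if __name__ == "__main__":
    print(parse_cluster_label("19,10"))
    # Hypothetical counts for a single cluster.
    toy_counts = {
        "19,10": {"Human": 4, "Chimpanzee": 4, "Mouse": 1, "Rat": 0, "Dog": 3}
    }
    print(difference_from_human(toy_counts))
```

Keeping the chromosome as a string lets the same parser handle the X chromosome label (X,1); a negative difference marks a cluster with fewer genes than its human counterpart.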

Supplementary Table S1

Comprehensive catalogue of the 718 C2H2-ZNF genes in the human genome.

Chr¹  Position²  Cluster³  Pseudo⁴  Name⁵  Description⁶  Domain⁷  F⁸  O⁹  L¹⁰  Start¹¹  Stop¹²

11p

36.3

11

p3

6.2

l-

11p

36.2

-p36

.1-

I1p

36-

11p36.l

Ili

a

11p

36.1

11

.lb

11p35.l

-

11p

34.3

-

11p

34.2

1.2a

11p

34.2

1.2b

11p

34.2

1.2c

11p

34.2

-

11p34.l

1.3a

11p

34.1

1.3b

I1p

32.3

-

I1p

22.2

-

11p

22-

11q

22-

11q

25.1

-

11q

25.3

-

11q

31.1

-

11q

31.2

-

11q

31.3

-

11q

32.1

-

11q

42.1

31.

4a

I1q

42.1

31.

4b

11q

43-

11q

44-q

ter

-

11q

44i.

5a

11q

441

5b

11q

441.

5c

11q

441.

5d

HK

R3

GL

I-K

rupp

elfa

mily

mem

ber

HK

R3

BIB

11+

688

6562

698

6571

926

PRD

M2

Ret

inob

last

oma

prot

ein

inte

ract

ing

ZN

FS

ET

7+

1718

1390

3937

1402

4162

ZB

TB

J7Z

inc

fing

eran

dB

IBdo

mai

nco

ntai

ning

17B

IB13

-80

316

1409

5116

1751

01

ZN

F43

6Z

inc

fing

erpr

otei

n43

6K

RA

B12

-47

023

5590

5523

5674

66

ZN

F59

3Z

inc

fing

erpr

otei

n59

3-

1+

116

2636

1096

2636

9951

ZN

F68

3Z

inc

fing

erpr

otei

n68

3-

4-

509

2656

0712

2657

1853

ZB

IB8

Zin

cli

nger

and

BIB

dom

ain

cont

aini

ng8

BIB

2+

512

3277

7359

3284

4129

ZN

F31

Zin

cfi

nger

prot

ein

31(K

0X29

)SC

AN

10+

977

3371

0846

3373

4582

ZN

F64

3Z

inc

fing

erpr

otei

n64

3K

RA

B9

+43

240

6883

6640

7019

39

ZN

F64

2Z

inc

fing

erpr

otei

n64

2K

RA

B9

+50

540

7158

8940

7346

02

ZN

F68

4Z

inc

fing

erpr

otei

n68

4K

RA

B8

+37

840

7698

2040

7864

25

ZN

F691

Zin

cfi

nger

prot

ein

691

-7

+28

443

0848

6743

0907

35

ZN

F39

3Z

inc

fing

erpr

otei

n39

3-

389

4435

7109

4437

3399

4L

OC

J28

208

sim

ilar

todJ

675G

8.1

(nov

elzi

ncfi

nger

prot

ein)

--

-45

2725

8645

6863

66

GL

IS1

GL

ISfa

mily

zinc

fing

er1

-5

-62

053

7444

9453

9724

65

ZN

F64

4Z

inc

fing

erpr

otei

n64

4-

3-

1327

9115

3443

9126

0259

GFI

IZ

inc

fing

erpr

otei

nG

fi-1

-6

-42

292

7129

0992

7250

21

ZB

TB

7BZ

inc

fing

erpt

otei

nan

dB

IBdo

mai

n7B

BIB

4+

539

1532

5354

815

3256

078

4JZ

BT

B37

Zin

cfi

nger

and

BIB

dom

ain

cont

aini

ng37

BIB

-+

361

1721

0415

517

2109

404

ZN

F64

8Z

inc

fing

erpr

otei

n64

8-

10-

568

1802

9032

818

0297

470

4)L

0C44

1918

sim

ilar

toZ

incf

ing

erp

rote

in13

2-

-+

-18

3273

244

1832

8010

94)

L0C

391

146

sim

ilar

tozi

ncfi

nger

prot

ein

101

--

--

1896

9432

118

9695

464

ZB

TB

41Z

inc

fing

eran

dB

TB

dom

ain

cont

aini

ng41

BIB

14-

909

1953

8943

719

5436

295

ZN

F281

Zin

cfi

nger

prot

ein

281

-4

-89

519

8642

043

1986

4578

9

ZN

F67

8Z

inc

fing

erpr

otei

n67

8-

15+

525

2258

1786

722

5910

754

Gm

l27

Sim

ilar

tozi

ncfi

nger

prot

ein

ZF

PK

RA

B7

-71

422

5951

873

2259

6102

3

L0C

441

927

sim

ilar

tozi

ncfi

nger

prot

ein

532

-2

-17

924

0540

626

2405

4681

9

ZN

F23

8Z

inc

ling

erpr

otei

n23

8B

IB4

+53

124

2281

208

2422

8740

1

4)Z

NF

695

Zin

cfi

nger

prot

ein

695

KR

AB

--

133

2451

7548

724

5237

946

ZN

F67O

Zin

cfi

nger

prot

ein

670

KR

AB

9-

389

2452

6671

024

5308

692

ZN

F66

9Z

inc

fing

erpr

otei

n66

9K

RA

B9

-46

424

5329

916

2453

3425

1

ZN

F12

4Z

inc

fing

erpr

otei

n12

4(H

ZF-

16)

KR

AB

7-

289

2453

8582

624

5401

941


I1

q4

41.

5e

11

q4

41

5f

11q

441.

6a

11

q4

41.

6b

22p25

-

22p23.3

2.l

a

22p23.3

2.l

b

22

pl6

.l-

22p

13.2

-p13

.1-

22p13

-

22q

11.1

2.2a

22q11.2

2.2b

22q

11.1

2.2c

22q13

2.3e

22q13

2.3b

22q14

-

22q21.2

-

22

q3

1.2

-q3

1.3

-

22q32

-

22

q3

4-q

35

-

22q34

-

33

p2

4.3

-

33p

22.1

3.l

a

33p

22.1

3.l

b

33p22.l

3.l

c

33p

22.1

32e

33p

22.1

3.2b

33p

22.1

3.2c

33p

21.3

23.

3a

33

p2

1.3

23.

3b

33

p2

2.3

-p2

l.l

3.3c

33p21.3

23.

3d

33p

213.

3e

33p

22-p

213

3f

33p

21.3

13

3g

L0C

7298

06si

mil

arto

Zin

cfi

nger

prot

ein

492

KR

AB

10-

741

ZN

F49

6Z

inc

fing

erpr

otei

n49

6SC

AN

-KR

AB

5-

249

ZN

F67

2Z

inc

fing

erpr

otei

n67

2-

13+

452

ZN

F69

2Z

inc

fin9

erpr

otei

n69

2-

5-

519

KL

FI1

Kw

ppel

-Iik

efa

ctor

11-

3+

512

ZN

F51

3Z

inc

fing

erpr

otei

n51

3-

7-

541

ZN

F51

2Z

inc

fing

erpr

otei

n51

2-

2+

567

BC

L1lA

B-c

eIl

CL

ulym

phom

a1J

A(z

inc

fing

erpr

otei

n)-

3-

779

YZ

NF

638

Zin

cfi

nger

prot

ein

636

--

+-

EG

R4

Ear

lygr

owth

tesp

on

se4

-3

-48

6

ZN

F51

4Z

inc

fing

erpr

otei

n51

4K

RA

B7

-40

0

ZN

F2Z

inc

fing

erpr

otei

n2(

A1-

5)K

RA

B9

+42

5

L0C

3440

65S

imil

arto

zinc

fing

erpr

otei

n13

5-

10+

524

WL

0C34

3938

Sim

ilar

tozi

ncfi

nger

prot

eins

32-

-+

-

WL

0C44

2041

Sim

ilar

tozi

ncfi

nger

prot

ein

532

--

--

GL

I2G

LI-

Kru

ppel

fam

ilym

embe

rG

LI2

-4

+12

58

L0C

4420

49S

imil

arto

zinc

fing

erpr

otei

n28

5K

RA

B8

+65

3

ZN

F53

3Z

inc

fing

erpr

otei

n53

3-

4-

471

KL

F7K

rupp

el-I

ike

fact

or7

-3

-30

2

ZN

F14

2Z

inc

fing

erpr

otei

n14

2-

18-

1524

ZN

FN1A

2Z

inc

fing

etpr

otei

nsu

bfam

i!y

lA,

2(H

elio

s)-

4-

526

L0C

3890

99S

imil

acto

zinc

fing

erpr

otei

n53

3-

1-

145

ZN

F61

9Z

inc

fing

erpr

otei

n61

9-

10+

371

ZN

F62O

Zin

cfi

nger

prot

ein

620

KR

AB

8+

422

ZN

F621

Zin

cfi

nger

prot

ein

621

KR

AB

7+

439

ZN

F651

Zin

cfi

nger

prot

ein

651

-8

+37

1

ZN

F66

2Z

inc

fing

erpr

otei

n66

2K

RA

B8

+42

6

4)L

0C33

9903

Sim

ilar

tozi

ncfi

nger

prot

ein

621

KR

A6

-+

128

ZN

F44

5Z

inc

fing

erpr

otei

n44

5SC

AN

-KR

AB

14-

1031

L0C

2853

46S

imil

arto

zinc

fing

erpr

otei

nZ

FP1

KR

AB

12-

544

ZN

F16

7Z

incf

inge

rpro

tein

167

SCA

N-K

RA

B13

+75

4

ZN

F66O

Zin

cfi

nger

prot

ein

660

-10

+33

1

ZN

F197

Zin

cfi

nger

prot

ein

197

SCA

N-K

RA

B22

+10

29

ZN

F35

Zin

cfi

nger

prot

ein

35-

11+

519

ZN

F5O

2Z

inc

fing

erpr

otei

n50

2-

14+

544

2454

194

99

2455

3024

5

2470

9915

3

2471

1062

8

1010

1133

2745

3606

2765

9397

6053

1806

7141

2397

7337

157

0

9517

7127

9519

4910

9523

6998

1101

0959

0

1105

5280

7

1212

6632

7

1448

3435

8

1800

1495

4

2076

5377

4

2192

1088

3

2135

7958

9

2215

0080

4049

3641

4052

2534

4054

1508

4267

5678

4292

2406

4295

3061

4445

7410

4451

2201

4457

1717

4460

1460

4464

1515

4466

5259

4472

9142

2454

3044

9

2455

6166

8

2471

1033

7

2471

1989

4

1011

2414

2745

7097

2769

9467

6063

4137

7151

5697

7337

4118

9518

8990

9521

3792

9524

5078

1101

2573

7

1105

6903

1

1214

6632

1

1449

7284

6

1804

3431

2

2077

3885

9

2192

3259

9

2137

2330

3

2218

6054

4050

4881

4053

4042

4055

6047

4268

4076

4293

4136

4295

9286

4449

4166

4451

6816

4459

9979

4461

2561

4466

4967

4467

7280

4474

0327


33p2l.

3l

33p2l

33

pI4

.2

33

pl2

.3

33p11.l

33q

12.3

33p

12-q

ter

33q

13.2

33q

21

33q

13-q

21

33q

24

33q

24

33

q2

63

2

33q

26.3

-q27

33q

27

33

q2

9

44pl6

.3

44pI6

.3

44p

16.3

44p

16.3

44p

16.3

44p

16.1

44p

14-p

15.1

44

q1

2

44q

31.1

-q31

.2

44q

35.2

5p1

5.33

5p15

.1

5p13

.3

5p12

5p1

l-p

12

5q13

.2

5q35

.3

5q33

.1

5q35

.2

ZN

F5O

1

ZN

F58

9

ZN

F31

2

ZN

F71

7

ZN

F65

4

ZB

IB1I

ZN

F8O

ZB

TB

20

ZN

F14

8

KL

F15

ZIC

4

Zic

i

ZN

F63

9

WIG

1

BC

L6

4)L

0C28

5388

ZN

F59

5

MG

C26

356

YL

0C65

4254

ZN

F141

ZN

F5O

9

L0C

441

007

KL

F3

RE

SI

ZN

F33O

Zfp

42

ZF

P62

ZN

F62

2

ZF

R

WL

0C44

2134

ZN

F131

ZN

F36

6

EG

R1

ZN

F30

0

7NF

346

Zin

cfi

nger

prot

ein

501

Zin

cfi

nger

prot

ein

589

Zin

cfi

nger

prot

ein

312

Zin

cfi

nger

prot

ein

717

Zin

cfi

nger

prot

ein

654

Zin

cfi

nger

and

BT

Bdo

mai

nco

ntai

ning

11

Zin

cfi

nger

prot

ein

80

Zin

cfi

nger

and

BIB

dom

ain

cont

aini

ng20

Zin

cfi

nger

prot

ein

148

Kw

ppel

-lik

efa

ctor

15

Zic

fam

ilym

embe

r4

Zic

fam

ilym

embe

rI

Zin

cfi

nger

prot

ein

639

P53

targ

etzi

ncfi

nger

prot

ein

Zin

cfi

nger

prot

ein

51

Sim

ilar

tozi

ncfi

nger

prot

ein

161

Zin

cfi

nger

prot

ein

595

Sim

ilar

tozi

ncfi

nger

prot

ein

595

Sim

ilar

tozi

ncfi

nger

prot

ein

595

Zin

cfi

nger

prot

ein

141

Zin

cfi

nger

prot

ein

509

Sim

ilar

tozi

ncfi

nger

ptot

ein

596

Kru

ppel

-Iik

efa

ctor

3

RE

J-s

ilen

cing

tran

scri

ptio

nfa

ctor

Zin

cfi

nger

prot

ein

330

Zin

cfi

nger

prot

ein

42

Zin

cfi

nger

prot

ein

62ho

mol

og

Zin

cfi

nger

prot

ein

622

Zin

cfi

nger

RN

Abi

ndin

gpr

otei

n

sim

ilar

tozi

ncfi

nger

prot

ein

35

Zin

cfi

nger

prot

ein

131

Zin

cfi

nger

prot

ein

366

Zin

cfi

nger

prot

ein

225

Zïn

cfi

nger

prot

ein

300

Zin

cli

nger

prot

ein

346

9+

262

421

6-

459

19-

907

1+

299

12-

1053

7-

273

5-

668

4-

794

3-

416

4-

334

4+

447

5+

485

2-

288

6-

706

18+

648

6+

274

11+

474

7+

765

--

178

3+

345

6+

1097

-+

320

3+

310

j

3.3h

3.4a

3.4b

3.5a

3.5b

3.6a

3.6b

4.l

a

4.l

b

4.l

c

4.l

d

KR

AB

KR

AB

BT

B

BIB

BT

B

KR

AB

KR

AB

BTB

KR

AB

KR

AB

4474

6128

4825

7649

6233

0399

7564

1490

8827

1165

1028

5097

8

1154

3616

8

1155

4020

7

1264

3360

9

1275

4416

8

1485

8652

7

1486

0987

1

1805

2424

5

1802

2419

6

1669

2185

9

1943

5594

7

4322

7

1964

18

2528

40

3216

17

4342

879

9048

008

3834

2212

5746

8799

1423

6154

7

1891

5391

9

1266

7

1650

4628

3239

0213

4246

0563

4315

7399

7177

4990

1378

2908

0

1502

5415

7

1763

8230

3

4475

3579

4828

7484

6233

4061

7591

7386

8827

3912

1026

7860

7

1154

3911

5

1163

4881

7

1265

7678

1

1275

5892

6

1486

0709

7

1486

1719

6

1805

3601

4

1802

7227

8

1889

4616

9

1943

5746

0

7809

9

2397

48

3012

65

3590

47

4374

410

9087

407

3837

6796

5749

3097

1423

7530

1

1891

6319

3

2574

1

1651

8894

3248

0601

4250

4182

4321

1593

7183

9005

1378

3290

3

1502

6458

4

1764

2636

4

5 5 5 5 5 5 5 5

16-

498

--

477

-

-10

74

4+

510

11-

744

3+

543

12-

604

4+

294


1780

7112

8

1782

1960

7

1782

5552

2

1783

0083

0

1763

8341

2

1764

2021

3

1780

9030

9

1782

4402

1

1782

9281

6

1783

2604

0

1783

9465

5

1784

4030

1

55q

35.3

5.l

aZ

NF

354A

Zin

cli

nger

prot

ein

354A

KR

AB

13-

605

55q

35.3

5.l

bZ

NF

354B

Zin

cli

nger

prot

ein

354B

KR

AB

13+

612

55q

35.3

5.l

cZ

FP

2Z

inc

ling

erpr

otei

n2

hom

olog

-13

+46

1

55q

35.3

5.l

dZ

NF

454

Zin

cfin

ger

pro

tein

454

KR

AB

12+

522

55q

35.3

51e

DK

FZ

p686

E24

33si

mil

arto

hypo

thet

ical

ptot

ein

9630

041

N07

KR

AB

13+

685

55q

35.2

51f

ZN

F35

4CZ

inc

fing

erpr

otei

n35

4CK

RA

B11

+55

4

66p

22.1

=Z

NF

322A

Zin

cli

nger

prot

ein

322A

-Il

-40

2

66

p2

l.3

-Y

ZN

F2O

4Z

inc

ling

erpr

otei

n20

4-

--

-

66p

22.1

6.l

aL

0C34

6157

Sim

ilar

todJ

153G

14.3

(nov

elC

2H2

type

ZN

F)-

9+

407

66p2l.

36.l

bZ

NF

184

Zin

cfi

nger

prot

ein

184

KR

AB

19-

751

66p2l.

36.

2aZ

NF

165

Zin

cfi

nger

prot

ein

165

SCA

N6

+48

5

66p

22.1

6.2b

ZN

F43

5Z

inc

fing

erpr

otei

n43

5SC

AN

4+

348

66p

21.3

6.2c

ZN

FI9

2Z

inc

fing

erpr

otei

n19

2SC

AN

-KR

AB

9+

578

66p

22.1

6.2d

YL

0C22

2701

Sim

ilar

tozi

ncli

nger

prot

ein

192

--

+26

9

66p

21.3

62e

ZN

F19

3Z

inc

ling

erpr

otei

n19

3SC

AN

5+

394

66

p2

l.3

3-p

21

.31

62

fZ

NF3

OZ

Zin

cli

nger

prot

ein

307

SCA

N-K

RA

B7

-54

5

66p2l.

3l

62

gZ

NF

187

Zin

cfi

nger

prot

ein

187

-8

+32

5

66p

21.3

16.

2hZ

NF

323

Zin

cfi

nger

prot

ein

323

SCA

N6

-40

6

66p

22.1

6.2i

ZN

F3O

6Z

inc

fing

erpr

otei

n30

6SC

AN

7+

538

66p

22.2

-p21

.36.

2jZ

NF3

O5

Zin

cfi

nger

prot

ein

305

SCA

N11

-60

4

66p22.l

6.2k

YZ

NF

452

Zin

cfin

ger

pro

tein

452

SCA

N-

-13

25

66p

22.1

6.21

ZN

F31

1Z

inc

ling

erpr

otei

n31

1K

RA

B14

-57

4

66p

22.1

-Z

FP

57Z

inc

ling

erpr

otei

n57

hom

olog

KR

AB

7-

508

66p

21.3

3-

ZB

TB

12Z

inc

fing

eran

dB

TB

dom

ain

cont

aini

ng12

BT

B3

-45

9

66p

21.3

6.3a

ZN

F29

7Z

inc

fing

erpr

otei

n29

7B

TB

2-

634

66p

21.3

26.

3bZ

BIB

9Z

inc

fing

eran

dB

TB

dom

ain

cont

aini

ng9

BIB

1+

473

66p

21.3

-p21

.2-

ZN

F76

Zin

cfi

nger

prot

ein

76-

7+

570

66p

ter-

p12.

1-

ZN

F31

8Z

inc

fing

erpr

otei

n31

8-

--

2099

66p

12.1

-Z

NF

45I

Zin

cli

nger

prot

ein

451

-5

+10

13

66

q1

5-

ZN

F29

2Z

inc

fing

erpr

otei

n29

2-

11+

2895

66q

21-

YL

0C44

2240

sim

ilar

tozi

ncfi

nger

prot

ein

259

--

--

66q

21-

ZB

TB

24Z

inc

fing

eran

dB

TB

dom

ain

cont

aini

ng24

(ZN

F45O

)B

TB

8-

697

66q

25-

PLA

G1

Zin

cli

nger

prot

ein

PLA

GL

1-

7-

463

66q

25.1

-Z

BIB

2Z

inc

fing

eran

dB

TB

dom

ain

cont

aini

ng2

BIB

3-

514

2674

4497

2676

7741

2743

3581

2744

7283

2747

4095

2747

7205

2752

6506

2754

8858

2815

4551

2816

5320

2820

0366

2820

5836

2621

7695

2823

3215

2623

7530

2824

5348

2630

1049

2830

9239

2832

0469

2832

7961

2834

2872

2835

3960

2840

0493

2842

9951

2842

5738

2844

2503

2845

4973

2847

5487

2864

7386

2866

3091

2907

0573

2908

1016

2974

8239

2975

6866

3197

5373

3197

7748

3339

0173

3339

3472

3353

0334

3353

3299

3533

5488

3537

1738

4341

1786

4344

5159

5701

9470

5714

3057

8792

1986

8803

0633

1092

1358

510

9214

949

1098

9041

210

9911

133

1443

0313

214

4371

236

1517

7736

515

1804

791

77p

22.1

L0C

441

192

Sim

ilar

10zi

ncfi

nger

prot

ein

162

-38

514

9341

1149

5746

7


77p22.i

77

p2

2.i

77

p2

2.i

77

p2

2.l

D

7.i

a

7.i

b

7.2a

7.2b

77

p2

2.i

7.2c

77p

22.1

7.2U

77p13

-

77p13-p

il.i

-

77p11.2

-

77p11.2

7.3a

77p11.2

7.3b

77p11.2

7.3c

77

pi

I.2-

pi1.

17.

3d

77

p1

1.i

7.3e

77qii

.21

7.4a

77q

11.2

17.

4b

77q

11.2

17.

4c

77q11.2

7.4d

77

q1

1.2

1-q

ii.2

37.

4e

77q

11.2

17.

4f

77q11.2

7.4g

77qli

.21

7.4h

77qii

.2i

7.4i

77

q2

2.i

7.5a

77q

22.1

7.5b

77q22

7.5c

77

q2

2.i

7.5d

77

q2

2.i

7.5e

77q

21.3

-q22

.17

5f

77

q2

2.i

75g

77

q2

2.i

7.5h

77q

22.1

7.6a

77

q2

2.i

7.6b

77

q2

2.i

7.6c

77q3i.

i-

77q3i.

32

-

77

q3

6.i

7.7a

L0C

441

193

Sim

ilar

tozi

ncfi

nger

prot

ein

469

--

-30

3

ZN

F81

5Z

inc

fing

erpr

otei

n81

5K

RA

B3

+51

7

DK

FZ

p434

JIO

I5hy

poth

etic

alpr

otei

nD

KF

Zp4

34J1

OI5

-5

+56

3

DK

FZ

p547

K05

4hy

poth

etic

alpr

otei

nD

KF

Zp5

47K

054

KR

AB

15+

1393

L0C

4422

83S

imil

arto

zinc

fing

erpr

otei

n11

b-

-+

-

ZN

F32

5Z

inc

ling

erpr

otei

n32

5(Z

NF1

2)K

RA

B15

-69

7

GL

I3G

LI-

kwpp

elfa

mily

mem

ber

GL

I3-

5-

1580

ZN

FN1A

1Z

inc

fing

erpr

otei

n,su

bfam

ily

lA,

I-

4+

519

ZN

F71

3Z

inc

fing

erpr

otei

n71

3K

RA

B6

+43

0

L0C

4423

11

Sim

ilar

tozi

ncfi

nger

prot

ein

43-

12+

392

4’L

0C22

2032

Sim

ilar

tozi

ncfi

nger

prot

ein

208

--

--

ZN

F47

9Z

inc

ling

erpr

otei

n47

9K

RA

G10

-87

8

4’L

0C34

0223

Sim

ilar

tozi

ncfi

nger

prot

ein

479

--

--

ZN

F71

6Z

inc

ling

erpr

otei

n71

6K

RA

B12

+49

5

ZN

F67

9Z

inc

fing

erpt

otei

n67

9K

RA

B9

+41

1

L0C

7289

27S

imil

arto

zinc

fing

erpr

otei

n92

KR

AB

9+

438

ZN

F68O

Zin

cfi

nger

prot

ein

680

KR

AB

12-

530

ZF

D25

Zin

cli

nger

prot

ein

(ZFD

25)

(ZN

F588

)-

24+

783

ZN

F13

8Z

inc

ling

erpr

otei

n13

6-

6+

262

ZN

F27

3Z

inc

ling

erpr

otei

n27

3K

RA

B13

÷50

4

ZN

F11

7Z

inc

fing

etpr

otei

n11

7K

RA

B9

-38

3

H-p

lKK

rupp

el-r

elat

edzi

ncfi

nger

prot

ein

-13

-48

3

ZN

F92

Zin

cli

nger

prot

ein

92K

RA

B14

+58

6

ZN

F78

9Z

inc

ling

erpr

otei

n78

9K

RA

B8

+42

5

ZN

F39

4Z

inc

fing

erpr

otei

n39

4SC

AN

-KR

AB

7-

561

ZF

P95

Zin

cli

nger

prot

ein

95ho

mol

og(m

ouse

)SC

AN

-KR

AB

13+

639

ZN

F65

5Z

inc

fing

erpr

otei

n65

5-

6+

491

ZN

F49

8Z

inc

fing

erpr

otei

n49

8-

7+

544

ZK

SCA

N1

Zin

cli

nger

prot

ein

36K

RA

B6

+56

3

ZN

F38

Zin

cfi

nger

prot

ein

38K

RA

B8

+47

3

ZN

F3Z

inc

fing

erpr

otei

n3

KR

AB

8-

410

L0C

6436

41H

ypot

heti

cal

prot

ein

L0C

6436

41K

RA

B-K

RA

B-

+11

63

L0C

6497

46H

ypot

heti

cal

prot

ein

L0C

6497

46K

RA

B3

+34

5

L0C

5676

41H

ypot

heti

cal

prot

ein

L0C

5676

41K

RA

B4

+26

7

ZN

F27

7Z

inc

fing

erpr

otei

n(C

2H2

type

)27

7-

2+

436

FEZ

F1Z

inc

fing

erpr

otei

nFE

Z-

5-

471

ZN

F78

6Z

inc

fing

erpr

otei

n78

6K

RA

B15

-78

2

5426

140

5829

273

6428

035

6435

920

6700

537

6696

844

4197

0196

5041

1724

5575

4540

5630

8323

5666

1026

5719

1263

5732

0482

5750

1998

6334

6841

6342

9463

6361

7697

6376

3946

6389

2241

6400

1101

6407

5795

6408

9025

6447

6203

9890

8451

9892

8790

9894

0232

9899

3981

9905

2507

9945

1155

9948

5353

9951

7266

9950

0406

9980

0106

1000

0110

6

1116

3367

9

1217

2928

7

1483

9751

3

5428

044

5853

913

6437

162

6467

482

6713

023

6713

094

4224

1712

5043

8053

5578

2642

5630

9501

5670

8371

5721

1513

5734

3922

5753

3597

6336

4744

6344

6960

6366

0923

6380

8839

6393

1140

6402

8773

6408

8849

6409

0719

6450

3433

9892

3153

9893

5813

9896

9380

9901

2012

9906

7976

9947

3339

9950

0599

9949

9406

9950

0106

9980

1106

1000

0310

6

1117

7032

0

1217

3183

5

1484

1872

0


8p23

.3

8p23.l

8p23.l

8p23.l

6p23.l

8p21

.1

8q11

.1

8q21

.11

8q1

3-q2

1.1

8q

22

.2

8q

22

.2

8q

23

8q24

.12

8q24

.12

8q24

.13

8q24

.13

8q24

.13

8q24

.22

8q24

.3

8q24

.3

8q24

.3

8q24

.3

8q24

.3

8q24

.3

8q

24

.3

8q

24

.3

6q

24

ZN

F59

6

L0C

2832

02

ZN

F7O

5Bq)

ZN

F7O

5C

L0C

441

341

ZN

F39

541

L0C

3922

15

ZFH

X4

ZB

TB

1O

L0C

4423

92

KLF

1O

ZF

PM

2

TR

PS

14)

L0C

3922

64

ZH

X2

ZHX

1

ZN

F57

2

ZN

F4O

6

ZF

P4I

GL

I4

ZN

F69

6

ZN

F62

3

ZN

F7O

7

ZN

F251

ZN

F34

ZN

F51

7

ZN

F7

Zin

cfi

nger

prot

ein

596

Sim

ilar

tozi

ncfi

nger

prot

ein

75

Zin

cfi

nger

prot

ein

705B

Zin

cfi

nger

prot

ein

705C

Sim

ilar

tozi

ncfi

nger

prot

ein

10

Zin

cfi

nger

prot

ein

395

Sim

ilar

tozi

ncli

nger

prot

ein

92

Zin

cfi

nger

hom

eodo

mai

n4

Zin

cfi

nger

and

BT

Bdo

mai

nco

ntai

ning

70

Sim

ilar

tozi

ncli

nger

prot

ein

317

Kru

ppet

-Iik

efa

ctor

10

zinc

fing

erpr

otei

n,m

ultit

ype

2

Zin

cfi

nger

tran

scri

ptio

nfa

ctor

TR

PS

1

Sim

ilar

tozi

ncfi

nger

prot

ein

532

Zin

cfi

nger

san

dho

meo

boxe

s2

Zin

cfi

nger

san

dho

meo

boxe

s1

Zin

cfi

nger

prot

ein

572

Zin

cfi

nger

prot

ein

406

Zin

cfi

nger

prot

ein

41ho

mol

og

GL

t-K

rupp

elfa

mily

mem

ber

GL

I4

Zin

cfi

nger

prot

ein

696

Zin

cli

nger

prot

ein

623

Zin

cfi

nger

prot

ein

707

Zin

cfi

nger

prot

ein

251

Zin

cfi

nger

prot

ein

34

Zin

cfi

nger

prot

ein

517

Zin

cfi

nger

prol

ein

7

19-

752

8+

642

5+

671

4+

495

4+

546

9-

831

4-

644

12-

595

12-

595

11+

504

--

178

-

+30

0

-+

-

-+

301

1-

513

-+

-

7+

3577

2+

847

3+

293

3-

480

2+

1151

1-

1281

1+

837

1-

873

12+

529

13-

1243

4+

198

7+

376

9+

374

13+

536

7+

369

7-

293

12-

549

10+

492

14+

686

1723

8218

7339

7188

906

7226

546

7821

316

7856

224

1223

7465

1226

3633

1221

4635

1226

1425

2825

9021

2829

9896

4781

5662

4781

6648

7777

8835

7794

0711

8156

1003

8159

5322

9457

6479

9472

8501

1037

3018

810

3737

128

1064

0032

310

6885

943

1164

8990

011

6750

402

1205

6169

612

0612

050

1238

6308

212

4055

936

1243

2987

712

4355

728

1260

5473

312

6060

809

1355

5921

313

5794

463

1444

0048

414

4416

250

1444

2098

214

4430

476

1444

4497

114

4451

539

1448

0297

314

4809

731

1448

3865

014

4849

515

1459

4814

014

5952

607

1459

6930

914

5981

498

1459

9506

514

6006

265

1460

2380

714

6039

409

77q

36.1

77q

36.1

77q

36.1

77q

36.1

77q

36.1

77q

36.1

77q

36.1

77q

36.1

77q

36.1

ZN

F42

5

ZN

F39

8

ZN

F28

2

ZN

F21

2

ZN

F78

3

ZN

F77

7

ZN

F74

6

ZN

F76

7

ZN

F46

7

Zin

cli

nger

prot

ein

425

Zin

cfi

nger

prot

ein

398

Zin

cfi

nger

prot

ein

282

Zin

cfi

nger

prot

ein

212

Zin

cfi

nger

prot

ein

783

Zin

cfi

nger

prot

ein

777

Zin

cfi

nger

prot

ein

746

Zin

cfi

nger

prot

ein

767

Zin

cfi

nger

prot

ein

467

1484

3080

9

1484

5444

1

1485

5319

9

1485

6770

7

1485

9019

5

1487

5939

4

1488

0081

8

1488

7517

8

1490

9238

5

1484

5431

1

1485

1105

2

1485

5426

7

1485

8363

0

1486

2532

5

1487

8330

6

1488

2572

7

1489

5275

7

1491

0122

8

7.7b

7.7c

7.7d

7.7e

7-7f

77g

7.7h

7.71

7.7j

8.l

a

8.l

b

8.2a

8.2b

8.3a

8.3b

8.3c

8.3d

8.3e

8.4a

8.4b

8.4c

8.4d

8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 B 8 8

KR

AB

KR

AB

KR

AB

KR

AB

KR

AB

KR

AB

KR

AB

KR

AB

KR

AB

KR

AB

HO

ME

O-4

BT

B

HO

ME

O-3

HO

ME

O-4

KR

AB

KR

AB

KR

AB

KR

AB


88q24.3

88q24.3

88q24.3

99p24.2

99p

13.1

99p13.2

99

p1

2

99

p1

2

99q

22.3

1

99

q2

2.3

2

99

q2

2.3

2

99

q2

2.3

3

9q

22

.33

99

q2

2.3

3

99q

22-q

31

99q31.2

99q

31

99

q3

1.3

99

q3

2

99

q3

2

99

q3

2

99

q3

3.2

99q33.2

99q

33-q

34

99

q3

4

99

1010

p15

10lO

pll

.2

10lO

pli

1010

p11

10lO

pll

.21

10lO

pIl

.2

10lO

pIl

.2

10lO

qll

.21

10lO

qll

.21

8.4e

84f

84g

9.l

a

9.l

b

9.lc

9.l

d

9.2a

9.2b

9.33

9.3b

9.4a

9.4b

10

.la

bib

10.l

c

laid

101e

101f

10.2

a

10.2

b

ZN

F64

7Z

inc

fing

erpt

otei

n64

7(Z

NF2

5O)

KR

AB

13-

560

1460

7733

714

6097

632

ZN

F16

Zin

cfi

nger

prot

ein

16-

17-

682

1461

2654

814

6147

078

L0C

6429

14S

imil

arto

zincf

inger

pro

tein

135

KR

AB

10-

335

1461

7260

314

6196

311

GL

IS3

GL

ISfa

mily

zinc

fing

er3

-6

-77

538

1767

641

4218

3

4)L

0C65

3501

Sim

ilar

tozi

ncfi

nger

prot

ein

658

--

--

3943

3814

3935

4471

ZB

TB

5Z

inc

fing

eran

dB

TB

dom

ain

cont

aini

ng5

BT

B2

-67

737

4281

1137

4553

96

ZN

F65

8Z

inc

fing

erpr

otei

n65

8K

RA

B21

-10

5940

7614

1240

7820

63

ZN

F65

8BZ

inc

fing

erpr

otei

n65

8B-

21-

819

4157

8833

4158

2207

ZN

F48

4Z

inc

fing

erpr

otei

n48

4K

RA

B18

-81

694

6481

7294

6801

11

ZN

F16

9Z

inc

fing

erpr

otei

n16

9K

RA

B13

+60

396

0808

6196

1035

45

ZN

F36

7Z

inc

fing

erpr

otei

n36

7-

2-

350

9819

0057

9622

0490

ZN

F51O

Zin

cfi

nger

prot

ein

510

KR

AB

10-

683

9855

7968

9858

0149

ZN

F78

2Z

inc

fing

erpr

otei

n78

2K

RA

B14

-69

998

6190

9498

6562

10

ZN

F32

2BZ

inc

fing

erpr

otei

n32

28-

11-

402

9899

9358

9900

1731

ZN

F18

9Z

inc

fing

etpr

otei

n18

9K

RA

B16

+62

610

3200

984

1032

1276

3

ZN

F46

2Z

inc

fing

erpr

otei

n46

2-

9+

2506

1086

6519

910

8813

628

KL

F4K

wpp

el-l

ike

fact

or4

-3

-47

010

9286

956

1092

9157

6

ZN

F48

3Z

inc

fing

erpr

otei

n48

3SC

AN

-KR

AB

-+

256

1133

2726

011

3379

945

L0C

1698

34hy

poth

etic

alpt

otei

nL

0C16

9834

-13

-53

011

4799

221

1148

1429

3

ZF

P37

Zin

cfi

nger

prot

ein

37ho

mol

og(m

ouse

)K

RA

B12

-63

011

4843

995

1148

5881

7

ZN

F61

8Z

inc

fing

erpr

otei

n61

8-

4+

861

1156

7838

011

5852

293

ZN

F48

2Z

inc

fing

erpr

otei

n48

2B

TB

4-

424

1247

1015

012

4715

430

ZB

IB26

Zin

cfi

nger

and

BIB

dom

ain

cont

aini

ng26

BT

B4

-44

112

4720

199

1247

3360

0

ZN

F29

7BZ

inc

fing

erpr

otei

n29

7BB

TB

467

1286

0718

212

8637

318

ZN

F79

Zin

cfi

nger

prot

ein

79K

RA

B11

+49

812

8662

244

1266

8797

9

ZB

TB

34Z

inc

fing

eran

dB

TB

dom

ain

cont

aini

ng34

BT

B3

+53

212

9226

482

1292

4747

1

KL

F6K

wpp

el-

like

fact

or6

-3

-28

338

0818

838

1745

5

ZN

F24

8Z

inc

fing

erpr

otei

n24

8K

RA

B8

-57

938

1579

0538

1864

92

4)B

A77

5A3.

1K

RA

Bbo

xzi

ncfi

nger

pseu

doge

ne-

-+

-38

2125

5336

2130

31

4)B

A39

3]16

.4Z

inc

fing

erpse

udogen

e-

-+

-38

2235

2638

2256

69

ZN

F25

Zin

cfi

nger

prot

ein

25K

RA

B12

-45

638

2788

0138

3007

44

ZN

F33

AZ

inc

fing

erpr

otei

n33

aK

RA

B16

+81

038

3412

7638

3883

10

ZN

F37

AZ

inc

fing

erpr

otei

n37

aK

RA

B12

+56

138

4232

8138

4522

86

L0C

4016

42S

imil

arto

zinc

fing

ecpr

otei

n91

-16

-59

742

1515

3442

1727

13

ZN

F37B

Zin

cfi

nger

prot

ein

37b

KR

AB

8-

525

4236

7418

4236

8286


10lO

qll

.2

10lO

qll

.21

10lO

qll

.22-q

ll.2

3

10lO

qll

.21

1010

q22-

q25

10lO

qil

10lO

qll

.22

1010

q21.

2

1010

q22.

2

1010

q22.

3

1010

q24.

1

1010

q26

1010

q26.

3

llp

l5.5

I1p

15.4

llp

l5A

11p1

5.4

llp

l4.3

11p1

3

llpll

.2

11q1

2

11q1

2.2

11q1

2.3

11q1

3.4

11q2

3.1

11q2

3.3

11q2

4.3

12l2

p12

1212

p13.

31

1212

q13

1212

q13.

11

1212

q13.

13

1212

q13

1212

q13.

2-q1

3.3

ZN

F1I

B

ZN

F48

7

ZN

F23

9

ZN

F48

5

ZN

F32

ZN

F22

ZN

F48

8

ZN

F36

5

ZN

F5O

3

4)L

0C39

9783

ZN

F5J

8

ZN

FN

IA5

ZN

F51

1

ZN

F19

5

ZN

F21

5

ZN

F2J

4

ZN

F14

3

L0C

341

002

4)Z

NF

ZN

F4O

8

ZFP

91

ZF

P91

-CN

TF

ZB

TB

3

L0C

4400

53

ZN

F75

C

ZB

TB

16

ZN

F2O

2

ZB

TB

15

ZN

F38

4

ZN

F7O

5A4)

ZN

F75B

ZN

F64

J

ZN

F38

5

ZN

FN1

A4

Zin

cfi

nger

prot

ein

11b

Zin

cfi

nger

prot

ein

487

Zin

cfi

nger

prot

ein

239

Zin

cfi

nger

prot

ein

485

Zin

cli

nger

prot

ein

32

Zin

cfi

nger

prot

ein

22

Zin

cfi

nger

prot

ein

488

Zin

cli

nger

prot

ein

365

Zin

cli

nger

prot

ein

503

Sim

ilar

tozi

ncfi

nger

prot

ein

532

Zin

cfi

nger

prot

ein

518

zinc

ling

erpr

otei

n,su

bfam

ily

lA,

5

zinc

fing

erpr

otei

n51

1

zinc

fing

erpr

otei

n19

5

zinc

fing

erpr

otei

n21

5

zinc

fing

erpr

otei

n21

4

zinc

fing

erpr

otei

n14

3(c

lone

pHZ

-1)

sim

ilar

todJ

568F

9.1

(zin

cli

nger

prot

ein

133

Kw

ppel

like

zinc

ling

erpr

otei

n

zinc

fing

erpr

otei

n40

8

zinc

fing

erpr

otei

n91

hom

olog

(mou

se)

zinc

fing

erpr

otei

n91

hom

olog

(mou

se),

CN

F

zinc

ling

eran

dB

IBdo

mai

nco

ntai

ning

3

sim

ilar

tozi

ncfi

nger

prot

ein

596

Zin

cfi

nger

prot

ein

75C

zinc

fing

eran

dB

TB

dom

ain

cont

aini

ng16

zinc

ling

erpr

otei

n20

2

BT

BfP

OZ

)do

mai

nco

ntai

ning

15

zinc

fing

erpr

otei

n38

4

Zin

cfi

nger

prot

ein

705A

zinc

fing

erpr

otei

n75

b

Zin

cfi

nger

prot

ein

641

zinc

fing

erpr

otei

n38

5

zinc

fing

erpr

otei

n,su

bfam

ily

lA,

4

Gli

oma-

asso

ciat

edon

coge

neho

mol

og

16-

778

3+

421

9-

458

11+

402

7-

273

224

2+

340

-+

462

1-

646

-+

-

4+

1483

4-

420

3+

252

10-

557

4+

517

11-

606

626

15+

653

10+

720

4+

570

-

+52

9

BTB

2-

574

KR

AB

--

178

390

BTB

9+

673

SCA

N-K

RA

B8

-64

8

BT

B2

+53

9

6-

516

5+

300

5-

438

3-

366

4+

544

4240

4561

4245

3998

4325

2288

4329

8636

4337

1801

4338

3913

4342

1881

4343

3358

4345

9313

4346

4332

4481

5928

4482

0780

4797

5095

4799

3872

6380

3957

6410

1777

7682

7915

7683

1431

7914

9361

7916

4538

9787

9494

9791

2480

1247

4195

512

4758

311

1349

7241

313

4976

656

DD

10.2

c

10.3

a

10.3

b

10

3c

10.3

d

11.l

a

11.l

b

11.2

a

11.2

b

KR

AB

KR

AB

KR

AB

KR

AB

SCA

N-K

RA

B

KR

AB

KR

AB

11 11 11 11 11 il 11 11 11 H1

11 11 11 11 11

3336

572

6904

230

6977

125

9439

089

2376

7378

3236

5900

4667

8944

5810

3225

5810

3225

6227

5011

7119

6287

7777

4116

1134

3565

9

1231

0020

7

1296

0178

9

6645

904

8216

417

4268

7843

4702

2179

5304

9187

5470

4791

3356

891

6935

854

6998

117

9506

188

2376

8482

3241

3653

4668

4037

5814

5091

5814

9778

6227

8190

7123

2113

7777

4385

1136

2660

8

1231

1757

3

1296

8971

7

6668

930

8223

909

4269

0385

4703

0844

5306

4748

5471

7887

GLI

KR

AB

KR

AB

5+

1106

1561

4020

156

1523

12

98 Tadepally et al.


ZN

F77

4

ZN

F59

8

ZN

F2O

6

ZN

F2O

5

ZN

F21

3

ZN

F20

0

ZN

F26

3

ZN

F75

A

ZN

F43

4

ZN

F17

4

ZN

F59

7

GL

IS2

ZN

F50

0

ZN

F69

4

ZN

F55

3

ZN

F76

8

ZN

F74

7

ZN

F76

4

ZN

F68

8

ZN

F78

5

ZN

F68

9

ZN

F62

9

ZN

F66

8

ZN

F64

6

L0C

3424

26

ZN

F26

7

ZN

F42

3

ZN

F31

9

ZN

F23

ZN

F19

ZFP

1

ZN

F46

9

ZFP

M1

ZF

P27

6

ZF

P3

Zin

cfi

nger

prot

ein

774

zinc

fing

erpr

otei

n59

8

zinc

fing

etpr

otei

n20

6

zinc

fing

erpr

otei

n20

5

zinc

fing

erpr

otei

n21

3

zinc

fing

erpr

otei

n20

0

zinc

fing

erpr

otei

n26

3

Zin

cfi

nger

prot

ein

75a

Zin

cfi

nger

ptot

ein

434

zinc

fing

erpr

otei

n17

4

zinc

fing

erpr

otei

n59

7

GL

ISfa

mily

zinc

fing

et2

zinc

fing

erpr

otei

n50

0

Zin

cfi

nget

prot

ein

694

zinc

fing

erpr

otei

n55

3

Zin

cfi

nger

prot

ein

768

Zin

cfi

nger

prot

ein

747

Zin

cfi

nger

prot

ein

764

Zin

cfi

nger

prot

ein

688

Zin

cfi

nger

prot

ein

785

Zin

cfi

nger

prot

ein

HIT

-39

(ZN

F68

9)

zinc

fing

etpr

otei

n62

9

Zin

cli

nger

prot

ein

668

Zin

cfi

nger

prot

ein

646

sim

ilar

tozi

ncfi

nger

prot

ein

267

zinc

fing

erpr

otei

n26

7

zinc

fing

erpr

otei

n42

3

zinc

fing

erpr

otei

n31

9

Zin

cfi

nger

prot

ein

23(K

OX

16)

Zin

cfi

nger

prot

ein

19(K

OX

12)

zinc

fing

erpr

otei

n1

hom

olog

(mou

se)

zinc

fing

erpr

otei

n46

9

zinc

fing

erpr

otei

n,m

ultit

ype

1

ZN

F27

6ho

mol

og(m

ouse

)

zinc

fing

erpr

otei

n3

hom

olog

(mou

se)

-12

+48

3

--

-90

4

SCA

N14

-72

5

KR

AB

8f-

554

SCA

N-K

RA

B5

+45

9

-5

-39

5

SCA

N-K

RA

B9

+68

3

KR

AB

5+

296

-6

-48

5

SCA

N3

+40

7

-7

-42

4

-4+

524

SCA

N5

-48

0

SCA

N-K

RA

B6

-96

7

-12

+61

8

-10

-52

0

KR

AB

--

191

KR

AB

7-

408

KR

AB

2-

276

KR

AB

7-

405

KR

AB

11-

500

19-

1056

16-

619

29+

1832

13-

580

14+

743

23-

1284

15-

582

17-

643

10-

458

8+

352

3+

3446

2+

1004

4+

539

13+

502

4922

478

D

1515

q26.

115

.3c

D

8869

6546

8870

5719

1616

p13.

3

1616

p13.

3

1616pl3

.3

1616

p13.

3

1616

p13.

3

161

6p

l3.3

1616p13.l

l

1616

p13.

3

16iS

p13.3

1616

p13.

3

1616

p13.

3

1616

p13.

3

1616

p12.

1

1616

p11.

2

1616

p11.

2

16l6

pll

.2

1616pl1

.2

1616

p11.

2

1616

p11.

2

1616

p11.

2

16.l

a

16.1

b

16.1

c

16

.ld

16

1e

161f

161g

16.1

h

16h

16.2

a

16.2

b

16.3

a

16.3

b

16.3

c

16.3

d

16.3

e

16.3

f

163g

16.3

h

16.3

1

16.3

j

16.4

a

16.4

b

16.5

a

16.5

b

16.6

a

16.6

b

16 16 16 16 16 16 16 16 16 16 16 16 16

16p1l.

2

16p1

1.2

i6pll

.2

16p1

1.2

16p1

1.2

1 6q

12

1 6q

13

16q2

2

16q2

2

16q2

2.3

16q2

4

16q2

4.2

16q2

4.3

1987

769

3078

896

3102

607

3125

140

3212

343

3273

488

3295

485

3372

086

3391

245

3426

111

4322

226

4740

816

2515

4823

3031

4558

3044

2826

3045

0280

3047

2586

3048

8508

3049

9495

3052

2187

3069

7271

3097

9672

3099

3269

3152

0510

3163

2096

4808

2022

5658

6074

7003

9000

7006

5563

7373

9926

8702

1380

8704

7226

8831

4934

1999

764

3082

862

3110

519

3132

806

3225

410

3281

461

3308

575

3391

026

3399

365

3433

491

4327

803

4757

167

2517

6343

3031

8216

3044

5411

3045

3695

3047

7085

3049

1229

3050

4511

3052

9183

3070

6024

3099

3005

3100

2334

3152

2350

3168

0365

4841

8419

5659

126

3

7005

3618

7008

0742

7376

3486

8703

4666

8712

8890

6633

3811

4940

393

KR

AB

KR

AB

KR

AB

17l7

p13.2

17.i

a


1717

p13-

p12

1717

p13

1717

p13.

1

17l7

pll

.2

1717

p11.

2

1717p1l.

2

1717

p11.

2

1717

q11.

2

1717

q12

1717

q21

1717

q21.

32

1717

q22

18pt

er-p

l1.

2

18p1

1.21

18q1

1.2

18q1

1.2

18q1

2.2

18q1

2

18q1

2

18q1

2

18q2

1.1

18q2

1.32

18q2

3

18q2

3

18q2

2-q2

3

18q2

3

ZN

F23

2

ZN

F59

4

ZB

TB

4

ZN

FI8

ZN

F28

6

ZN

F28

7

ZN

F62

4

ZN

F2O

7

ZN

F4O

3

ZN

FN1A

3

ZN

F65

2

ZN

F161

ZF

P16

I

ZN

F51

9

L0C

4418

16

ZN

F521

ZN

F39

7

ZN

F271

ZN

F24

ZN

F39

6

APM

-1

ZN

F53

2

ZN

F4O

7

ZN

F51

6

ZN

F23

6

SAL

L3

KL

F16

BT

BD

2

ZN

F55

4

ZN

F55

5

ZN

F55

6

ZN

F57

ZN

F77

KIA

A1

086

ZB

TB

7A

zinc

fing

erpr

otei

n23

2

zinc

ling

erpr

otei

n59

4

zinc

fing

eran

dB

TB

dom

ain

cont

aini

ng4

zinc

ling

erpr

otei

n18

(KO

X11

)

zinc

fing

erpr

otei

n28

6

zinc

ling

erpr

otei

n28

7

Zin

cfi

nger

prot

ein

624

zinc

ling

erpr

otei

n20

7

zinc

fing

erpr

otei

n40

3

zinc

ling

erpr

otei

n,su

bfam

ily

lA,

3

Zin

cli

nger

prot

ein

652

zinc

ling

erpr

otei

n16

1

zinc

ling

erpr

otei

n16

1ho

mol

og(m

ouse

)

zinc

fing

erpr

otei

n51

9

sim

ilar

tozi

ncli

nger

prot

ein

586

zinc

fing

erpr

otei

n52

1

zinc

fing

erpr

otei

n39

7

zinc

fing

erpr

otei

n27

1

zinc

ling

erpr

otei

n24

(KO

X17

)

zinc

ling

erpr

otei

n39

6

BT

B/P

OZ

-zin

cli

nger

prot

ein-

like

zinc

ling

erpr

otei

n53

2

Zin

cfi

nger

prot

ein

407

zinc

ling

erpr

otei

n51

6

zinc

ling

erpr

otei

n23

6

SaI-

like

3

Kru

ppel

-Iik

efa

ctor

16

BTB

dom

ain

cont

aini

ng2

Zin

cfi

nger

prot

ein

554

Zin

cfi

nger

prot

ein

555

Zin

cfi

nger

prot

ein

556

Zin

cli

nger

prot

ein

57

Zin

cfi

nger

prot

ein

77(p

11)

Sim

ilar

tozi

ncli

nger

prot

ein

Zin

cli

nger

and

BIB

dom

ain

cont

aini

ng7

SCA

N5

-44

4

-22

-11

20

BIB

6-

1013

SCA

N-K

RA

B5

-54

9

KR

AB

10+

521

SCA

N-K

RA

B14

-75

4

KR

AB

21-

865

2+

478

--

+69

7

-1

-50

9

-9

-60

6

-5

-52

1

BT

B5

-44

9

KR

AB

10-

540

--

-12

2

-24

-13

11

SCA

N-

+27

5

-5

+423

SCA

N4

-36

6

SCA

N2

-33

3

BIB

4-

766

-8

+13

01

-7

+10

01

-7

-28

52

-25

+1558’

-8

+13

00

D

18 18 18 18 18 18 18 18 18 18 18 18 16 18

17.l

b

17.l

c

f7.2

a

17.2

b

18.l

a

18

.lb

18.l

c

18

.ld

18.2

a

f8.2

b

19.l

aa

19

.lab

19.2

aa

19.2

ab

19.2

ac

19.2

ad

19.2

ae

19.3

aa

19.3

ab

4949

755

5023

554

7303

421

1182

1487

1554

4054

1639

5426

1646

4776

2741

4039

3197

4917

3517

4724

4472

7485

5340

3909

5279

379

1409

4724

2034

1044

2089

5889

3107

5017

3112

4298

3116

9957

3120

0659

4380

7731

5468

1041

7047

4282

7219

8625

7266

5104

7484

1263

1803

399

1936

447

2770

917

2792

482

2818

333

2851

964

2684

217

3755

010

3996

217

4967

121

5028

416

7328

241

1184

1414

1556

1963

1641

3189

1649

7883

2772

1583

3202

0391

3527

3967

4479

4883

4

5342

0614

5283

313

1412

2429

2034

1412

2118

6114

3109

2357

3114

2072

3117

8405

3121

1299

4382

1492

5480

4689

7076

2386

7229

6128

7281

1671

7485

9182

1814

496

1966

702

2786

469

2805

036

2829

501

2869

474

2895

930

3820

026

4017

816

1919

p13.

3

19l9

pl3

.3

1919

p13.

3

1919

p13.

3

1919

p13.

3

19l9

pl3

.3

19l9

p13.3

1919

p13.

3

19l9

p13.3

BT

B

KR

AB

KR

AB

KR

AB

KR

AB

KR

AB

BT

B

3-

252

--

525

7+

487

15+

626

9+

456

13+

555

12-

545

3-

939

4-

584


19 | 19p13.2 | 19.4aa | ZNF557 | Zinc finger protein 557 | KRAB | 10 | + | 430 | 7020721 | 7034589
19 | 19p13.3-p13.2 | 19.4ab | ZNF358 | Zinc finger protein 358 | - | 9 | + | 481 | 7487075 | 7491911
19 | 19p13.2 | 19.5aa | ZNF414 | Zinc finger protein 414 | - | 1 | - | 312 | 8485032 | 8482224
19 | 19p13.2 | 19.5ab | ZNF558 | Zinc finger protein 558 | KRAB | 9 | - | 402 | 8781382 | 8794565
19 | 19p13 | 19.5ac | ZNF317 | Zinc finger protein 317 | KRAB | 13 | + | 595 | 9112073 | 9135084
19 | 19p13.2 | 19.5ad | ZNF699 | Zinc finger protein 699 | KRAB | 16 | - | 642 | 9265957 | 9281384
19 | 19p13.2 | 19.5ae | ZNF559 | Zinc finger protein 559 | KRAB | 11 | + | 538 | 9295928 | 9315521
19 | 19p13.2 | 19.5af | ZNF177 | Zinc finger protein 177 | KRAB | 7 | + | 321 | 9334696 | 9353866
19 | 19p13.2 | 19.5ag | ZNF266 | Zinc finger protein 266 | KRAB | 14 | - | 549 | 9384272 | 9407234
19 | 19p13.2 | 19.5ah | ZNF560 | Zinc finger protein 560 | KRAB-KRAB | 14 | - | 790 | 9438003 | 9470279
19 | 19p13.2 | 19.5ai | ZNF426 | Zinc finger protein 426 | KRAB | 12 | - | 554 | 9499683 | 9510303
19 | 19p13.2 | 19.5aj | ZNF121 | Zinc finger protein 121 | - | 10 | - | 390 | 9537292 | 9556209
19 | 19p13.2 | 19.5ak | ZNF561 | Zinc finger protein 561 | - | 10 | - | 417 | 9580131 | 9592899
19 | 19p13.2 | 19.5al | ZNF562 | Zinc finger protein 562 | - | 9 | - | 354 | 9620341 | 9632550
19 | 19p13.2 | 19.5am | ψLOC729648 | Similar to zinc finger protein 561 | - | - | - | - | 9661814 | 9667794
19 | 19p13.2 | 19.5an | LOC162993 | Hypothetical protein LOC162993 | KRAB | 12 | - | 533 | 9729151 | 9740410
19 | 19p13.2 | 19.6aa | ZNF653 | Zinc finger protein 653 | - | 4 | - | 615 | 11455246 | 11477654
19 | 19p13.2 | 19.6ab | ZNF627 | Zinc finger protein 627 | KRAB | 11 | + | 461 | 11569327 | 11590974
19 | 19p13.2 | 19.6ac | LOC401898 | Similar to hypothetical protein FLJ38281 | - | 6 | + | 187 | 11611591 | 11624258
19 | 19p13.2 | 19.6ad | HSZFP36 | Zinc finger protein ZFP-36 | KRAB | 16 | - | 610 | 11693080 | 11710731
19 | 19p13.2 | 19.6ae | ZNF441 | Zinc finger protein 441 | - | 19 | + | 626 | 11738907 | 11754301
19 | 19p13.2 | 19.6af | ZNF491 | Zinc finger protein 491 | - | 13 | + | 437 | 11770400 | 11780306
19 | 19p13.2 | 19.6ag | ZNF440 | Zinc finger protein 440 | KRAB | 12 | + | 595 | 11801554 | 11806031
19 | 19p13.2 | 19.6ah | ZNF439 | Zinc finger protein 439 | KRAB | 11 | + | 499 | 11837844 | 11841306
19 | 19p13.2 | 19.6ai | ZNF69 | Zinc finger protein 69 | KRAB | - | + | 149 | 11859670 | 11886144
19 | 19p13.2 | 19.6aj | ZNF700 | Zinc finger protein 700 | KRAB | 21 | + | 742 | 11896900 | 11922578
19 | 19p13.2 | 19.6ak | ZNF440L | Zinc finger protein 440 like | KRAB | 8 | + | 397 | 11936869 | 11952214
19 | 19p13.2 | 19.6al | ZNF433 | Zinc finger protein 433 | KRAB | 19 | - | 673 | 11986573 | 11990116
19 | 19p13.2 | 19.6am | LOC729747 | Similar to zinc finger protein 709 | KRAB | 15 | - | 578 | 12015620 | 12028127
19 | 19p13.2 | 19.6an | FLJ14959 | Hypothetical protein FLJ14959 | KRAB | 8 | + | 666 | 12036528 | 12049631
19 | 19q13.43 | 19.6ao | ZNF788 | Hypothetical protein LOC388507 | - | 16 | + | 615 | 12064078 | 12086499
19 | 19p13.3-p13.2 | 19.6ap | ZNF20 | Zinc finger protein 20 (KOX 13) | KRAB | 13 | - | 536 | 12103603 | 12112116
19 | 19p13.2 | 19.6aq | ZNF625 | Zinc finger protein 625 | - | 8 | - | 306 | 12116705 | 12128529
19 | 19p13.2-p13.12 | 19.6ar | ZNF136 | Zinc finger protein 136 | KRAB | 14 | + | 540 | 12134919 | 12161064
19 | 19p13.2 | 19.6as | ZNF44 | Zinc finger protein 44 (KOX 7) | KRAB | 16 | - | 637 | 12219007 | 12266637
19 | 19p13.2 | 19.6at | ZNF563 | Zinc finger protein 563 | KRAB | 8 | - | 476 | 12289291 | 12305502
19 | 19p13.2 | 19.6au | ZNF442 | Zinc finger protein 442 | KRAB | 14 | - | 627 | 12321185 | 12337447


19 | 19p12 | 19.7az | ZNF492 | Zinc finger protein 492 | KRAB | 28 | + | 1132 | 22608966 | 22642312
19 | 19p12 | 19.7az | ZNF99 | Zinc finger protein 99 | KRAB | 30 | - | 1036 | 22730847 | 22744624
19 | 19p12 | 19.7az | LOC646864 | Similar to zinc finger protein 430 | KRAB | 7 | + | 651 | 22781774 | 22833075
19 | 19p12 | 19.7az | ψLOC388523 | Similar to Zinc finger protein 208 | - | - | - | - | 22949451 | 22977818
19 | 19p12 | 19.7az | ψZNF724P | Similar to zinc finger protein 43 | - | - | - | - | 23196873 | 23256859
19 | 19p13.1-p12 | 19.7az | ZNF91 | Zinc finger protein 91 (HPF7, HTF10) | KRAB | 35 | - | 1191 | 23333876 | 23370089
19 | 19p12 | 19.7az | ψZNF725 | Zinc finger protein 725 | - | - | - | - | 23466157 | 23490947
19 | 19p12 | 19.7az | ZNF675 | Zinc finger protein 675 (hZ) | KRAB | 14 | - | 568 | 23627812 | 23661782
19 | 19p12 | 19.7az | ZNF681 | Zinc finger protein 681 | - | 16 | - | 576 | 23718000 | 23733479
19 | 19p12 | 19.7az | LOC646895 | Similar to zinc finger protein 539 | - | 6 | + | 193 | 23806558 | 23807139
19 | 19p12 | 19.7az | LOC730084 | Similar to zinc finger protein 539 | - | 6 | + | 171 | 23807338 | 23807853
19 | 19p12 | 19.7az | ψLOC730087 | Similar to zinc finger protein 91 | - | - | + | - | 23889628 | 23910116
19 | 19p12 | 19.7az | ZNF254 | Zinc finger protein 254 | KRAB | 4 | + | 353 | 24061816 | 24103022
19 | 19q12 | - | ZNF536 | Zinc finger protein 536 | - | 8 | + | 1300 | 35555166 | 35740805
19 | 19q12 | - | ZNF537 | Zinc finger protein 537 | HOMEO | 2 | - | 1081 | 36457693 | 36462015
19 | 19q13.11 | - | ZNF507 | Zinc finger protein 507 | - | 5 | + | 953 | 37528393 | 37570413
19 | 19q13.11 | 19.8aa | LOC441847 | Similar to zinc finger protein 239 | - | 9 | - | 345 | 39817307 | 39818566
19 | 19q13.11 | 19.8ab | ZNF302 | Zinc finger protein 302 | KRAB | 7 | + | 478 | 39860439 | 39869137
19 | 19q13.11 | 19.8ac | ZNF181 | Zinc finger protein 181 (HHZ181) | KRAB | 11 | + | 507 | 39916933 | 39925613
19 | 19q13.11 | 19.8ad | ZNF599 | Zinc finger protein 599 | KRAB | 14 | - | 588 | 39940819 | 39955960
19 | 19q13.11 | 19.8ae | LOC643825 | Similar to zinc finger protein 396 | SCAN | 417 | 39985078 | 40008745
19 | 19q13.11 | 19.8af | LOC441848 | Similar to zinc finger protein 113 | - | 1 | - | 587 | 40049101 | 40049676
19 | 19q13.11 | 19.8ag | ZNF30 | Zinc finger protein 30 (KOX 28) | - | 18 | + | 542 | 40109724 | 40127912
19 | 19q13.11 | 19.8ah | ZNF92 | Zinc finger protein 92 | - | 13 | - | 553 | 40139098 | 40143044
19 | 19q13.1 | 19.9aa | TZFP | Testis zinc finger protein | BTB | 487 | 40895670 | 40899780
19 | 19q13.12 | 19.9ab | ZNF565 | Zinc finger protein 565 | KRAB | 12 | - | 499 | 41364889 | 41384792
19 | 19q13.1 | 19.9ac | ZNF146 | Zinc finger protein 146 | - | 10 | + | 292 | 41411488 | 41421506
19 | 19q13.12 | 19.9ad | ZFP14 | Zinc finger protein 14-like | KRAB | 13 | - | 533 | 41550715 | 41519002
19 | 19q13.12 | 19.9ae | ZNF545 | Zinc finger protein 545 | KRAB | 13 | - | 532 | 41574701 | 41601390
19 | 19q13.12 | 19.9af | ZNF566 | Zinc finger protein 566 | KRAB | 7 | - | 418 | 41630415 | 41659358
19 | 19q13.12 | 19.9ag | ZFP260 | Zinc finger protein 260 | - | 13 | - | 412 | 41693770 | 41711012
19 | 19q13.13 | 19.9ah | ZNF529 | Zinc finger protein 529 | - | 11 | - | 458 | 41727130 | 41756030
19 | 19q13.12 | 19.9ai | ZNF382 | Zinc finger protein 382 | KRAB | 10 | + | 550 | 41788061 | 41811339
19 | 19q13.12 | 19.9aj | GIOT-1 | Gonadotropin inducible TRF | KRAB | 12 | - | 563 | 41820123 | 41849579
19 | 19q13.12 | 19.9ak | ZNF567 | Zinc finger protein 567 | KRAB | 15 | + | 616 | 41872142 | 41904066
19 | 19q13.12 | 19.9al | LOC342892 | Hypothetical protein LOC342892 | KRAB | 32 | - | 1090 | 41930509 | 41955571
19 | 19q13.12 | 19.9am | MGC62100 | Hypothetical protein LOC388536 | KRAB | 13 | - | 636 | 41996710 | 42021121


19 | 19q13.31 | 19.11ap | ZNF234 | Zinc finger protein 234 | KRAB | 19 | + | 700 | 49337618 | 49354127
19 | 19q13.2 | 19.11aq | ZNF226 | Zinc finger protein 226 | KRAB | 19 | + | 803 | 49361089 | 49373678
19 | 19q13.32 | 19.11ar | ZNF227 | Zinc finger protein 227 | KRAB | 19 | + | 799 | 49408531 | 49433260
19 | 19q13.31 | 19.11as | ZNF233 | Zinc finger protein 233 | KRAB | 8 | + | 670 | 49455916 | 49471308
19 | 19q13.2 | 19.11at | ZNF235 | Zinc finger protein 235 | KRAB | 16 | - | 738 | 49462440 | 49501010
19 | 19q13.2 | 19.11au | ZNF228 | Zinc finger protein 228 | KRAB | 17 | - | 907 | 49522546 | 49552666
19 | 19q13.32 | 19.11av | ZNF285 | Zinc finger protein 285 | KRAB | 11 | - | 590 | 49581648 | 49597605
19 | 19q13.31 | 19.11aw | ψLOC147711 | Similar to zinc finger protein 285 | - | - | + | - | 49654547 | 49669406
19 | 19q13.2 | 19.11ax | ZNF180 | Zinc finger protein 180 (HHZ168) | KRAB | 12 | - | 692 | 49671701 | 49696395
19 | 19q13.2 | - | ZNF342 | Zinc finger protein 342 | - | 6 | - | 475 | 50266599 | 50271528
19 | 19q13.32 | - | ZNF541 | Zinc finger protein 541 | - | 2 | - | 792 | 52715759 | 52739966
19 | 19q13.32 | - | ZNF114 | Zinc finger protein 114 | KRAB | 4 | + | 417 | 53466466 | 53482675
19 | 19q13.33 | - | ZNF473 | Zinc finger protein 473 | KRAB | 20 | + | 871 | 55221024 | 55243845
19 | 19q13.4 | 19.12aa | ZNF175 | Zinc finger protein 175 | KRAB | 15 | + | 711 | 56766343 | 56784803
19 | 19q13.41 | 19.12ab | ZNF577 | Zinc finger protein 577 | KRAB | 8 | - | 478 | 57066365 | 57063009
19 | 19q13.41 | 19.12ac | ZNF649 | Zinc finger protein 649 | KRAB | 10 | - | 505 | 57100059 | 57084301
19 | 19q13.41 | 19.12ad | ψLOC441861 | Similar to zinc finger protein 64 | - | - | - | - | 57109726 | 57112852
19 | 19q13.41 | 19.12ae | ZNF613 | Zinc finger protein 613 | KRAB | 12 | + | 581 | 57122500 | 57140817
19 | 19q13.41 | 19.12af | ZNF350 | Zinc finger protein 350 | KRAB | 8 | - | 532 | 57159406 | 57181880
19 | 19q13.41 | 19.12ag | ZNF615 | Zinc finger protein 615 | KRAB | 19 | - | 731 | 57186400 | 57203270
19 | 19q13.41 | 19.12ah | ZNF614 | Zinc finger protein 614 | KRAB | 11 | - | 585 | 57208391 | 57222349
19 | 19q13.41 | 19.12ai | ZNF432 | Zinc finger protein 432 | KRAB | 17 | - | 652 | 57228490 | 57243885
19 | 19q13.41 | 19.12aj | LOC284371 | Hypothetical protein LOC284371 | - | - | - | 924 | 57259531 | 57290830
19 | 19q13.41 | 19.12ak | ZNF616 | Zinc finger protein 616 | KRAB | 21 | - | 781 | 57308967 | 57335003
19 | 19q13.41 | 19.12al | FLJ16287 | Similar to zinc finger protein 616 | KRAB | 25 | - | 936 | 57349937 | 57366556
19 | 19q13.41 | 19.12am | ZNF766 | Zinc finger protein 766 | KRAB | 10 | + | 468 | 57464636 | 57487766
19 | 19q13.41 | 19.12an | ZNF480 | Zinc finger protein 480 | KRAB | 12 | + | 516 | 57492263 | 57520987
19 | 19q13.41 | 19.12ao | ZNF610 | Zinc finger protein 610 | KRAB | 9 | + | 462 | 57540494 | 57561923
19 | 19q13 | 19.12ap | ZNF528 | Zinc finger protein 528 | KRAB | 15 | + | 628 | 57592933 | 57613469
19 | 19q13.41 | 19.12aq | ZNF534 | Zinc finger protein 534 | KRAB | 17 | + | 674 | 57624252 | 57634511
19 | 19q13.41 | 19.12ar | ZNF578 | Zinc finger protein 578 | - | 12 | + | 365 | 57706025 | 57708794
19 | 19q13.41 | 19.12as | ZNF808 | Similar to zinc finger protein 600 | - | 2 | + | 903 | 57738331 | 57750693
19 | 19q13.41 | 19.12at | ZNF701 | Zinc finger protein 701 | KRAB | 9 | + | 465 | 57765340 | 57779861
19 | 19q13.4 | 19.12au | ZNF137 | Zinc finger protein 137 | - | 5 | + | 207 | 57791719 | 57795214
19 | 19q13.3 | 19.12av | ZNF83 | Zinc finger protein 83 (HPF1) | - | 15 | - | 516 | 57807443 | 57833450
19 | 19q13.41 | 19.12aw | LOC729840 | Similar to zinc finger protein 160 | KRAB | 14 | - | 626 | 57847611 | 57885574
19 | 19q13.41 | 19.12ax | ZNF611 | Zinc finger protein 611 | KRAB | 17 | - | 705 | 57899284 | 57924947


19 | 19q13.43 | 19.13as | ZNF471 | zinc finger protein 471 | KRAB | 15 | + | 626 | 61711024 | 61732082
19 | 19q13.43 | 19.13at | ZFP28 | Zinc finger protein 28 mouse homolog | KRAB-KRAB | 15 | + | 868 | 61742129 | 61759982
19 | 19q13.43 | 19.13au | ZNF470 | Zinc finger protein 470 | KRAB | 17 | + | 717 | 61768362 | 61781931
19 | 19q13.4 | 19.13av | ZNF71 | zinc finger protein 71 (Cos26) | - | 13 | + | 489 | 61798504 | 61827362
19 | 19q13.43 | 19.13aw | BC37295_3 | Hypothetical protein BC37295_3 | - | 14 | - | 559 | 61866765 | 61876058
19 | 19q13.4 | 19.13ax | ZIM2 | Zinc finger, imprinted 2 | KRAB | 5 | - | 527 | 61977742 | 62043887
19 | 19q13.4 | 19.13ay | PEG3 | Paternally expressed 3 | SCAN | 12 | - | 1588 | 62015615 | 62043876
19 | 19q13.4 | 19.13az | ZIM3 | Zinc finger, imprinted 3 | KRAB | 11 | - | 472 | 62337276 | 62348382
19 | 19q13.4 | 19.13ba | ZNF264 | zinc finger protein 264 | KRAB | 13 | + | 627 | 62394681 | 62422351
19 | 19q13.43 | 19.13bb | ZNF805 | Zinc finger protein 805 | KRAB | 11 | + | 377 | 62456615 | 62465479
19 | 19q13.4 | 19.13bc | ZNF272 | Zinc finger protein 272 | KRAB | 11 | + | 562 | 62483745 | 62496618
19 | 19q13.43 | 19.13bd | ZNF543 | Zinc finger protein 543 | KRAB | 13 | + | 600 | 62523689 | 62533956
19 | 19q13.4 | 19.13be | ZNF304 | Zinc finger protein 304 | KRAB | 15 | + | 659 | 62554487 | 62563078
19 | 19q13.43 | 19.13bf | ZNF574 | Zinc finger protein 547 | KRAB | 9 | + | 402 | 62566691 | 62582739
19 | 19q13.43 | 19.13bg | ZNF548 | Zinc finger protein 548 | KRAB | 11 | + | 533 | 62593030 | 62604598
19 | 19q13.4 | 19.13bh | ZNF175 | Zinc finger protein 17 (KOX 10) | KRAB | 17 | + | 662 | 62614359 | 62624983
19 | 19q13.4 | 19.13bi | ZNF749 | Zinc finger protein 749 | - | 17 | + | 691 | 62646590 | 62648665
19 | 19q13.43 | 19.13bj | ZNF772 | Zinc finger protein 772 | KRAB | 10 | - | 489 | 62672766 | 62680750
19 | 19q13.43 | 19.13bk | ZNF419 | Zinc finger protein 419 | KRAB | 11 | + | 510 | 62690945 | 62697860
19 | 19q13.43 | 19.13bl | ZNF773 | Zinc finger protein 773 | KRAB | 9 | + | 442 | 62703121 | 62711338
19 | 19q13.43 | 19.13bm | ZNF549 | Zinc finger protein 549 | KRAB | 15 | + | 627 | 62730505 | 62743943
19 | 19q13.43 | 19.13bn | ZNF550 | Zinc finger protein 550 | KRAB | 8 | - | 381 | 62759537 | 62750155
19 | 19q13.4 | 19.13bo | ZNF416 | Zinc finger protein 416 | KRAB | 12 | - | 594 | 62782055 | 62774746
19 | 19q13.43 | 19.13bp | ZIK1 | Zinc finger protein ZIK1 | KRAB | 9 | + | 487 | 62787440 | 62795570
19 | 19q13.43 | 19.13bq | ZNF530 | Zinc finger protein 530 | KRAB | 13 | + | 599 | 62803065 | 62811444
19 | 19q13.4 | 19.13br | ZNF134 | Zinc finger protein 134 | - | 10 | + | 427 | 62817440 | 62826536
19 | 19q13.4 | 19.13bs | ZNF211 | Zinc finger protein 211 | KRAB | 12 | + | 564 | 62836396 | 62845946
19 | 19q13.43 | 19.13bt | ZSCAN4 | Zinc finger and SCAN domain 4 | SCAN | 4 | + | 433 | 62872115 | 62882317
19 | 19q13.43 | 19.13bu | ZNF551 | Zinc finger protein 551 | KRAB | 16 | + | 654 | 62885217 | 62892991
19 | 19q13.4 | 19.13bv | ZNF154 | Zinc finger protein 154 | KRAB | 10 | - | 449 | 62904779 | 62912391
19 | 19q13.43 | 19.13bw | ZNF671 | Zinc finger protein 671 | KRAB | 10 | - | 534 | 62922931 | 62930795
19 | 19q13.43 | 19.13bx | ZNF776 | Zinc finger protein 776 | - | 10 | + | 476 | 62954023 | 62961337
19 | 19q13.43 | 19.13by | ZNF586 | Zinc finger protein 586 | KRAB | 10 | + | 402 | 62972850 | 62983757
19 | 19q13.43 | 19.13bz | ZNF552 | Zinc finger protein 552 | KRAB | 8 | - | 407 | 63010264 | 63018093
19 | 19q13.43 | 19.13ca | ZNF587 | Zinc finger protein 587 | KRAB | 13 | + | 575 | 63053081 | 63071765
19 | 19q13.43 | 19.13cb | ZNF814 | Zinc finger protein 814 | - | - | - | - | 63075442 | 63092226
19 | 19q13.43 | 19.13cc | ZNF417 | Zinc finger protein 417 | KRAB | 13 | - | 575 | 63110026 | 63119756

qi


19 | 19q13.43 | ZNF418 | Zinc finger protein 418 | KRAB | 16 | - | 676 | 63125064 | 63138552
19 | 19q13.43 | ZNF256 | Zinc finger protein 256 | KRAB | 15 | - | 627 | 63144013 | 63150889
19 | 19q13.4 | ZNF606 | Zinc finger protein 606 | KRAB | 16 | - | 792 | 63180252 | 63206526
19 | 19q13.43 | ZSCAN1 | Zinc finger and SCAN domain containing 1 | SCAN | 3 | + | 408 | 63237246 | 63257811
19 | 19q13.4 | ZNF135 | Zinc finger protein 135 | KRAB | 16 | + | 658 | 63262424 | 63272588
19 | 19q13.43 | ZNF447 | Zinc finger protein 447 | SCAN | 2 | - | 510 | 63287018 | 63301389
19 | 19q13.43 | ZNF329 | Zinc finger protein 329 | - | 12 | - | 541 | 63329507 | 63353960
19 | 19qter | ZNF274 | Zinc finger protein 274 | KSK | 5 | + | 653 | 63386208 | 63416739
19 | 19q13.43 | ZNF544 | Zinc finger protein 544 | KRAB | 13 | + | 715 | 63432646 | 63467285
19 | 19q13.43 | ZNF8 | Zinc finger protein 8 | KRAB | 7 | + | 575 | 63482130 | 63499066
19 | 19q13.43 | HKR2 | GLI-Kruppel family member HKR2 | SCAN | 8 | + | 491 | 63530197 | 63545510
19 | 19q13.43 | ZNF497 | Zinc finger protein 497 | - | 14 | - | 498 | 63557537 | 63565932
19 | 19q13.43 | LOC116412 | Hypothetical protein BC012365 | - | 8 | - | 531 | 63570549 | 63584224
19 | 19q13.43 | ZNF584 | Zinc finger protein 584 | KRAB | 8 | + | 421 | 63611875 | 63621506
19 | 19q13.4 | ZNF132 | Zinc finger protein 132 | KRAB | 18 | - | 706 | 63635994 | 63643401
19 | 19q13.43 | ZNF324B | Zinc finger protein 324B | KRAB | 9 | + | 544 | 63654783 | 63661011
19 | 19q13.43 | ZNF324 | Zinc finger protein 324 | KRAB | 9 | + | 553 | 63670275 | 63676577
19 | 19q13.43 | ZNF446 | Zinc finger protein 446 | SCAN | 3 | + | 450 | 63679607 | 63684409
19 | 19q13.43 | ZNF499 | Zinc finger protein 499 | BTB | 4 | - | 511 | 63716709 | 63722733
19 | 19q13.2-q13.4 | ZNF42 | Zinc finger protein 42 | SCAN | 13 | - | 734 | 63765096 | 63776754
19 | 19q13.2 | ZNF93 | Zinc finger protein 93 | KRAB | 17
20 | 20p13 | SCRT2 | Scratch homolog 2, zinc finger protein | - | 5 | - | 307 | 590240 | 604823
20 | 20p13 | ZNF343 | Zinc finger protein 343 | KRAB | 12 | - | 599 | 2410463 | 2437778
20 | 20pter-q11.23 | ZNF339 | Zinc finger protein 339 | - | 4 | - | 275 | 17952796 | 17986521
20 | 20p11.23-p11.22 | ZNF133 | Zinc finger protein 133 | KRAB | 15 | + | 653 | 18217157 | 18245640
20 | 20p12.3-p11.21 | ZNF336 | Zinc finger protein 336 | BTB | 10 | + | 711 | 23293021 | 23301683
20 | 20p11.21 | ZNF337 | Zinc finger protein 337 | KRAB | 20 | - | 751 | 25602851 | 25625469
20 | 20q11.21 | PLAGL2 | Zinc finger protein PLAGL2 | - | 6 | - | 496 | 30243968 | 30259192
20 | 20q11.22 | ZNF341 | Zinc finger protein 341 | - | 12 | + | 847 | 31783469 | 31843736
20 | 20q11.1-q11.23 | SCAND1 | SCAN domain containing protein 1 | SCAN | - | - | 179 | 34004960 | 34005842
20 | 20q11.21-q13.12 | ZNF335 | Zinc finger protein 335 | - | 13 | - | 1342 | 44010699 | 44034240
20 | 20q13.12 | ZNF663 | Zinc finger protein 663 | - | 1 | - | 106 | 44476274 | 44521322
20 | 20q13.12 | ZNF334 | Zinc finger protein 334 | KRAB | 14 | - | 680 | 44563114 | 44575601
20 | 20q13.1-q13.2 | SNAI1 | Zinc finger protein snail homolog | - | 4 | + | 264 | 48032934 | 48038830
20 | 20q13.13-q13.2 | SALL4 | Sal-like 4 | - | 7 | - | 1053 | 49833988 | 49852421
20 | 20q13.2 | ZFP64 | Zinc finger protein 64 homolog | - | 13 | - | 645 | 50133957 | 50241931

Cluster labels for the rows above (row alignment not recoverable): 19.13cd, 19.13ce, 19.13cf, 19.13cg, 19.13ch, 19.13ci, 19.13cj, 19.13ck, 19.13cl, 19.13cm, 19.13cn, 19.13co, 19.13cp, 19.13cq, 19.13cr, 19.13cs, 19.13ct, 19.13cu, 19.13cv, 19.13cw, 19.13cx, 20.1a, 20.1b, 20.2a, 20.2b, 20.2c, 20.3a, 20.3b


202

0q

13

.2

2020

q13.

31

202

0q

13

.33

2121

q21.

1

212

1q

22

.3

212

1q

22

.3

2222

p

2222

q11.

1

2222

q11.

1

2222

q11.

21

2222

q11.

21

2222ql1

.2

2222q11.2

2

2222q11.2

3

222

2q

12

.2

222

2q

11

.2

XX

p21.

3

XX

pll

.3

XX

pll

.2

XX

pll

.23

XX

pll

.23

XX

p22.1

1-p

ll.2

3

XX

pll

.1-1

1.3

XX

pll

.23

XX

pll

.21

XX

pll

.21

XX

pll

.1

XX

q13.

2

XX

q21.

1-q2

1.2

XX

q23

XX

q26.

3

XX

q26.

3

XX

q26.

2

Xq2

8

ZN

F21

7

CT

CFL

BT

BD

4

ZN

F29

9P

PR

DM

15

ZN

F29

5

ZN

F73

4)L

0C34

3927

4)L

0C39

1288

ZN

F74

HIC

2

SU

HW

2

SUH

W1

ZN

F7O

ZN

F27

8

ZN

F69

ZFX

ZN

F67

4

ZN

F15

7

ZN

F41

ZN

F81

ZN

F21

ZN

F63O

4)L

0C13

9163

KL

F8

ZX

DB

ZX

DA

9)L

0C26

0337

ZN

F6

ZB

TB

33

ZN

F75

ZN

F44

9

ZIC

3

ZN

F27

5

Zin

cfi

nger

prot

ein

217

Zin

cfi

nger

prot

ein

CT

CF

-T

BTB

(PO

Z)

dom

ain

cont

aini

ng4

zinc

fing

erpr

otei

n29

9pse

udogen

e

Zin

cfi

nger

prot

ein

298

Zin

cfi

nger

prot

ein

295

(ZB

TB

21)

Zin

cfi

nger

prot

ein

73

sim

ilar

tozi

ncfi

nget

prot

ein

91

Sim

ilar

tozi

ncfi

nger

prot

ein

532

Zin

cfi

nger

prot

ein

74

ZB

TB

3O

Zin

cfi

nger

prot

ein

279

Zin

cfi

nger

prot

ein

280

Zin

cfi

nger

prot

ein

70

Zin

cfi

nger

prot

ein

278

Zin

cfi

nger

prot

ein

69

Zin

cfi

nger

ptot

ein

X-l

inke

d

Zin

cfi

nger

prot

ein

673

Zin

cfi

nger

prot

ein

157

Zin

cfi

nger

prot

ein

41

Zin

cfi

nger

prot

ein

81

Zin

cfi

nger

prot

ein

21

Zin

cfi

nger

prot

ein

630

Sim

ilar

toS

al-l

ike

prot

ein

1

Kru

ppel

-lik

efa

ctor

8

Zin

cfi

nger

,X

-lin

ked,

dupl

icat

edB

Zin

cfi

nger

,X

-Iin

ked,

dupl

icat

edA

Zin

cfi

nger

prot

ein

Np9

7pse

udogen

e

Zin

cfi

nger

prot

ein

6

Zin

cfi

nger

and

BT

Bdo

mai

nco

ntai

ning

33

Zin

cfi

nger

prot

ein

75

Zin

cfi

nger

prot

ein

449

Zin

cfi

nger

prot

ein

ofth

ece

rebe

llum

3

Zin

cfi

nger

ptot

ein

275

-7

-10

48

-11

-66

3

BT

B2

-58

9

8+

326

-+

-

12+

572

5+

615

1-

543

1-

542

11-

446

7-

687

-+

323

13+

805

11-

581

12+

506

18-

779

13+

661

15-

639

13-

657

3+

359

9+

803

9-

799

-+

-

11+

761

3+

672

5-

510

7+

518

467

11+

376

D

13-

1507

6-

1066

21

.la

21.l

b

22

.la

22.l

b

X.l

a

X.l

b

X.1

c

X.1

d

X.l

e

X.2

a

X.2

b

X.3

a

X.3

b

X.4

a

SE

T

BT

B

KR

AB

BTB

BT

B

KR

AB

KR

AB

KR

AB

KR

AB

KR

AB

KR

AB

KR

AB

BT

B

SCA

N-K

RA

B

SCA

N

D


5161

7017

5550

5630

6184

6322

2338

3998

4209

1454

4228

0009

4500

78

1500

7571

1571

3058

1907

8478

2010

1693

2116

8767

2119

8060

2241

3772

3005

1790

3075

1319

2407

7824

4624

3490

4711

4926

4719

134

7

4758

1245

4771

9194

4780

2547

4931

6398

5627

5632

5763

4994

5795

0922

7324

2135

8438

5694

1192

6863

5

1342

4738

5

1343

0638

7

1364

7601

2

1522

6206

0

5163

3043

5553

3560

6190

7300

2338

5465

4217

2660

4230

3519

4525

98

1505

4750

1576

1696

1909

1970

2013

5748

2119

3505

2120

4613

2242

3279

3007

2249

3076

5974

2414

2549

4628

9820

4715

8338

4722

7289

4766

6550

4774

8321

4781

5739

4932

5222

5632

8255

5764

0635

5795

3792

7324

5219

8441

5024

1192

7627

9

1343

0562

3

1343

2500

4

1364

8192

5

1522

7024

9


XX

q28

X.4

bL

0C13

9735

Sim

ilar

tazi

ncfi

nger

ptot

ein

92K

RA

B8

+49

515

2336

765

1523

4028

0

YY

pll

.3-

ZFY

zinc

fing

erpr

otei

n,Y

-Iin

ked

-12

+80

128

6354

629

0989

1

YY

ql1.

223

-Y

ZN

F381

Pzi

ncfi

nger

prot

ein

381,

Y-I

inke

dpse

udogen

e-

-+

-23

6055

7023

6067

03

YY

ql1.

223

Y.l

aY

L0C

3926

03si

mil

arta

Zin

cfi

nger

prot

ein

43(Z

inc

prot

ein

HT

F6)

--

+-

2523

5272

2524

0390

YY

ql1.

23Y

.1b

YL

0C44

2486

sim

ilar

tazi

ncfi

nger

prot

ein

91-

2554

0738

25

54

59

02

For each C2H2-ZNF in the dataset:

Chromosome number
Position on the chromosome
The cluster number to which the gene belongs; '-' for a gene found as a singleton rather than a cluster. If the gene belongs to a cluster, the cluster number is indicated: the first number indicates the chromosome number, the second number indicates the number of the cluster on the chromosome. For example, a cluster number 1.1 indicates Chromosome 1, Cluster 1
Status as a pseudogene as reported in Genbank. 'Y' - Identified as a pseudogene
The name of the C2H2-ZNF and its description
The domain associated with the C2H2-ZNF: KRAB, SCAN, SCAN-KRAB, BTB, HOMEO, SET and without an encoded conserved N-terminal domain ()
The number of zinc finger motifs present
The orientation
The amino acid sequence length
The start and stop of translation
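The cluster numbering described in this legend (chromosome first, then the number of the cluster on that chromosome, with '-' for a singleton) can be read mechanically. The following is a minimal Python sketch, not part of the original analysis: the names `ClusterId` and `parse_cluster_id` are hypothetical, and the optional trailing letter is an assumption based on entries in the cluster column that appear to carry one (for example 20.1a). Treating the identifier as text rather than as a decimal number keeps 19.1 and 19.10 distinct.

```python
import re
from typing import NamedTuple, Optional


class ClusterId(NamedTuple):
    chromosome: str  # "1".."22", "X" or "Y"
    index: int       # number of the cluster on that chromosome
    member: str      # optional letter suffix (assumed); "" when absent


# chromosome, a dot, the cluster number, then an optional lowercase suffix
_CLUSTER_RE = re.compile(r"^(\d{1,2}|X|Y)\.(\d+)([a-z]*)$")


def parse_cluster_id(text: str) -> Optional[ClusterId]:
    """Return the parsed cluster identifier, or None for a singleton ('-')."""
    text = text.strip()
    if text == "-":
        return None
    match = _CLUSTER_RE.match(text)
    if match is None:
        raise ValueError(f"not a cluster identifier: {text!r}")
    chromosome, index, member = match.groups()
    return ClusterId(chromosome, int(index), member)


if __name__ == "__main__":
    for raw in ["1.1", "19.13", "X.4b", "-"]:
        print(raw, "->", parse_cluster_id(raw))
```

Keeping the chromosome as a string lets X and Y be handled the same way as the numbered chromosomes.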


Supplementary Table S2

Comprehensive summary of the organization of all C2H2-ZNF found as singletons or in clusters on each human chromosome and classified with respect to the various C2H2-ZNF sub-families.

Columns: Chr; Total No. C2H2-ZNF; No. Clusters; No. C2H2-ZNF in Clusters; then, for each C2H2-ZNF sub-family (KRAB, SCAN-KRAB, SCAN, BTB, SET, HOMEO, ø), the counts S and C. In the rows below only the non-empty sub-family cells are listed, in column order.

Chr    Total  Clusters  In clusters  Sub-family counts (S C)
1      36     6         17           1 9 1 1 7 1 9 7
2      17     3         7            1 2 9 5
3      30     6         20           2 5 3 2 1 6 11
4      10     1         4            1 2 1 4 2
5      15     1         6            1 5 8 1
6      28     3         16           1 2 2 7 3 2 8 3
7      47     7         41           1 27 2 5 12
8      30     4         16           36 1 1 2 10 8
9      23     4         10           5 3 1 3 2 4 5
10     22     3         13           8 9 5
11     15     2         4            3 1 1 1 3 4 2
12     15     1         7            2 6 6 1
13     5      2         4            1 4
14     10     2         4            1 2 1 5 1
15     12     3         8            1 2 1 3 5
16     33     6         27           10 1 2 3 5 12
17     13     2         5            1 1 1 1 1 1 5 2
18     14     2         6            1 3 2 5 3
19     289    13        279          3 185 1 13 4 1 6 76
20     18     3         7            2 2 1 2 6 5
21     3      1         2            1 1 1
22     10     1         2            2 2 4 2
X      19     4         11           1 6 1 1 1 6 3
Y      4      1         2            2 2
Total  718    81        518          KRAB 32 280; SCAN-KRAB 4 14; SCAN 3 30; BTB 28 13; SET 1 1; HOMEO 2 3; ø 130 177

No. of C2H2-ZNF without an encoded conserved N-terminal domain (ø), found as singletons (S) or in C2H2-ZNF clusters (C)


Supplementary Table S3

Gene organization of the 81 human C2H2-ZNF clusters.

Columns: Cluster¹; Position; No. of C2H2-ZNF; Clusters with solely C2H2-ZNF; Order of the genes from the different C2H2-ZNF subfamilies²; Cluster composition⁴

1.1    1p36.11   2   No   ø/ø  Pure
1.2    1p34.2    3   No   K/K/K  Pure
1.3    1p34.1    2   No   ø/ø  Pure
1.4    1q42.13   2   No   ø/K  Mixed
1.5    1q44      6   No   K/K/K/K/K/S-K  Mixed
1.6    1q44      2   Yes  ø/ø  Pure
2.1              2   No   ø/ø  Pure
2.2              3   Yes  K/K/ø  Mixed
2.3              2   No   ø/ø  Pure
3.1              3   Yes  ø/K/K  Mixed
3.2              3   No   ø/K/K  Mixed
3.3              8   Yes  S-K/K/S-K/ø/S-K/ø/ø/ø  Mixed
3.4              2   No   ø/B  Mixed
3.5              2   Yes  ø/ø  Pure
3.6              2   No   ø/ø  Pure
4.1    4p16.3    4   Yes  K/ø/ø/K  Mixed
5.1    5q35.3    6   No   K/K/ø/K/K/K  Mixed
6.1    6p22.1    2   No   ø/K  Mixed
6.2    6p22.1    12  No   S/S/S-K/ø/S/S-K/ø/S/S/S/S/K  Mixed
6.3    6p21.3    2   No   B/B  Pure
7.1              2   No   ø/K  Mixed
7.2              4   No   ø/K/ø/K  Mixed
7.3              5   No   ø/ø/K/ø/K  Mixed
7.4              9   No   K/K/K/ø/ø/K/K/ø/K  Mixed
7.5              8   No   K/S-K/S-K/ø/ø/K/K/K  Mixed
7.6              3   Yes  K/K/K  Pure
7.7              10  No   /K/K/K/K/ø/K-K  Mixed
8.1    8p23.1    2   No   ø/ø  Pure
8.2    8q24.13   2   No   H/H  Pure
8.3    8q24.3    5   No   ø/ø/ø/ø/K  Mixed
8.4    8q24.3    7   No   ø/K/K/K/K/ø/K  Mixed
9.1    9q22.32   4   No   ø/K/K/ø  Mixed
9.2    9q31.2    2   No   ø/ø  Pure
9.3    9q32      2   Yes  ø/K  Mixed
9.4    9q33.2    2   Yes  B/B  Pure
10.1   10p11.21  6   Yes  K/ø/ø/K/K/K  Mixed
10.2   10q11.21  3   No   ø/K/K  Mixed
10.3   10q11.21  4   Yes  K/ø/K/ø  Mixed
11.1   11p15.4   2   Yes  S-K/K  Mixed
11.2   11q12.2   2   Yes  ø/ø  Pure
12.1   12q24.33  7   Yes  K/K/K/K/K/K/ø  Mixed
13.1   13q22.1   2   No   ø/ø  Pure
13.2   13q32.3   2   Yes  ø/ø  Pure
14.1   14q11.2   2   Yes  H/ø  Mixed
14.2   14q23.3   2   Yes  B/B  Pure
15.1   15q24     2   No   B/ø  Mixed
15.2   15q25.3   3   No   S/S/ø  Mixed
15.3   15q26.1   3   No   ø/ø/ø  Pure
16.1   16p13.3   9   No   S/K/S-K/ø/S-K/K/ø/S/ø  Mixed
16.2   16p13.3   2   No   ø/S  Mixed
16.3   16p11.2   10  No   ø/ø/5K/ø/ø/ø  Mixed
16.4   16p11.2   2   No   ø/K  Mixed
16.5   16q22     2   Yes  K/K  Pure
16.6   16q24.2   2   Yes  ø/ø  Pure
17.1   17p13.2   3   No   ø/S/ø  Mixed
17.2   17p11.2   2   No   S-K/K  Mixed
18.1   18q12     4   Yes  S/ø/S/S  Mixed
18.2   18q23     2   No   ø/ø  Pure
19.1   19q13.2   3   Yes  K/K/K  Pure
19.1   19p13.3   2   No   ø/B  Mixed
19.11  19q13.31  24  Yes  19K/5ø  Mixed
19.12  19q13.41  43  No   30K/13ø  Mixed
19.13  19q13.43  76  No   43K/1S-K/12S/1B/19ø  Mixed
19.2   19p13.3   5   Yes  K/K/K/K/K  Pure
19.3   19p13.3   2   No   ø/B  Mixed
19.4   19p13.2   2   No   K/ø  Mixed
19.5   19p13.2   14  No   9K/5ø  Mixed
19.6   19p13.2   28  No   21K/7ø  Mixed
19.7   19p13.11  40  No   28K/12ø  Mixed
19.8   19q13.11  8   No   3K/S/3ø  Mixed
19.9   19q13.12  32  No   23K/1B/8ø  Mixed
20.1   20p11.23  2   No   K/ø  Mixed
20.2   20q13.12  3   No   ø/ø/K  Mixed
20.3   20q13.2   2   Yes  ø/ø  Pure
21.1   21q22.3   2   No   Se/B  Mixed
22.1   22q11.22  2   No   ø/ø  Pure
X.1    Xp11.23   5   No   K/K/K/K/K  Pure
X.2    Xp11.1    2   No   ø/ø  Pure
X.3    Xq26.3    2   Yes  S-K/S  Mixed
X.4    Xq28      2   No   ø/K  Mixed
Y.1    Yq11.23   2   No   ø/ø  Pure

Total: 81 clusters; 518 C2H2-ZNF; Yes = 25, No = 56; Pure = 29, Mixed = 52

Positions listed for the clusters on chromosome 2: 2q11.1, 2q13; on chromosome 3: 3p22.1, 3p21.32, 3q13.2, 3q24, 3q26.32; on chromosome 7: 7p22.1, 7p11.2, 7q11.21, 7q22.1, 7q22.1, 7q36.1. One position is missing for each of these chromosomes, so these entries are not assigned to rows.

¹ For the cluster name, the first number corresponds to the chromosome on which the cluster is found and the second to the number attributed to the cluster.

² Sequential order of the genes from the different C2H2-ZNF subfamilies such as KRAB (K), SCAN (S), SCAN-KRAB (S-K), SET (Se), HOMEO (H) and without an encoded conserved N-terminal domain (ø).

³ For the very large clusters, the number of C2H2-ZNF from each subfamily is specified (e.g. 23K means that 23 consecutive genes from the KRAB-C2H2-ZNF subfamily are found in the cluster).

⁴ 'Pure' = the cluster is composed of C2H2-ZNF from a single subfamily; 'Mixed' = different subfamilies of C2H2-ZNF are present in the cluster.

Note: 'Pure' clusters with solely tandemly repeated C2H2-ZNF are in grey
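The gene-order notation and the Pure/Mixed rule in the footnotes above can be checked mechanically against the gene count given for each cluster. The following is a minimal Python sketch under stated assumptions, not the procedure used to build the table: the function names `parse_order` and `composition` are hypothetical, and the code B for BTB is taken from the table rows rather than from the footnote.

```python
import re
from collections import Counter

# Subfamily codes from the table footnotes; "B" (BTB) is assumed from the rows.
SUBFAMILY_CODES = ("S-K", "Se", "K", "S", "H", "B", "ø")

# An optional repeat count followed by one subfamily code, e.g. "23K" or "S-K".
_TOKEN_RE = re.compile(r"^(\d*)(" + "|".join(re.escape(c) for c in SUBFAMILY_CODES) + r")$")


def parse_order(order: str) -> Counter:
    """Count the genes of each subfamily in an order string such as 'K/K/ø'."""
    counts: Counter = Counter()
    for token in order.split("/"):
        match = _TOKEN_RE.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognised token {token!r} in {order!r}")
        repeat, code = match.groups()
        counts[code] += int(repeat) if repeat else 1
    return counts


def composition(order: str) -> str:
    """'Pure' when a single subfamily is present, otherwise 'Mixed'."""
    return "Pure" if len(parse_order(order)) == 1 else "Mixed"


if __name__ == "__main__":
    for order in ["K/K/K", "S-K/K", "23K/1B/8ø"]:
        counts = parse_order(order)
        print(order, dict(counts), sum(counts.values()), composition(order))
```

Summing the counts gives the number of C2H2-ZNF in the cluster, which can be compared with the third column of the table. A row whose order string cannot be parsed raises an error and is not silently skipped.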


Comprehensive catalog of the C2H2-ZNF genes from the 81 human clusters and their syntenic counterparts from other mammalian genomes (Chimpanzee, Mouse, Rat and Dog)

Human Cluster 1.2

hchr1

F0

ZM

PST

E24

CO

LSA

2

L0

C3

88

62

lSM

AP

IL

ZN

F64

3Z

NF

642

ZN

F68

4

F0

L0C

7244

33.

RIM

S3

NFY

CF

0004

DF

0

K9

+

K9+

K8

+

Ch

imp

anze

e

pC

hrl

F0ZM

PSTE

24,

L0C

3567

9lD

FO

L0C

7466

91,

SM

AP

IL

ZN

F64

3Z

NF

642

L0C

4567

97

F0

t0C

7687

4&R

IMS3

t0C

4557

93K

CN

Q4

-9

*

K9.

KB

*

Mouse

mC

hr4

FGZ

mp4

e24,

CoI

9a2

DF

C

Sm

apll

Zfp

69K

9

F0

Ri,

,e3,

Nfy

c

Kcn

a4

Rat

rchr5

F0Z

mps

O24

Co!

942

DF

O

Sm

4plI

F0

R0

3.N

fyc

Kcr

,4

Oog

F0tO

C6O

74S6

tOC

-175

312

tOC

’524

51

JD

Supplementary Table 4

D

Human Cluster 1.1   Chimpanzee   Mouse   Rat   Dog

hchr1   pchr1   mchr4   rchr5   cchr2

DF

OD

FO

DF

0D

F0

OF

O

F0

TRIM

G3,

FOlK

ILF

0TR

IM63

,P

0IK

ILF

0Tr

i,r,6

3,P

d,kI

lF

0T

,ss,

63,

Pd,

kIl

F0

TR

IM63

,PD

IKIL

0R

AP

I0

RA

PI

Gra

piG

rapl

0RA

PI

ZN

F59

3-

1*

ZN

F59

3-

1Z

fp59

3.

t-

Ztp

593pre

d.

1.

L0C

4873

60-

1*

ZN

68

3-

4Z

NF

683

-4

L0C

4873

54-

4-

F0

LlN

28.

014003

F0

LIN

2B01

4003

F0-

Lif

l28.

Dhd

dsF

0L

,a28

.D

hdds

FGL

lN28

.D

HO

O3

HM

ON

2H

MG

N2

H,r

ç,,2

Hrg

r2H

MG

N2

rchr7

G.

t0C

6O76

22,

DF

O

OC

GO

Z00

9CO

C3S

_’45

8

L.0

C48

2457

K17

-

L0C

4824

54K

S-

Human Cluster 1.3   Chimpanzee   Mouse   Rat   Dog

hchr1   pchr1   mchr4   rchr5   cchr15

F0

AT

PSV

OS.

BA

0AL

TD

FO

F0

AT

FSV

OB

,B

A0A

LT

DF

OF

0A

1p

6o

bB

agal

0D

FO

FGA

tp6r

obB

agal

t2D

FO

F0

AFP

,500

B.

BAG

ALT

DF

O

2,C

C02

4,S

LC

6A9

2.C

C02

4SC

C6A

90

16

9$1c5

92

CC

D24

SLC

F4S

KL

F17

.3

*K

LF1

7-

3*

K1f

17.

3-

Z1p

393_

pred

.3

-K

LF1

7-

3*

L0

C1

28

20

8-

5-

F0

OM

AP

I,P

RN

PIP

F0

OM

API

,P

RN

PIP

F0

Dnep

l,P

rspi

pF

00,m

pl,

Frr

pip

F0

OM

API

,P

NF

lP

TM

EM

53T

ME

M53

Tmem

S3Tm

5705

3T

ME

M53


Human Cluster 1.4   Chimpanzee   Mouse   Rat   Dog

hchr1   pchr1   mchr11   rchr10   cchr14

F0

CD

C42

BP

AD

FO

F0

CD

C42

BP

AD

FO

F0

Cdc4

2bpi

DF

OF

0C

dc42

bpa

DF

OFG

CD

C42

SP

AD

FO

CT

O-2

90?

CT

D-2

90?

Cld

-29d

,C

td-2

9d1

0W

-290?

ZN

F67

815

*L

0C46

9695

-15

*

gm

127

K3

FGJM

J04

,M

PN2

F0

]MJD

4,M

PN2

F0

Jmjd

4K

OJm

jd4

F0

JMJD

4,M

PN2

WN

T9A

WN

T9A

Wnt

9aW

*t9a

WN

T9A

Human Cluster 1.5

hchr1

DF

0KG

TFS

2MSC

CPD

H

L0C

149134A

HC

TF

I

ZN

F69

5K

-

ZN

F67

OK

9

ZN

F66

9K

9

ZN

F12

4K

7

L0C

729806

K10

ZN

F49

65K

5

F0

C?A

SI0

R2811

002W

5

Ch

imp

anze

e

pchrl

KGT

FOZ

MSC

CPD

H

tOC

l49l

34,A

HC

TF

I

ZN

F67O

K9

-

L0C

4578

77K

9-

ZN

F12

4K

7-

L0C

4578

80K

10-

ZN

F49

6s

s-

F0

C?A

SI,

OR

2BII

0R

2W

5

DF

O

Mouse

mch

r8

DF

CF0

.TG

2m,

S**Ç

4)

G,,,

l305

,A

ScII

I

BC

0500

78K

12-

Ztp

496

SK5

-

F0

C,a

sl.

0/5-

222

C0C

6II9

157

Iat

rch

rl0

FG

flb2

,n,

Scc*

**

On,

1305

,A

ScII

?

Zfp

496

F0

C,G

sl,

09,2

22

C0C

6681

57

DF

0

55-

000

cch

rll

DF

OG

C0C

-12O

I2

0C

48

010

5

L0C

4905

75s

5-

WC

?AS

I,0R

2011

7R2W

5

Human Cluster 1.6

hchr1

F0

0R

2T

22

7,

00

08

UI

DF

O

SH

3BP

SL

ZN

F67

2-

13

ZN

F69

2-

5

F0

P0

80

2

Ch

imp

anze

e

pcO

n

F0

0827227.

OR

5BU

ID

FO

SH

3BF

5L

ZN

F67

2-

13*

ZN

F69

2-

5-

FO

P0B

D2

Mo

use

mcnn?

F0

0r2

t227

DF

C

S53

bp5?

Zfp

672

-13

Zfp

692

-5

F0

Pgb

d2

Rat

tch

tlO

F0

0,21

227

DF

O

Sh3

bp5?

Zfp

672

-13

-

Zfp

692

-s

*

KG

P9b

d2

Oog

DC

fltl

t$

‘G.

CR

2722

70R

59U

1D

FO

OH

3BP5

L

L0C

4826

99-

12*

L0C

4826

98-

3-

F0

P12

802

Human Cluster 2.1   Chimpanzee   Mouse   Rat   Dog

hchr2   pchr2   mchr5   rchr5   cchr17

F0

MP

VI7

.G

TF

3C2

DF

OF

0M

PV

I7.

0TF

3C2

DF

OF

GG

1f3c

2,E1

f294

DF

OF

0G

03c2

,E

it2b4

DF

OFG

MP

Vl7

0TF

3C2

DF

O

E(F

284.

SN

X17

E1F

284.

SN

XI7

Sra

xI7

5cc

17

E1F

284,

SN

XI7

ZN

F51

3-

7-

ZN

F51

3-

7-

Zfp

513

-7

-Z

tp51

3-

7-

L0C

4830

12-

7-

ZN

F51

2-

2*

ZN

F51

2-

2=

Zfp

512

-2

R0D

1561

52-

2-

L0C

6082

961

*

F0

CC

OC

I2I.

XA

BI

F0

CC

OC

I2I,

XA

BI

F0

,Xab

lSu

pS?

F0

Xab

lSupII

IF

0C

CD

CI2

IXA

B1

sup

Trt

SU

PT

7LS

UP

ITL

D


Human Cluster 2.2   Chimpanzee   Mouse   Rat   Dog

hchr2   pchr2   mchr2   rchr3   cchr17

F0

.TE

KT

.,L

0C44

2048

DF

OF

0T

EF

l.4M

AC

DF

OF0

.T

kt4,

MA

IO

FO

F0.

TAkI

4,M

AI

DF

OF0

,C

0C48

3056

DF

O

MA

C,

MR

PS

5M

RP

S5

Mrp

s5M

rps5

MA

L,L

OC

475

746

ZN

FS

14K

7-

L0C

4704

31K

7-

Zfp

661

Kg

-Z

fp66

1K

9-

L0C

6116

70K

7*

ZN

F2

K9

+Z

NF

2K

9L

0C48

3055

K26

-

L0C

344065

10÷

L0C

4594

006

*

F0

PR

QM

2,K

CN

IP3

F0.

FR

OM

2.K

CN

IP3

F0

P,o

m2K

*nip

3F

0P

ru,n

2.K

cn3

F0.

tOC

S’76

54

FAI4

D2A

F45054

FaM

2a

Fah

d2

atQ

C50

5135

,LO

CA

2S74

5

Human Cluster 2.3   Chimpanzee   Mouse   Rat   Dog

hchri

pchri

F0

AD

RA

.2B

,A

STL

DF

OF

0A

DR

A2B

,A

STL

DF

OF

0A

DR

A2a

AST

LD

FO

F0

AD

RA

2B,

AST

LD

FO

F0

AD

RA

2B,

AST

LD

FO

OU

SP

SO

US

P2

DU

SP

2D

US

P2

DU

SP

2

L0C

343938

.-

-

L0C

442041

--

F0

NC

AP

HF

0N

CA

PH

F0

NC

AP

HF0

.’N

CA

PHFG

’NC

APH

LIN

CR

L1N

CR

LIN

CR

LIN

CR

LIN

CR

Human Cluster 3.1   Chimpanzee   Mouse   Rat   Dog

hchr3   pchr3   mchr9   rchr8   cchr23

F0

MF

RIP

,EIF

IBD

FO

F0.’

EIF

IBE

NT

PD

3D

FO

FG.M

FRIF

EIF

IBD

FO

FG

.MF

RIF

EIF

I6D

FO

F0-

MY

RIP

,EIF

IBD

FO

EN

TP

O3

RP

LI4

L0C

7357

59,L

0C73

5578

E*t

pd3,

RpI

$4E

Mpd

3,R

p114

EN

TPD

3,R

PC

I4

ZN

F61

9.

w+

L0C

4707

972K

26

ZN

F62O

K8

+L

0C47

0799

.-

-

ZN

F62

1K

7+

F0

tOC

S-35

607

CO

CS5

7625

F0

MP

PS

3IP

I,F

0C

0C64

5857

C0C

5576

25F0

.L

0C64

5507

,LO

CF5

625

F0

t0C

545807.

MR

PS

SIP

I.C

TN

N3I

CrN

N3I

MR

PS

3IP

I.C

TN

N3I

MR

PS37

P’,

CT

’’N

3IF7

RPS

3IPI

.C

TN

S3I

Human Cluster 3.2   Chimpanzee   Mouse   Rat   Dog

hchr3   pchr3   mchr9   rchr8   cchr23

F0

VIS

AI.

SE

C22C

DF

OF

G.v

IPR

I,S

EC

22C

DF

OF

056

*22*

.Ss

I8I2

DF

O50

.0s

c22*

,55

1612

DF

OF

0V

IPA

1,S

EC

22C

DF

O

5518

L2.

NK

TR

5518

12,

NK

TR

Nkt

rN

ktr

5518

12,

NK

TR

ZN

F65

1.

8+

ZN

F66

2R

8+

Zfp

651

B8

*R

GD

156

2434

B8

*

ZN

F66

2K

8+

L0

C3

39

90

3K

-+

F0

SNA

K.

TM

EM

I6K

FG

.SIJ

RK

,FM

EM

IGK

F0

S,,r

k,T

,ss,

nl6

kF

0S

s,k.

Tm

ersl

6kF

057

*55,

TM

EM

I6K

46

50

54

55

05

4656

55A

bf,d

5A

6HD

S


Human Cluster 3.3   Chimpanzee   Mouse   Rat   Dog

hchr3   pchr3   mchr9   rchr8   cchr23

F0

SNR

K,

TM

EM

I6K

DF

OF

001

4FF,

TM

EM

I6K

OF

OF0

.0,

1,5,

Tm

eml6

KD

FQ

F0

5,,,k

,T

mm

f6I(

DF

OF

0,

014F

F,T

ME

MIG

KD

FO

AB

HD

5,F

LJ3

615

7A

BH

D5,

FC

J361

57A

bhd5

,L

0C66

62la

Abh

d5,

L0C

6662t8

AB

HD

5

ZN

F44

50K

14-

ZN

F44

5e,

4-

Ztp

445

0K12

-R

GD

1559

144

0K12

-Z

NF

445

SK14

L0

C2

85

34

6K

12-

ZN

F16

723

*Z

tp16

7s

ii•

Znt

167

022

*

ZN

F16

70K

13+

ZN

F35

-11

*Z

fp66

0-

10*

Ztp

lO5

-11

ZN

F66O

-10

+Z

NF5

O2

.14

+Z

fplO

5-

11

ZN

F19

7SK

22+

L0C

4708

07-

9*

ZN

F35

-11

+

ZN

F5O

2-

14+

ZN

F5O

J9

+

F0

KIA

AI

143,

KIF

ISFD

KIM

I14

3.K

IFI5

KG

,;10

0590

10R

1K.

F111

5FG

11I0

059G

IOR

iK,

K,5

15F

0?1

3EM

42.

KIK

IS

TM

EM

42,T

GM

4T

ME

M42

.TG

M4

Tm

e,e3

2,T

g,4

7rr,

e,r4

2.T

gr4

Human Cluster 3.4   Chimpanzee

hchr3

F0

QT

RT

DI

DF

O

OR

D3

ZN

F8O

ZB

TB

2O

Mo

use

pch

r3

KG

QT

RT

DI

DR

D3

L0C

4708

867

ZB

TB

2OB

5

-7

55

OF

O

F0

04P

43

LSA

MP

Rat

mchrl

6

F0

05141

DF

Drd

3

Zbt

b2O

B5

C

Doq

F0

GA

P43

LSA

0IP

rchrl

l

FQ

Q5

14

;O

FO

9,14

3

Zbt

b2O

B5

FGG

ap43

Lsa

ne

CC

0t7

7

F0

OT

RT

DI

DF

O

DR

D3

GG

ap43

LS

a,rv

FG

GA

P43

LSA

MP

Human Cluster 3.5   Chimpanzee   Mouse   Rat   Dog

hchr3   pchr3   mchr9   rchr8   cchr23

F0

PC

SC

R2

PL

SC

RI

DF

OF

0P

LS

CR

2.P

L$C

R1

F0

PIs

*r2

F0

Pls

cr2

KG

PL

SC

RZ

PL

SC

RI

PC

SC

R5

PL

SC

R5

PIsc

rSP

Isc,

5P

LS

CR

S

ZIC

44

.L

0C47

0956

-4

-Z

ic4

-4

Z,c

4-

4L

0C

485704

-4

ZIC

1-

4-L

QC

46Q

759

-4

,Zic

l-

4-Z

ici

.4

-L0C

611554

-4

*

FG

RP

L3

SP

I.4

0T

01

FG

RP

L3

8F

I,A

GT

RI

F0

Rp1

38p1

KG

Rp1

38p1

FG

RP

C3S

PI,

AG

TR

I

CP

BI

OP

SI

Cp

bl

Cpbl

CP

BI


Human Cluster 3.6   Chimpanzee   Mouse   Rat   Dog

hchr3   pchr3   mchr3   rchr2   cchr34

F0

ZM

AT

3,PI

K3C

AD

FO

FGZ

MA

T3,

PIK

3CA

DF

OF

0Z

,rat

3,P

,k3c

sD

FO

FGZ

mal

3,P

,k3c

aD

FO

F0

ZM

AT

3.PI

K3C

AD

FO

KC

NM

B3

KC

NM

B3

KC

NM

B3

WIG

1-

2-

WtG

1-

2-

Wig

l-

2-

WIg

l-

2-

WIG

J-

2-

ZN

F63

9-

5*

ZN

F63

9-

•Z

tp63

9-

5Z

fp63

9-

5•

ZN

F63

9-

s

FO

MF

NI.

FG

MF

NI,

FG

MS

UF

GM

fr,l

FG

MF

NI.

0NB

4G

NB

4O

Fb4

0nb4

0512

4

Human Cluster 4.1   Chimpanzee

hchr4

F0

L0

C7

372

530

FO

ZN

F59

5M

GC

2635

6L

0C

654254

ZN

F14

1F

0P

100

L0

C4

61

04

1.

AT

P5I

Mo

use

mC

0r5

K18

+

-6

+

Kil

+

pch

r4

F0L

0C

73

7253

DF

O

ZN

F59

5Z

NF

718

L0C

461

038

ZN

F721

FGP

10

0

L0C

4610

41,

AT

P5I

Rat

F0

L0C

7372

53D

FC

K16

K11

K28

rcn

n

F0

L0C

7372

53D

FO

00

gC

Cfl

r3

F0

Pig

g

A1p

5l

F0

L0C

7372

53D

FO

F0

Pigg

4155

/

F0

P10

0

L0C

46l0

41.

AT

F5I

Human Cluster 5.1   Chimpanzee   Mouse   Rat   Dog

hchr5   pchr5   mchr11   rchr10   cchr11

F0.

CO

L23

AI,

MR

PL

5OP

3D

FD

50.

CO

L25

AI,

05

4D

FD

FG.C

o123

a1D

FO

F0,C

0123

41D

FO

F210007556

DF

O

CL

K4

Clk

4C

lk4

L0C

4746

45

ZN

F35

4AK

13-

ZN

F35

4AK

13-

Zfp

354a

K13

Ztp

354a

K13

*Z

N5

94

K35

-

ZN

F35

4BK

13*

ZN

F35

4BK

13•

Ztp

354b

K13

RG

D15

6008

05K

13-

L0C

4746

50K

24-

ZF

P2

-13,Z

NF

71

K-

*Z

fp2

-9

-RG

D156327

K-

ZN

F45

4K

125Z

FP

2K

il•Z

fp45

4K

12-Z

fp354c

Kil

-

DK

FZ

p686

E24

33K

13*

ZN

F45

4-

12,

9630

041

NO

7Rik

K13

-

ZN

F35

4CK

uZ

fp35

4cK

il-

F0

AO

AM

TS2

F0

0RM

6.A

DA

MT

S2F

0A

da,,5

s2P

0A

dam

ts2

F0

LO

C48

1453

L0C

391859

RuF

ylR

uFjl

L0C

4824

54


hch

r6p

chr6

DF

0FG

F0

M1

2l1

67

.F

66

08

3

RP

I-I5

30

14

3

L0C

3461

57Z

NF

184

mch

rl3

-9

*

K19

-

60P0

M12

1167

,F

6508

3

RP

I-l5

30l4

3

L0C

4722

31Z

NF

184

OF

O

F0

HIS

T(H

2AI

HIS

T1H

3I4M

IST

IH2k

I

K19

-

rch

rlO

PCP

om

12l6

2D

FC

Fks

583

Zfp

184

K19

*

Human Cluster 6.1   Chimpanzee   Mouse   Rat   Dog

F0

HIS

TIH

2AI

HIS

TIH

3HH

IST

IH2A

I

Human Cluster 6.2

Cch

r35

F0

Pc,

m12

I62

DF

O

Ffs

g83

Z1p

184

K19

FGH

IST

IH2A

1

HIS

TIH

3HH

IST

IH2A

J

Ch

imo

anze

e

60.

P0M

121I

67.

FKSG

83

RP

I-l5

30l4

3

L0C

478746

DF

O

.9*

F0

66SF

!H2A

I

8IS

TIH

3H

HIS

TIH

2M

Mo

use

mchtl

3

F0

H!S

TIH

2A1

H!S

TIH

3HH

!ST

IH2A

J

Rat

rchrl

7p

chr6

F0

C0C

47l9

1O

L0C

4625

18

ZN

F16

5L

0C47

1912

L0C

4719

14Z

NF

193

ZN

F3O

7Z

NF3

O6

ZN

F96

ZN

1390

ZN

F4S

2Z

NF3

11

OF

hch

r6

FO

OR

2B

7P

0R

288P

DF

O

OR

IFI2

P

ZN

F16

5Z

NF

435

ZN

FJ9

2L

0C22

2701

ZN

F19

3Z

NF3

O7

ZN

F18

7Z

NF

323

ZN

F3O

6Z

N13

05Z

NF

452

ZN

F311

F0O

R2W

IOR

2F

!P

L0C

6462

60

Doq

se

S4

SK9

S5

5K7

-8

S6

57

sii

s-

K14

Cch

r35

F00161370

018-

1369

0(8-

4201

8-13

68

Zfp

96S

8

Zfp

306

s7

Ztp

187

-7

RP

23-2

98F

22.2

57

Zfp

lO2

5K9

DF

O

55

*

-7

.

57

,

518

S5

-

K14

-

oF

001613700181369

DF

0Ifr

4201

frI3

68

Zfp

192

Znt

307

Znf

187

Ztp

307

Ztp

96

DF

OF

0-L

00

61

1561

.

L0C

4883

12C

o*2y

2

C0C

6115

61L

0C48

831

6L

0C48

831

8

SKIS

*

5K3

-

5K7

*

S7*

-8.

515

*

F0

L0C

4719

26

LO

C3

7192

7

F0

0!fr

1366

0!fr

1365

0Ifr

1364

G0

!6l3

66

0161

365

3(8-

1364

F0

L0C

488327

L0C

4883

28

Human Cluster 6.3   Chimpanzee   Mouse   Rat   Dog

hchr6   pchr6   mchr17   rchr20   cchr12

F0

W0R

46,

PF

DN

6D

FO

F0

W0R

46,

PF

DN

6D

FO

F0

Wdr

46,

Ff8

.6D

FO

F0

Wd,

46,

Ff5

.6D

FO

F0

W0R

46,

PF

ON

6D

FO

R0L

2T

AP

BP

180L

2.T

APB

PR

gf8T

spbp

RgI

2,T

apbp

RG

L2,

TA

PBP

ZN

F29

7B

2-

ZN

F29

7B

2-

Zbt

b22

B3

*Z

b1b2

2B

3*

L0C

6079

00B

2-

ZB

TB

9B

1.Z

BT

B9

B1

•Zb

tb9

B9

-Zb

tb9

B9

-L

0C60

7940

B1

*

F0

0817

1,ff

PR

3FG

BA

KI.

FF

53F

008k;.

15x3

F0

888-

I.15

x3F

GB

AK

I,IT

PR3


hch

r7

D

F0

RA

dD

AO

LS

K0E

LK

2,0A

1021P

DK

FZ

p434

J101

5D

KF

Zp547k054

L0C

442283

ZN

F32

5

F0

0R7E

33.

0576136F

0R7E

59P

pch

r7

F0

RA

CI,

DA

0LB

K0E

LK

2,0

51

02

1F

L0C

7422

63L

0C47

2281

ZN

F32

S

F0

0R7E

39,

0576

136F

0576

59F

mch

r5

F0.

Dlb

Ori

d2;p

Gm

792

Ztp

316

Ztp

l2

F0

0,7

e39

,0

r7e1

36

p

0r7e

59p

Dj

Human Cluster 7.1   Chimpanzee   Mouse   Rat   Dog

hch

r7p

chr7

mch

r5r

chrl

2C

chr2

3

F0

5CC

29A

4,D

FO

F0

SLC

29A

4,D

FO

F0

SIc

29a4

DF

OFG

Slc

29a4

DF

OF

0SL

C29

A4,

DF

O

KIA

A16

56K

1AA

1856

KIA

A78

56K

1M18

56K

1MI8

56

L0

C4

41

19

3-

-L

0C44

1193

ZN

F81

5K

3*

ZN

F8

5K

3*

F0

PM

S2.

JWI

F0

PMS2

.JO

lIF

0P

se2.

J6I

F0

Pm

s2,

lOI

F0

P010

2.JO

lI

EIF

2AK

IE

IF2A

KI

E7F

2kI

0136

51E

1F2A

KI

Human Cluster 7.2   Chimpanzee

DF

0

Mou

se

DF

0

-5

K15

K15

+ +

Rat

rch

rf2

DF

C

K15

K15

K15

Doq

DF

0

K15

-

K15

*

CC

ht6

F0.

RA

CID

AG

LB

KD

EL

K2.

05

10

21

F

F0

Ra*

l.D

36’b

Ord

2p

flG

0156

3095

-

Zfp

316

-

flG

D15

6439

6K

DF

O

15 15

F0

0r7e

39,

0r7e

136p

0r7e

59p

F0

05

7E39

,05

7E13

6P

0576

59P

Human Cluster 7.3   Chimpanzee   Mouse   Rat   Dog

hchr7   pchr7

F0.

CO

L23

AI

MR

PL

5QP

3D

FO

F0

CO

L23

AI.

MR

PLSJ

P3D

FO

CL

K4

CO

Ol

L0C

442311

-12

+Z

NF

479

K10

-

L0C

222032

--

-Z

NF

716

K12

+

ZN

F47

9K

10-

L0C

340223

--

-

ZN

F71

6K

12+

F0

HIS

TIH

2AI

F0

HIS

TIH

2AI

HIS

TIH

3H,H

IST

IH2M

HIS

TIH

3H.H

IST

IH2A

J


Human Cluster 8.1   Chimpanzee   Mouse   Rat   Dog

hch

rlp

ch

rlm

ch

rll

rchrl

Occhrl

6

F0

DE

FB

I3O

DF

0FG

DE

FB

I3O

DF

0F

0E

0654465

DF

OF

00e

fo41

DF

0F

0D

EF

BI3

OD

F0

L0C

389631

--

L0C

441

341

--

-

F0R

PL

38P

I.A

GT

RI

F0

RP

L3

BP

I.A

GT

RI

FGA

gt,I

FGA

gIO

FO

RP

L3

8P

I,A

GT

RI

CP

BI

CF

BI

Cpb

IC

pbI

CF

BI

Human Cluster 8.2   Chimpanzee

pcf

lr8

hch

r8

F0

SN

TB

I.H

AS2

HA

SNT

.M

OP

S36

P3

ZH

X2

ZH

X1

DF

0

Mouse

mchrl

5

HI,

HI-

F0

SN

TB

I.H

A$2

HA

SNT

,M

RP

S36

P3

ZH

X2

ZHX

1

DF

O

F0

AT

AD

2

Rat

HI

Hi

F0

S,t

bI,

H3s

2

M,g

s36p

3

Zhx

2H

I+

Zhx

lH

I-

F0

AT

AD

2

Doq

rch

r7

F0S

,,tb

l.H

a12

Mps3

6p3

Zhx

2Z

hxl

F0

.40

42

CC

ht7

3

F0

SN

TB

I,H

AS2

HA

$NT

MR

PS

36P

3

L0C

4820

33H

I

L0C

4750

89H

1

HI

H1

OF

O

F0

A0d2

F0

AT

AD

2

Human Cluster 8.3   Chimpanzee   Mouse   Rat   Dog

hchr8   pchr8   mchr15   rchr7   cchr13

F0

L6E

,H

HC

MD

FO

FG,C

Y6E

,H

HC

MD

FO

F0.

Ly6,

HF+

mD

FO

F0.

ty6F

HI+

mD

FO

F0

ty6F

DF

O

LY

6HL

Y6H

Ly6

hL

y6h

Lyg

h

ZF

P41

-4

*G

LI4

-7

Zfp

4l-

4*

Zfp

4l-

4*

GL

I4-

7*

ZN

F62

3-

13*

Ztp

623

-13

*L

0C55

0893

-7

*

ZN

F69

6-

9Z

NP7

O7

K•

Zfp

707

K7

*

ZN

FB

23-

13

ZN

F7O

7R

7-

F0.

BR

EA

2,M

4P

K5

FGB

RE

A2.

MA

PK

I5F

0.&

e42.

Mp

k1

5F

0B

rea2

KO

pkI5

F0

Br*

a2.

MpkI5

F%M

S3H

FAM

43H

Fa*1

834

Fam

83h

Fa,

r.83

h


Human Cluster 8.4   Chimpanzee   Mouse   Rat   Dog

hchr8   pchr8   mchr15   rchr7   cchr13

FO-R

EC

OL

4,L

RR

CI4

.D

FO

F0

REC

OL4

,CR

RC

I4,

QF

OF

0R

KFO

(4,L

1121

4D

FO

F0

ReF

qI4

,U,F

l3D

FO

F9w

cu

,r0cs2

,.s

DF

O

CR

RC

24IC

IM16

88L

AR

C24

,KIM

1688

L,r

c24

-,,c

24.0

5132

193

ZN

F251

-7

-L

0C74

2563

K14

Ztp

251

K12

-Z

nt25

1K

12-

L0C

4751

29K

15-

ZN

F34

K12

-Z

NF

3412

-Z

tp7

K16

*Z

fp64

7K

13-

L0C

4821

06K

15-

ZN

F51

7K

10*

ZN

F7K

14*

Ztp

647

K13

L0C

4821

07K

12

ZN

F7

K14

*Z

NF

250

K13

-Z

NF

347

K20

ZN

F64

7K

13Z

NF

16-

ZN

F16

-17

-L

0C74

7863

L0C

642914

K10

-

F0

TM

ED

IOP

.C80

,177

FGL

œ4

&4

47

8F

0-11

1003

8F14

R1k

F0

1110

038F

14R

1kFG

LO

CIO

OIO

LO

C.1

0211

0

C80

r133

LO

C46

4479

Trr

*d1O

PT

rrd

1Q

PL

0C48

2111

Human Cluster 9.1   Chimpanzee

pch

r9h

chr9

*011

5017

03,

DF

C

9LC

35D

2

ZN

F36

7Z

NF5

1ON

F7

82

ZN

F32

2B

Mou

se

DF

O

-2-

K10

-

K14

—11

F0H

SD

1783

,

SL

C35

02

ZN

F36

7-

2

ZN

F51O

K16

L0C

7425

64K

13-

ZN

F32

2B-

11

Rat

tch

tl7

Inchrl

3

*0.1

1*91

763

DF

O

Sf93

542

fp367

-2

G61

4115

20T

0007

59001.

NC

BP

I

Doq

596

1M

525

TDR

D7

TMO

O1.

NC

BFI

F0.

Hs9

l703

DF

C

Sf53

542

Ztp

367

Cco

n

5011

5017

03,

SL

C35

02

L0C

476295

-2

23

-

DF

O

3-0

Tdr

d7

Imo

di,

Ncb

pl

F0

Td,

07

TO

OdI

.N

cbpt

F0

Tdr

d7

T159

d1,N

bpi

Human Cluster 9.2   Chimpanzee   Mouse   Rat   Dog

hch

rgp

chr9

mch

r4rc

hr5

Cchrl

5

F0

FCM

D,

TA

C2

DF

OF

0FC

MD

,T

AC

2D

FO

F0-

Fcr

,j,

Tac

2D

FO

F0

Fc,1

11,

Tac

2D

FO

FGFC

MD

,T

AC

2D

FO

TM

EM

350

TM

EM

380

Trr

,n3

8b

Trn

sos3

8bT

ME

M38

B

ZN

F46

2-

o*

ZN

F46

2-

o•

Ztp

462

-o

*Z

fp46

2-

o*

L0C

4816

55-

o

KL

F4-

3-

L0C

4646

40-

3-

K1t

4-

3-

K11

4-

3-

L0C

4816

57-

2-

F0

AC

TL

7B,

AC

TL

7AF

0A

CT

L7B

,A

CT

L7A

F0

Act

ieF

0A

ctf l

bF

0.A

CT

L7B

,A

OT

L7A

IKB

KA

PK

OK

AP

fkbk

apfk

bk,a

pIK

BK

AP

124

Tad

epal

fyet

.

Page 143: Université de Montréal 'Evo1ution ofC2H2-Zinc finger genes ...

[Figure: cross-species cluster diagrams for Human Cluster 9.3, Human Cluster 9.4 and Human Cluster 10.1, each compared with the corresponding regions in chimpanzee, mouse, rat and dog where shown. Gene-level labels were lost in extraction.]


[Figure: cross-species cluster diagrams. This page continues the preceding chromosome 10 panel and shows Human Cluster 10.2, Human Cluster 10.3 and Human Cluster 11.1, each compared with the corresponding regions in chimpanzee, mouse, rat and dog where shown. Gene-level labels were lost in extraction.]


[Figure: cross-species cluster diagrams for Human Cluster 11.2, Human Cluster 12.1, Human Cluster 13.1 and one further chromosome 13 cluster whose label is illegible in the extracted text, each compared with the corresponding regions in chimpanzee, mouse, rat and dog where shown. Gene-level labels were lost in extraction.]


[Figure: cross-species cluster diagrams for Human Cluster 14.1, Human Cluster 14.2, Human Cluster 15.1 and Human Cluster 15.2, each compared with the corresponding regions in chimpanzee, mouse, rat and dog. Gene-level labels were lost in extraction.]


[Figure: cross-species cluster diagrams for Human Cluster 15.3, Human Cluster 16.1 and Human Cluster 16.2, each compared with the corresponding regions in chimpanzee, mouse, rat and dog where shown. Gene-level labels were lost in extraction.]


[Figure: cross-species cluster diagrams for Human Cluster 16.3, Human Cluster 16.4 and Human Cluster 16.5, each compared with the corresponding regions in chimpanzee, mouse, rat and dog where shown. Gene-level labels were lost in extraction.]


[Figure: cross-species cluster diagrams for Human Cluster 16.6, Human Cluster 17.1, Human Cluster 17.2 and Human Cluster 18.1, each compared with the corresponding regions in chimpanzee, mouse, rat and dog where shown. Gene-level labels were lost in extraction.]


[Figure: cross-species cluster diagrams for Human Cluster 18.2, Human Cluster 19.1 and Human Cluster 19.2, each compared with the corresponding regions in chimpanzee, mouse, rat and dog where shown. Gene-level labels were lost in extraction.]


[Figure: cross-species cluster diagrams for Human Cluster 19.3, Human Cluster 19.4 and Human Cluster 19.5, each compared with the corresponding regions in chimpanzee, mouse, rat and dog where shown. Gene-level labels were lost in extraction.]


[Figure: cross-species cluster diagrams for Human Cluster 19.6 and Human Cluster 19.7, each compared with the corresponding regions in chimpanzee, mouse, rat and dog where shown. Gene-level labels were lost in extraction.]


[Figure: cross-species cluster diagrams. This page continues the preceding chromosome 19 panel and shows Human Cluster 19.8, compared with the corresponding regions in chimpanzee, mouse, rat and dog. Gene-level labels were lost in extraction.]


[Figure: cross-species cluster diagram for Human Cluster 19.9, compared with the corresponding regions in chimpanzee, mouse, rat and dog. Gene-level labels were lost in extraction.]


[Figure: cross-species cluster diagram for Human Cluster 19.12, compared with the corresponding regions in chimpanzee, mouse, rat and dog. Gene-level labels were lost in extraction.]


[Figure: continuation of a cross-species cluster diagram; the panel title does not appear on this page. Gene-level labels were lost in extraction.]


[Figure: cross-species cluster diagrams for Human Cluster 20.1, Human Cluster 20.2, Human Cluster 20.3 and Human Cluster 21.1, each compared with the corresponding regions in chimpanzee, mouse, rat and dog where shown. Gene-level labels were lost in extraction.]

141

Tad

epal

lyet

,

Page 160: Université de Montréal 'Evo1ution ofC2H2-Zinc finger genes ...

DD

Hum

anC

luS

ter

22.1

Ch

imp

anze

eM

ouse

Rat

ag

hch

,22

pch

r22

mch

rlO

rch

rlO

Cchrl

6

F0

UM

OO

LIA

BO

GI

DF

OPC

UM

OD

LI,

AB

CG

ID

FO

F0

U,r

odIl

.Tf

f3D

FO

F0

U,d

i1,

Tff3

DF

OF

GU

MO

DL

1,A

BC

G1

DF

O

ÏFF

3T

FF3

TFF

3

SU

HW

2.

i-

SU

HW

2.

i-

Suhw

2.

1

SUH

W1

-1

.SU

HW

1.

i-

F0

fOL

V2-

34.

CL

V2-

33F

0fO

CV

2-34

.fO

LV

2.33

F0

Pc,m

1211

1F

0P

om

l2ll

lF

0fO

LV

2-34

,1C

CV

2-33

PQ

MI2

1Lf

BC

RL

3P

OM

I2IL

IBC

RL

4B

crl4

-Bc

H4

PO

MI2

1LI

BC

RC

4

Hum

anC

iust

erX

.7C

him

pan

zee

Mouse

Rat

Dog

hch

rXpchrX

mch

rXrc

hrX

cchrX

FOU

BE

1PC

TK

ID

FO

FO

UB

EIP

CT

KI

DF

OF0

.U

2X1.

ftfk

lD

FD

F0U

blx

,PC

lkl

DF

OF0

.UB

EI,

PCT

KI

DF

O

US

PI1

US

PII

Usp

llU

spll

US

PII

ZN

F15

7K

12*

ZN

F15

7K

12•

Ztp

lB2

L0C

6125

09K

12*

ZN

F41

K18

.Z

NF4

1K

18.

D93

0016

NO

4RIk

L0C

4808

99K

18-

ZN

F81

K13

*Z

NF8

1K

13*

L0C

491

863

K13

ZN

F18

26

15-

ZN

F18

2.

15-

L0C

4918

64K

14-

ZN

F63O

K13

-L

0C

473594

--

-L

0C49

1865

K13

F0

SSX

6os

iSSX

2F

032

X6

34,5

2X2

F0

Ssx

alF

05

3x

4I

F0

55X

6,p

s,S

SX

2

C0C

6533

1710

0653

317

Ssx

a2S

s,,a

2L

0OE

533I

7

Hum

anC

lust

erX

.2C

him

pan

zee

Mo

use

Rat

Dog

hch

rXpchrX

mch

rrc

hr

cchr

F0

SPIN

2A.

FAA

H2

DF

OF

0S

PIN

2A.

FA.4

H2

DF

OF

0S

px

2a

DF

OF

0S

pix2

aD

FD

F0

SPIN

2A,

FAA

H2

DF

D

Faa

h2F

aah2

ZX

DB

9Z

XD

B-

9L

0C49

191

2-

9

ZX

DA

-9-Z

XD

A-9

.

FO

KR

Y9P

I7.

FO

KR

T8P

I7,

PC

60

83

17

F0

1608

317

FG

KR

T8P

I7,

LO

C65

3568

L0C

6535

89L

0C65

3596

142

Tad

epal

lyet

Page 161: Université de Montréal 'Evo1ution ofC2H2-Zinc finger genes ...

rchrX

F0

CX

rnI4

8D

F0

Hum

anC

k,s

ter

X.3

Ch

!mp

anze

eM

ouse

Rat

Doq

F0

0dx2

6b

hch

rX

F0

CX

044B

.L0C

7284

70

L0C

65

0024

ZN

F75

ZN

F44

9F

0,

00X

265

DF

0

9K5

-

S7*

pch

rX

FGC

X04

48,L

0C72

8470

L0C

6500

24

ZN

F75

ZN

F44

9F

G’0

0X26

B

DF

O

SK5

-

57*

mch

tX

FG

CX

S,6

48D

FC

Zfp

449

S7

*

F0

Ddx

26b

Hu

man

Clu

ster

X.4

hch

rX

F0

AN

MA

GA

,54A

0EA

1

ZN

F27

5L

OC

13

97

35

F0

TR

EX

2,U

CH

L5I

P

DF

0

-11

*

K8,

Ch

imp

aflz

ee

pch

rX

F0

AN

MA

OA

.51A

0E41

DF

C

ZN

F27

5ii

*

L0C

7399

68-

8

FGT

RE

X2,

UC

HC

5IP

Mo

use

mch

rX

F0

An

rga,M

ag

eal

DF

O

Zfp

275

-11

Zfp

92K

9*

F0

T,e*

2,U

chI5

ip

Flat

rchrX

0A

nnga.0

0g1

D

Ztp

275

*0T,

e*2.

Uch

î5ip

F0

11

Dog

Cch

rX

L0C

4922

33

F0.T

RE

X2,

UC

HL

6IP

Hu

man

Clu

ster

Y.1

Ch

fmp

anze

eM

ouse

Rat

Dog

hchrY

pchrY

F0

HT

FF

Y4B

,BP

Y2B

DF

OF

0H

TF

FY

488P

Y2B

DF

O

L0C

392603

--

--

--

L0C

442486

--

--

--

F0.

BP

Y2C

,fl

TY

4C

F0

BPY

2C,

Trr

Y4C

D

Cch

rX

FG

CX

or14

8LO

C72

8470

DF

L0C

6520

21

LO

CB

JI8S

Ose

sL

0C49

2160

sF

000

X26

0

o

DF

O

-19

*

143

Tad

epal

lyet

Page 162: Université de Montréal 'Evo1ution ofC2H2-Zinc finger genes ...

Supplementary Table S4

Comprehensive catalogue of the C2H2-ZNF genes from the 81 human clusters and their syntenically homologous clusters from other mammalian genomes (chimpanzee, mouse, rat and dog).

For the 81 human (h) C2H2-ZNF clusters and their corresponding syntenically homologous clusters in chimpanzee (p), mouse (m), rat (r) and dog (c), this table provides the cluster number, the position on the chromosome, the flanking genes (FG), the names of the C2H2-ZNF genes from the cluster (in bold), the associated domain (D) (K = KRAB, S = SCAN, S-K = SCAN-KRAB, B = BTB, H = HOMEO, Se = SET, with a separate symbol marking no associated domain), the number of zinc finger motifs present (F) and the orientation (O).


Chapter 3. DISCUSSION


Many studies in biology focus on the extensive similarities between the genomes of human and model organisms to extract insights into the molecular mechanisms and aetiology of human diseases. Our investigation of the C2H2-ZNF gene family in mammals reveals extensive variation in C2H2-ZNF gene content and genomic organization, as well as in the domain composition of orthologous genes among species. In addition, our study is the first to provide a clear demonstration of the important contribution of gene loss to the evolution of the C2H2-ZNF family and to demonstrate the rapid evolution of C2H2-ZNF genes that occurs between related species. Our observations at the genomic scale provide insights into C2H2-ZNF gene evolution that confirm conclusions drawn from smaller-scale studies on individual genes, clusters and C2H2-ZNF subfamilies.

The major contributions of our study are:

i. The extensive analysis of all the C2H2-ZNF genes in the human genome.

ii. A comprehensive and systematic analysis of all the human C2H2-ZNF clusters and the identification of their syntenically homologous counterparts in other mammalian genomes.

iii. The distinction of species-specific expansion and loss in C2H2-ZNF clusters and genes in all mammals.

iv. The identification of variation in the number of zinc finger motifs and in the presence or absence of the conserved N-terminal domains associated with mammalian C2H2-ZNF orthologs.

v. The tracing of different evolutionary patterns of the C2H2-ZNF gene family within primates and rodents.

vi. The establishment of a model reconstructing the history and evolution of the SCAN, SCAN-KRAB and KRAB subfamilies.

In brief, our study reveals that multiple, independent duplications and losses of C2H2-ZNF genes and their effector domains within different lineages and species have shaped and diversified the C2H2-ZNF repertoires in mammals.

3.1 The C2H2-ZNF genes in the human genome

Earlier studies of C2H2-ZNF genes focussed on human chromosome 19 (Eichler, Hoffman et al. 1998; Dehal, Predki et al. 2001; Looman, Abrink et al. 2002; Shannon, Hamilton et al. 2003) and on the KRAB C2H2-ZNF subfamily in human (Huntley, Baggott et al. 2006). In contrast, our study provides a comprehensive and systematic analysis of all the C2H2-ZNF genes in the human genome (Supplementary Table S1). We identified and analyzed the organization of 718 C2H2-ZNF genes in the human genome and classified them into different C2H2-ZNF subfamilies (KRAB-C2H2-ZNF, SCAN-C2H2-ZNF, BTB-C2H2-ZNF and those without a conserved N-terminal domain). We also discovered two new C2H2-ZNF subfamilies, the HOMEO and SET subfamilies, which have a limited number of members (5 and 2, respectively), possibly due to a more recent appearance or to a different rate of duplication and loss.


Consistent with previous reports (Rousseau-Merck, Koczan et al. 2002; Huntley, Baggott et al. 2006), we observed a massive clustering of the C2H2-ZNF genes in the human genome: more than 70% of the genes are organized into clusters. However, in addition to the previously reported clusters on human chromosome 19 (Venter, Adams et al. 2001; Huntley, Baggott et al. 2006), we also located a substantial proportion of clusters (83%) on the other chromosomes of the human genome (Supplementary Table S2). Interestingly, the distribution of C2H2-ZNF genes is biased toward chromosome 19, which harbours 40% of all C2H2-ZNF genes in humans. Most of the human C2H2-ZNF genes are organized into clusters (500), with more than 60% of these clusters containing intermixed sets of genes from different subfamilies (Supplementary Table S3).

The above observations were only possible through the study of all the C2H2-ZNF

sub-families at the whole genome level.

3.2 Variation in the numbers of C2H2-ZNF genes in mammalian clusters

A systematic and comprehensive analysis of the human C2H2-ZNF clusters and their syntenic counterparts in the chimpanzee, mouse, rat and dog genomes allowed us to gain insights into the evolution of all the C2H2-ZNF gene subfamilies in mammals (Supplementary Table S4). Homologous clusters in syntenic regions were identified on the basis of the flanking genes of each human cluster. Interestingly, this analysis revealed a high variation in the number of C2H2-ZNF genes within syntenically homologous clusters of mammals. Considering primates, human has 518 C2H2-ZNF genes forming 81 clusters whereas chimpanzee has only 397 C2H2-ZNF genes organized into 79 clusters. This suggests that almost all the C2H2-ZNF clusters in human have a syntenic counterpart in chimpanzee. However, human has 30% more C2H2-ZNF genes within the identified clusters than chimpanzee, implying that C2H2-ZNF genes are evolving differently within the primate lineage. A similar pattern was observed within the rodent lineage, where mouse and rat have 232 and 172 C2H2-ZNF genes organized into 62 and 58 clusters, respectively. A differential expansion of C2H2-ZNF genes, particularly striking in primates, was evident in mammals (human > chimpanzee > mouse > dog > rat for the number of genes within clusters and human > chimpanzee > mouse > rat > dog for the number of clusters) (Figure 2 and Supplementary Table S4). A closer look at the individual syntenic clusters in mouse, rat and dog reveals many cases where dog has more genes than rodents, and more specifically than rat (Supplementary Figure S2).
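As a quick check on the figures quoted above, the following minimal Python sketch recomputes the relative excess of clustered genes and the average number of genes per cluster. Only the counts stated in the text are used (dog counts are not given in this section); the function and variable names are illustrative and are not part of the original analysis.

```python
# Gene and cluster counts as quoted in the text (dog is omitted because
# its counts are not stated in this section).
counts = {
    "human": {"genes": 518, "clusters": 81},
    "chimpanzee": {"genes": 397, "clusters": 79},
    "mouse": {"genes": 232, "clusters": 62},
    "rat": {"genes": 172, "clusters": 58},
}


def percent_excess(a, b):
    """Percentage by which count a exceeds count b."""
    return 100.0 * (a - b) / b


def genes_per_cluster(species):
    """Average number of clustered C2H2-ZNF genes per cluster."""
    entry = counts[species]
    return entry["genes"] / entry["clusters"]


for name in counts:
    print(f"{name}: {genes_per_cluster(name):.2f} genes per cluster")

print(f"human vs chimpanzee: {percent_excess(518, 397):.1f}% more genes")
print(f"mouse vs rat: {percent_excess(232, 172):.1f}% more genes")
```

With these counts, human has about 30% more clustered genes than chimpanzee and mouse about 35% more than rat, while the average cluster size falls from roughly 6.4 genes in human to roughly 3.0 genes in rat.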

Our study indicates that C2H2-ZNF genes are indeed rapidly evolving, as is evident, for example, within the primate and rodent lineages. The differential expansion observed in the various species may be accounted for by differential duplication and/or loss. Although the higher number of C2H2-ZNF genes found in human as compared to chimpanzee suggests a human-specific expansion through tandem duplication, this difference does not seem to be solely due to duplications but also involves loss in chimpanzee, as seen in many gene families (Fortna, Kim et al. 2004). In agreement with this, there are more pseudogenes in chimpanzee clusters (as annotated in GenBank) than in the corresponding human clusters. Thus, the variation in the numbers of C2H2-ZNF genes observed within primates could be attributed both to gene duplication and to loss by deletion or pseudogenization, gain being predominant in human and loss in chimpanzee. Interestingly, a variation in the number of C2H2-ZNF genes is also evident in rodents. However, in this case, in almost all the clusters mouse has either a higher or an equal number of C2H2-ZNF genes as compared to rat. Altogether, our study suggests that, in addition to lineage- or species-specific increases in the numbers of C2H2-ZNF genes, loss of these genes has also played a very important role in the evolution of this gene family. Evaluating the relative contribution of gene duplication and loss requires detailed phylogenetic studies.

3.3 Evolution of C2H2-ZNF genes in mammals through differential expansion and loss

Phylogenetic analyses of human C2H2-ZNF clusters with their syntenic

counterparts from other mammals provided a better estimation of the relative contribution

of gene duplication and loss in the analyzed clusters.

For example, the phylogenetic analysis of the C2H2-ZNF genes from human cluster 19.12 and its syntenically homologous clusters in mammals, combined with the physical maps of these clusters, gives insights into the gene rearrangement mechanisms that could have taken place during evolution. Consistent with previous individual reports of lineage-specific expansion, more specifically of KRAB C2H2-ZNF genes (Dehal, Predki et al. 2001; Shannon, Hamilton et al. 2003; Huntley, Baggott et al. 2006), a primate lineage-specific expansion, but also a dog-specific expansion and a mouse duplication of C2H2-ZNF genes, were clearly identified. In all species, tandem duplication was found to be responsible for the species-specific increase in the number of C2H2-ZNF genes, as confirmed by the fact that the genes that group together in the tree are almost always physically adjacent within the cluster on the chromosome. Furthermore, the orientations of the genes belonging to the same clade in the phylogenetic tree and of their orthologs were almost always the same. In a few instances, however, the orientations of genes belonging to the same phylogenetic clade differed and genes within syntenic clusters were inverted, revealing many possible gene rearrangements.

Clear evidence of gene loss was also obtained by analyzing cluster 19.12 and other clusters. Considering that rodents are evolutionarily more closely related to primates than dog is, the absence in rodents of C2H2-ZNF clusters or genes that are present in primates and dog suggests a loss in rodents. Several examples of gene loss in rodents were obtained. The phylogenetic analysis also indicated loss of genes in chimpanzee by pseudogenization, as suggested above.
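The parsimony argument used here can be written down as a small sketch. It assumes the species relationships stated in the text, with dog as the outgroup to primates and rodents, and it treats a gene as a candidate rodent loss when it is found in at least one primate and in dog but in neither rodent. The function name and the species sets are illustrative only.

```python
# Species grouping assumed by the argument in the text:
# dog is the outgroup to the (primates, rodents) group.
PRIMATES = {"human", "chimpanzee"}
RODENTS = {"mouse", "rat"}
OUTGROUP = {"dog"}


def suggests_rodent_loss(present_in):
    """Return True when the presence pattern is most simply explained by a
    loss in rodents: the gene is present in at least one primate and in the
    outgroup, but absent from both rodents."""
    present = set(present_in)
    in_primates = bool(present & PRIMATES)
    in_outgroup = bool(present & OUTGROUP)
    in_rodents = bool(present & RODENTS)
    return in_primates and in_outgroup and not in_rodents


# Presence pattern of the kind discussed in the text: orthologs found in
# human, chimpanzee and dog, with no rodent counterpart.
print(suggests_rodent_loss(["human", "chimpanzee", "dog"]))  # True
print(suggests_rodent_loss(["human", "chimpanzee"]))         # False
```

A gene present only in primates is not flagged, because without an outgroup copy a primate-specific gain explains the pattern equally well.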

Altogether, our studies indicate a predominant role of gene gain by tandem duplication over gene loss in the evolution of C2H2-ZNF genes in mammals. It should however be pointed out that more definitive conclusions about the pseudogene status of the various C2H2-ZNF genes and about the role of these genes require detailed functional investigation of the individual genes.


3.4 Evolution of the C2H2-ZNF genes through duplication or loss of zinc finger and N-terminal effector motifs

In accordance with a previous study on the average number of zinc finger motifs in a few plant (1), yeast (1.5), nematode (2.5), insect (3.5) and human (8) C2H2-ZNF genes (Venter, Adams et al. 2001), an in-depth analysis of the zinc finger motifs associated with the C2H2-ZNF genes found in clusters in human and with their syntenic genes in chimpanzee, mouse, rat and dog indicated that there is significant variation in the number of zinc finger motifs associated with C2H2-ZNF genes in these mammalian species. Notably, the C2H2-ZNF genes from dog were found to encode a higher average number of zinc finger motifs than those of the other mammals studied. It is possible that an increase in the number of fingers within genes confers advantageous additional functionality on the C2H2-ZNF genes through a diversification of the possible nucleic acid and protein interactions.
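As an illustration of how zinc finger motifs can be counted in a protein sequence, the sketch below counts non-overlapping matches to a simplified C2H2 consensus (C-X2-4-C-X12-H-X3-5-H). This simplified consensus, the function name and the synthetic sequence are assumptions of the example and are not necessarily the procedure used in this study.

```python
import re

# Simplified C2H2 consensus assumed for this example:
# C, 2-4 residues, C, 12 residues, H, 3-5 residues, H.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def count_zinc_fingers(protein_sequence):
    """Count non-overlapping matches to the simplified C2H2 pattern."""
    return sum(1 for _ in C2H2_PATTERN.finditer(protein_sequence.upper()))


# Synthetic sequence (not a real protein): two identical finger-like units
# joined by a TGEKP linker.
finger = "YKCPECGKSFSQKSNLQKHQRTH"
example = finger + "TGEKP" + finger

print(count_zinc_fingers(example))
```

Applied to every protein in a cluster, a count of this kind gives the per-gene finger number (column F of Supplementary Table S4) from which species averages can be compared.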

We also observed variation in the presence of N-terminal effector motifs, such as SCAN or KRAB, among orthologs, accounted for by either gain or loss of these motifs. Loss of N-terminal effector domains by sequence degeneration was confirmed in several cases in our study. A thorough analysis of the exon-intron structure of C2H2-ZNF genes indicated a typical conserved exon-intron organization for C2H2-ZNF genes associated with a SCAN-KRAB domain combination. Based on these observations, we propose a model of evolution of the C2H2-ZNF subfamilies involving independent gains of a SCAN and of a KRAB domain, each by an exon-shuffling mechanism, followed by gene duplications and loss of effector motifs by deletion or degeneration of the SCAN and/or KRAB domain.


3.5 Birth and Death model of evolution

It was suggested from a study of a few chromosome 19 C2H2-ZNF clusters that C2H2-ZNF genes evolve by positive selection (Schmidt and Durrett 2004). Our results and analyses of the human C2H2-ZNF clusters and their syntenic counterparts in other mammals suggest a “Birth and Death” model of evolution similar to that proposed by Nei and colleagues (Nei, Gu et al. 1997; Nei 2000) (see Figure 7B). According to this model, new genes are created by duplication, including tandem duplication and block gene duplication (birth). While some of them might acquire a new function and thus diverge functionally, others may remain relatively unchanged in the genome for a long time. Still others become pseudogenes following deleterious mutations or are deleted from the genome (death through inactivation or elimination). Though functional information is known for only a handful of C2H2-ZNF proteins, the variations in the numbers of C2H2-ZNF genes that we found throughout the evolution of mammals and our phylogenetic analysis point to duplication and loss as a guiding force in the evolution of these genes.
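The birth-and-death process described above can be illustrated with a toy stochastic simulation. This is a minimal sketch only: the per-generation probabilities, the function name and the assumption that pseudogenes simply persist once formed are all hypothetical choices made for the example, not estimates from this study.

```python
import random


def simulate_birth_death(n_genes, generations, p_dup, p_pseudo, p_del, seed=0):
    """Toy birth-and-death simulation of a gene family.

    In each generation every functional gene either duplicates (birth),
    becomes a pseudogene (death by inactivation), is deleted (death by
    elimination) or stays unchanged. All probabilities are hypothetical.
    """
    rng = random.Random(seed)
    functional = n_genes
    pseudogenes = 0
    deleted = 0
    births = 0
    for _ in range(generations):
        next_functional = 0
        for _ in range(functional):
            r = rng.random()
            if r < p_dup:
                next_functional += 2  # original gene plus its new copy
                births += 1
            elif r < p_dup + p_pseudo:
                pseudogenes += 1      # inactivated, remains in the genome
            elif r < p_dup + p_pseudo + p_del:
                deleted += 1          # eliminated from the genome
            else:
                next_functional += 1  # maintained unchanged
        functional = next_functional
    return {
        "functional": functional,
        "pseudogenes": pseudogenes,
        "deleted": deleted,
        "births": births,
    }


# Two lineages starting from the same ancestral family can end up with
# different gene contents purely through independent births and deaths.
print(simulate_birth_death(10, 50, 0.05, 0.03, 0.02, seed=1))
print(simulate_birth_death(10, 50, 0.05, 0.03, 0.02, seed=2))
```

Running the same parameters with different seeds mimics two descendant species diverging in gene number from one ancestral repertoire, which is the pattern panel B of Figure 7 depicts.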


Figure 7: Birth-and-death model of evolution.

The figure shows the two models associated with the evolution of multigene families. Open circles represent functional genes and closed circles represent pseudogenes.

(A) In concerted evolution, related genes belonging to the ancestral species evolve in a concerted manner rather than independently in both Species 1 and Species 2.

(B) In the birth-and-death model of evolution, genes evolve differently by duplication; a few are maintained in the genome for a long time, while the others are deleted or become pseudogenes.

[Figure panels: (A) Concerted evolution; (B) Birth-and-death model of evolution. Each panel traces genes over time from an ancestral species to Species 1 and Species 2.]


3.6 C2H2-ZNF gene family: An analogy with the Olfactory Receptor

gene family

The olfactory receptor genes constitute the largest mammalian gene family, with more than 1000 members in human. However, 60% of these genes are pseudogenes. In contrast, the olfactory receptor gene family in mouse comprises roughly the same number of genes as in human, but only 20% of them are pseudogenes (Glusman, Yanai et al. 2001; Niimura and Nei 2003; Niimura and Nei 2005). Comparative analyses of this gene family in human, mouse and non-human primates have revealed that differential expansion and loss have guided the evolution of this gene family (Sharon, Glusman et al. 1999; Lapidot, Pilpel et al. 2001). However, the human genes have accumulated many mutations, leading to numerous pseudogenes in comparison to mouse or any non-human primate. This is associated with the reduced chemosensory capacity of humans.

The olfactory receptor and the C2H2-ZNF gene families show similar patterns of evolution. Differential gene expansion and loss have played an important role in the evolution of both gene families in mammals. However, in contrast to the olfactory receptor genes, C2H2-ZNF genes apparently do not accumulate pseudogenes in humans despite their large number. Studies on human C2H2-ZNF clusters have indicated that these genes are rapidly evolving through positive selection and may acquire new functions after duplication (Schmidt and Durrett 2004; Huntley, Baggott et al. 2006).

By relating the number of olfactory receptor genes in human and mouse to their functions in the respective organisms, we can understand that a reduced chemosensory dependence in human and non-human primates as compared to mouse could be responsible for the large number of pseudogenes in human. Presently, the lack of large-scale analyses of the expression profile and function of C2H2-ZNF genes precludes drawing this type of conclusion. The extremely high number of human C2H2-ZNF genes and the species-specific expansions and losses in mammals leading to differential evolution within primates and rodents, when put in perspective with functional information on these proteins, could give interesting insights into the evolution of this gene family.

3.7 A few concerns about the study

We must acknowledge here three possible sources of bias in our study.

First, errors in the reported number of genes due to incorrect sequencing and annotation in the available databases could be a primary source of bias. However, we considered only genomes (human, chimpanzee, mouse, rat and dog) that are >94% complete, to minimize this source of bias.

A second concern is that the apparent loss of C2H2-ZNF clusters or genes among the species could be due to the genes having been dispersed onto different chromosomes by translocation. Though we accept this as a possibility, we conducted an extensive analysis to rule it out for the clusters we studied in depth. For example, in group I of the phylogenetic tree (Figure 5; Article) of human cluster 19.12 and its syntenic clusters in chimpanzee, mouse, rat and dog, we observe three orthologs (hZNF331, pZNF331 and cZNF331) from human, chimpanzee and dog. There is no rodent counterpart for these genes, which suggests a loss in rodents. To rule out the possibility that these genes were dispersed onto other regions of the genome by translocation, we conducted an extensive TBLASTN homology search of the mouse and rat genomes using each of the three orthologs from human, chimpanzee and dog as a query. For the three queries, the top BLAST hit was Zfp14 from mouse and LOC97124 from rat. We included these two sequences in the dataset used for the phylogenetic analysis (see Methods; Article) of cluster 19.12 in human and its syntenic counterparts in chimpanzee, mouse, rat and dog. The phylogenetic tree revealed that the mouse and rat sequences group with the three orthologs from human, chimpanzee and dog in group I. However, a closer look at the sequence similarity between these sequences (< 60%) suggests that they cannot be orthologs and that the grouping we see could simply reflect the fact that they were the closest matches to the query sequences used.
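The similarity check that closes this argument can be sketched as follows. The example computes percent identity over the gap-free columns of a pairwise alignment and compares it with the 60% figure mentioned in the text. Using identity as a stand-in for "similarity", the function names and the toy aligned sequences are assumptions of this sketch; the exact measure used in the study is not specified here.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Threshold taken from the "< 60%" figure quoted in the text.
ORTHOLOG_THRESHOLD = 60.0


def plausible_ortholog(aligned_a, aligned_b, threshold=ORTHOLOG_THRESHOLD):
    """Return True when the pair reaches the similarity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy aligned fragments (not real ZNF331 or Zfp14 sequences).
query = "ACDEFGHIKL"
top_hit = "ACDEFAAAAA"
print(percent_identity(query, top_hit))    # 50.0
print(plausible_ortholog(query, top_hit))  # False
```

A top BLAST hit that falls below the threshold, as in the toy pair, is the best available match in the target genome without being a credible ortholog, which is the situation described for the mouse and rat hits.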

The third and final concern is that, considering the extremely large number of C2H2-ZNF genes in the human genome, we cannot rule out the presence of pseudogenes. Though we did not conduct an extensive search for possible pseudogenes, an analysis of the open reading frames of the C2H2-ZNF genes considered in this study, together with their translated sequences, suggests that most of them are most likely not pseudogenes. A distribution curve (Figure 8) of the amino acid sequence lengths of the various C2H2-ZNF genes shows that almost all of the sequences have large open reading frames that are potentially translated and functional.
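A length distribution of the kind plotted in Figure 8 can be obtained by binning the translated sequence lengths. The sketch below is illustrative: the bin width, the function name and the example lengths are hypothetical and are not the data behind Figure 8.

```python
from collections import Counter


def length_histogram(lengths, bin_width=100):
    """Count sequences per length bin; keys are the lower bounds of the bins."""
    bins = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(bins.items()))


# Hypothetical amino acid lengths, for illustration only.
example_lengths = [95, 310, 350, 480, 620, 655, 701]

for lower, count in length_histogram(example_lengths).items():
    print(f"{lower}-{lower + 99} aa: {count}")
```

Sequences falling in the lowest bins would be the first candidates to inspect for truncated open reading frames.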


Figure 8: Plot of the amino acid sequence lengths of all the C2H2-ZNF in the human genome.

[Plot: x-axis, length of the C2H2-ZNF amino acid sequence; y-axis, number of C2H2-ZNF (scale from 0 to 160).]


3.8 Merits of the study

Our study provides a comprehensive insight into the evolution of C2H2-ZNF

throughout several mammalian genomes. To summarize, the merits of our study are as

follows.

• A good range of species with completely sequenced genomes was considered to analyze the evolution of C2H2-ZNF genes in mammals.

• A stringent phylogenetic analysis of the syntenic clusters in human, chimpanzee, mouse, rat and dog was performed using both maximum likelihood (RAxML) and Bayesian (MrBayes) methods. Notably, unlike other studies on C2H2-ZNF genes, which use Xfin as an outgroup (Looman, Abrink et al. 2002; Shannon, Hamilton et al. 2003; Huntley, Baggott et al. 2006), we conducted an extensive search to include chicken (Gallus gallus) homologs in addition to Xfin as an outgroup. The chicken sequences are considerably closer, as an outgroup, to the species studied (human, chimpanzee, mouse, rat and dog).

• The phylogenetic relationships we observed between C2H2-ZNF genes in the syntenic clusters from the different species were consistent with the overall picture of the number of genes in each species and with the physical maps of the clusters.

• A model for the evolutionary relationship of the SCAN, SCAN-KRAB and KRAB C2H2-ZNF subfamilies is proposed, providing a possible explanation for previously unresolved questions in the field.


3.9 Perspectives

The following studies could be undertaken in the future to build on what is already known.

• The compilation of comprehensive catalogues of the C2H2-ZNF gene repertoires in chimpanzee, mouse, rat and dog, and the detailed comparison of the organization and numbers of C2H2-ZNF genes in the complete repertoires.

• The stringent phylogenetic analysis of these repertoires to gain insights into the detailed mechanisms that have operated during the evolution of this gene family in mammals.

• A more detailed analysis of the physical mapping of genes within clusters to gain insight into the molecular mechanisms involved in the expansion of these genes. This could include the analysis of the repeated sequences that border these C2H2-ZNF genes and may be involved in this phenomenon, and the analysis of the orientation of genes, the distances between them and their exon-intron organisation.

• A more comprehensive study of pseudogenes.

Clearly, more detailed bioinformatics and functional studies are still required for a better understanding of the driving force behind the expansion of C2H2-ZNF genes in mammals.


REFERENCES

Bellefroid, E. J., P. J. Lecocq, et al. (1989). “The human genome contains hundreds of genes coding for finger proteins of the Kruppel type.” DNA 8(6): 377-87.

Bellefroid, E. J., J. C. Marine, et al. (1993). “Clustered organization of homologous KRAB zinc-finger genes with enhanced expression in human T lymphoid cells.” EMBO J 12(4): 1363-74.

Bellefroid, E. J., D. A. Poncelet, et al. (1991). “The evolutionarily conserved Kruppel-associated box domain defines a subfamily of eukaryotic multifingered proteins.” Proc Natl Acad Sci U S A 88(9): 3608-12.

Benn, A., M. Antoine, et al. (1991). “Primary structure and expression of a chicken cDNA encoding a protein with zinc-finger motifs.” Gene 106(2): 207-12.

Bertrand, D. and O. Gascuel (2005). “Topological rearrangements and local search method for tandem duplication trees.” IEEE/ACM Trans Comput Biol Bioinform 2(1): 15-28.

Birtle, Z. and C. P. Ponting (2006). “Meisetz and the birth of the KRAB motif.” Bioinformatics 22(23): 2841-5.

Bouhouche, N., M. Syvanen, et al. (2000). “The origin of prokaryotic C2H2 zinc finger regulators.” Trends Microbiol 8(2): 77-81.

Castresana, J. (2000). “Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.” Mol Biol Evol 17(4): 540-52.

Chung, H. R., U. Schafer, et al. (2002). “Genomic expansion and clustering of ZAD-containing C2H2 zinc-finger genes in Drosophila.” EMBO Rep 3(12): 1158-62.

Collins, T., J. R. Stone, et al. (2001). “All in the family: the BTB/POZ, KRAB, and SCAN domains.” Mol Cell Biol 21(11): 3609-15.

Darwin, C. (1837). The First Notebook on Transmutation of Species.

Dehal, P., P. Predki, et al. (2001). “Human chromosome 19 and related regions in mouse: conservative and lineage-specific evolution.” Science 293(5527): 104-11.


Demuth, J. P., T. D. Bie, et al. (2006). “The evolution of Mammalian gene families.” PLoS ONE 1: e85.

Edelstein, L. C. and T. Collins (2005). “The SCAN domain family of zinc finger transcription factors.” Gene 359: 1-17.

Edgar, R. C. (2004). “MUSCLE: multiple sequence alignment with high accuracy and high throughput.” Nucleic Acids Res 32(5): 1792-7.

Eichler, E. E., S. M. Hoffman, et al. (1998). “Complex beta-satellite repeat structures and the expansion of the zinc finger gene cluster in 19p12.” Genome Res 8(8): 791-808.

Elemento, O. and O. Gascuel (2002). “An efficient and accurate distance based algorithm to reconstruct tandem duplication trees.” Bioinformatics 18 Suppl 2: S92-9.

Elemento, O., O. Gascuel, et al. (2002). “Reconstructing the duplication history of tandemly repeated genes.” Mol Biol Evol 19(3): 278-88.

Fitch, W. M. (2000). “Homology: a personal view on some of the problems.” Trends Genet 16(5): 227-31.

Fortna, A., Y. Kim, et al. (2004). “Lineage-specific gene duplication and loss in human and great ape evolution.” PLoS Biol 2(7): E207.

Francis Darwin, A. C. S. (1903). More letters of Charles Darwin: A record of his work in a series of hitherto unpublished letters.

Frankel, A. D., J. M. Berg, et al. (1987). “Metal-dependent folding of a single zinc finger from transcription factor IIIA.” Proc Natl Acad Sci U S A 84(14): 4841-5.

Friedman, J. R., W. J. Fredericks, et al. (1996). “KAP-1, a novel corepressor for the highly conserved KRAB repression domain.” Genes Dev 10(16): 2067-78.

Fuchs, T., G. Glusman, et al. (2001). “The human olfactory subgenome: from sequence to structure and evolution.” Hum Genet 102(1): 1-13.

Germain-Desprez, D., M. Bazinet, et al. (2003). “Oligomerization of transcriptional intermediary factor 1 regulators and interaction with ZNF74 nuclear matrix protein revealed by bioluminescence resonance energy transfer in living cells.” J Biol Chem 278(25): 22367-73.


Gertz, E. M., Y. K. Yu, et al. (2006). “Composition-based statistics and translated

nucleotide searches: improving the TBLASTN module ofBLAST.” BMC Biol 4:

41.

Gilad, Y., O. Man, et al. (2005). ‘A comparison of the human and chimpanzee olfactory

receptor gene repertoires.” Genome Res 15(2): 224-30.

Glusman, G., A. Bahar, et al. (2000). “The olfactory receptor gene superfamily: data

mining, classification, and nomenclature.” Mamrn Genome 11(1 1): 1016-23.

Glusman, G., I. Yanai, et al. (2001). “The complete human olfactory subgenome.” Genome

Res 11(5): 685-702.

Grondin, B., M. Bazinet, et al. (1996). “The KRAB zinc finger gene ZNF74 encodes an

RNA-binding protein tightly associated with the nuclear matrix.” J Biol Chem

271(26): 15458-67.

Hamilton, A. T., S. Huntley, et al. (2003). “Lineage-specific expansion ofKRAB zinc

finger transcription factor genes: implications for the evolution of vertebrate

regulatory networks.” Cold Spring Harb Symp Quant Biol 6$: 13 1-40.

Hamilton, A. T., S. Huntley, et al. (2006). “Evolutionary expansion and divergence in the

ZNf9Y subfamily ofprimate-specific zinc finger genes.” Genome Res 16(5): 584-

94.

Huelsenbeck, J. P. and F. Ronquist (2001). “MRBAYES: Bayesian inference of

phylogenetic trees.” Bioinformatics 17(8): 754-5.

Huntley, S., D. M. Baggott, et al. (2006). “A comprehensive catalog of human KRAB-

associated zinc finger genes: insights into the evolutionary history of a large family

of transcriptional repressors.” Genome Res 16(5): 669-77.

Kim, S. S., Y. M. Chen, et al. (1996). “A novel member of the RING finger family, KRIP-

1, associates with the KRAB-A transcriptional repressor domain of zinc finger

proteins.” Proc Natl Acad Sci U S A 93(26): 15299-304.

Klug, A. and D. Rhodes (1987). “Zinc fingers: a novel protein fold for nucleic acid

recognition.” Cold Spring Harb Symp Quant Biol 52: 473-82.

Krebs, C. J., L. K. Larkins, et al. (2003). “Regulator of sex-limitation (Rsl) encodes a pair

of KRAB zinc-finger genes that control sexually dimorphic liver gene expression.”

Genes Dev 17(21): 2664-74.

Krishna, S. S., I. Majumdar, et al. (2003). “Structural classification of zinc fingers: survey

and summary.” Nucleic Acids Res 31(2): 532-50.

Lander, E. S., L. M. Linton, et al. (2001). “Initial sequencing and analysis of the human

genome.” Nature 409(6822): 860-921.

Lapidot, M., Y. Pilpel, et al. (2001). “Mouse-human orthology relationships in an olfactory

receptor gene cluster.” Genomics 71(3): 296-306.

Lee, M. S., G. P. Gippert, et al. (1989). “Three-dimensional solution structure of a single

zinc finger DNA-binding domain.” Science 245(4918): 635-7.

Li, W., L. Jaroszewski, et al. (2001). “Clustering of highly homologous sequences to reduce

the size of large protein databases.” Bioinformatics 17(3): 282-3.

Looman, C., M. Abrink, et al. (2002). “KRAB zinc finger proteins: an analysis of the

molecular mechanisms governing their increase in numbers and complexity during

evolution.” Mol Biol Evol 19(12): 2118-30.

Looman, C., L. Hellman, et al. (2004). “A novel Kruppel-Associated Box identified in a

panel of mammalian zinc finger proteins.” Mamm Genome 15(1): 35-40.

Margolin, J. F., J. R. Friedman, et al. (1994). “Kruppel-associated boxes are potent

transcriptional repression domains.” Proc Natl Acad Sci U S A 91(10): 4509-13.

Mark, C., M. Abrink, et al. (1999). “Comparative analysis of KRAB zinc finger proteins in

rodents and man: evidence for several evolutionarily distinct subfamilies of KRAB

zinc finger genes.” DNA Cell Biol 18(5): 381-96.

Melnick, A., G. Carlile, et al. (2002). “Critical residues within the BTB domain of PLZF

and Bcl-6 modulate interaction with corepressors.” Mol Cell Biol 22(6): 1804-18.

Messina, D. N., J. Glasscock, et al. (2004). “An ORFeome-based analysis of human

transcription factor genes and the construction of a microarray to interrogate their

expression.” Genome Res 14(10B): 2041-7.

Miller, J., A. D. McLachlan, et al. (1985). “Repetitive zinc-binding domains in the protein

transcription factor IIIA from Xenopus oocytes.” Embo J 4(6): 1609-14.

Moosmann, P., O. Georgiev, et al. (1996). “Transcriptional repression by RING finger

protein TIF1 beta that interacts with the KRAB repressor domain of KOX1.”

Nucleic Acids Res 24(24): 4859-67.

Moreira, D. and F. Rodriguez-Valera (2000). “A mitochondrial origin for eukaryotic C2H2

zinc finger regulators?” Trends Microbiol 8(10): 448-50.

Nei, M. and S. Kumar (2000). Molecular Evolution and Phylogenetics. New York: Oxford

University Press.

Nei, M., X. Gu, et al. (1997). “Evolution by the birth-and-death process in multigene

families of the vertebrate immune system.” Proc Natl Acad Sci U S A 94(15): 7799-

806.

Niimura, Y. and M. Nei (2003). “Evolution of olfactory receptor genes in the human

genome.” Proc Natl Acad Sci U S A 100(21): 12235-40.

Niimura, Y. and M. Nei (2005). “Comparative evolutionary analysis of olfactory receptor

gene clusters between humans and mice.” Gene 346: 13-21.

Ohta, T. (2000). “Evolution of gene families.” Gene 259(1-2): 45-52.

Omichinski, J. G., G. M. Clore, et al. (1990). “High-resolution three-dimensional structure

of a single zinc finger from a human enhancer binding protein in solution.”

Biochemistry 29: 9324-34.

Omichinski, J. G., G. M. Clore, et al. (1992). “High-resolution solution structure of the

double Cys2His2 zinc finger from the human enhancer binding protein MBP-1.”

Biochemistry 31(16): 3907-17.

Pabo, C. O. and R. T. Sauer (1992). “Transcription factors: structural families and principles

of DNA recognition.” Annu Rev Biochem 61: 1053-95.

Page, R. D. and M. A. Charleston (1997). “From gene to organismal phylogeny: reconciled

trees and the gene tree/species tree problem.” Mol Phylogenet Evol 7(2): 231-40.

Parraga, G., S. J. Horvath, et al. (1988). “Zinc-dependent structure of a single-finger

domain of yeast ADR1.” Science 241(4872): 1489-92.

Pengue, G., V. Calabro, et al. (1994). “Repression of transcriptional activity at a distance

by the evolutionarily conserved KRAB domain present in a subfamily of zinc finger

proteins.” Nucleic Acids Res 22(15): 2908-14.

Pengue, G. and L. Lania (1996). “Kruppel-associated box-mediated repression of RNA

polymerase II promoters is influenced by the arrangement of basal promoter

elements.” Proc Natl Acad Sci U S A 93(3): 1015-20.

Quignon, P., E. Kirkness, et al. (2003). “Comparison of the canine and human olfactory

receptor gene repertoires.” Genome Biol 4(12): R80.

Rhodes, D. and A. Klug (1993). “Zinc fingers.” Sci Am 268(2): 56-9, 62-5.

Roberts, S. G. (2000). “Mechanisms of action of transcription activation and repression

domains.” Cell Mol Life Sci 57(8-9): 1149-60.

Rosati, M., M. Marino, et al. (1991). “Members of the zinc finger protein gene family

sharing a conserved N-terminal module.” Nucleic Acids Res 19(20): 5661-7.

Rousseau-Merck, M. F., D. Koczan, et al. (2002). “The KOX zinc finger genes: genome

wide mapping of 368 ZNF PAC clones with zinc finger gene clusters predominantly

in 23 chromosomal loci are confirmed by human sequences annotated in

EnsEMBL.” Cytogenet Genome Res 98(2-3): 147-53.

Ruiz i Altaba, A., H. Perry-O’Keefe, et al. (1987). “Xfin: an embryonic gene encoding a

multifingered protein in Xenopus.” Embo J 6(10): 3065-70.

Sander, T. L., A. L. Haas, et al. (2000). “Identification of a novel SCAN box-related protein

that interacts with MZF1B. The leucine-rich SCAN box mediates hetero- and

homoprotein associations.” J Biol Chem 275(17): 12857-67.

Schmidt, D. and R. Durrett (2004). “Adaptive evolution drives the diversification of zinc

finger binding domains.” Mol Biol Evol 21(12): 2326-39.

Schuh, R., W. Aicher, et al. (1986). “A conserved family of nuclear proteins containing

structural elements of the finger protein encoded by Kruppel, a Drosophila

segmentation gene.” Cell 47(6): 1025-32.

Schumacher, C., H. Wang, et al. (2000). “The SCAN domain mediates selective

oligomerization.” J Biol Chem 275(22): 17173-9.

Shannon, M., A. T. Hamilton, et al. (2003). “Differential expansion of zinc-finger

transcription factor loci in homologous human and mouse gene clusters.” Genome

Res 13(6A): 1097-110.

Shannon, M., J. Kim, et al. (1998). “Tandem zinc-finger gene families in mammals:

insights and unanswered questions.” DNA Seq 8(5): 303-15.

Sharon, D., G. Glusman, et al. (1999). “Primate evolution of an olfactory receptor cluster:

diversification by gene conversion and recent emergence of pseudogenes.”

Genomics 61(1): 24-36.

Sitnikova, T. and C. Su (1998). “Coevolution of immunoglobulin heavy- and light-chain

variable-region gene families.” Mol Biol Evol 15(6): 617-25.

Stamatakis, A., T. Ludwig, et al. (2005). “RAxML-III: a fast program for maximum

likelihood-based inference of large phylogenetic trees.” Bioinformatics 21(4): 456-

63.

Stone, J. R., J. L. Maki, et al. (2002). “The SCAN domain of ZNF174 is a dimer.” J Biol

Chem 277(7): 5448-52.

Tang, M., M. Waterman, et al. (2002). “Zinc finger gene clusters and tandem gene

duplication.” J Comput Biol 9(2): 429-46.

Theunissen, O., F. Rudt, et al. (1992). “RNA and DNA binding zinc fingers in Xenopus

TFIIIA.” Cell 71(4): 679-90.

Thornton, J. W. and R. DeSalle (2000). “Gene family evolution and homology: genomics

meets phylogenetics.” Annu Rev Genomics Hum Genet 1: 41-73.

Urrutia, R. (2003). “KRAB-containing zinc-finger repressor proteins.” Genome Biol 4(10):

231.

Venter, J. C., M. D. Adams, et al. (2001). “The sequence of the human genome.” Science

291(5507): 1304-51.

Waterston, R. H., K. Lindblad-Toh, et al. (2002). “Initial sequencing and comparative

analysis of the mouse genome.” Nature 420(6915): 520-62.

Williams, A. J., L. M. Khachigian, et al. (1995). “Isolation and characterization of a novel

zinc-finger protein with transcription repressor activity.” J Biol Chem 270(38):

22143-52.

Witzgall, R., E. O’Leary, et al. (1994). “The Kruppel-associated box-A (KRAB-A) domain

of zinc finger proteins mediates transcriptional repression.” Proc Natl Acad Sci U S

A 91(10): 4514-8.

Wolfe, S. A., L. Nekludova, et al. (2000). “DNA recognition by Cys2His2 zinc finger

proteins.” Annu Rev Biophys Biomol Struct 29: 183-212.

Zheng, L., H. Pan, et al. (2000). “Sequence-specific transcriptional corepressor function for

BRCA1 through a novel zinc finger protein, ZBRK1.” Mol Cell 6(4): 757-68.
