Conservation of the homeodomain sequence of
the Mixl1 homeobox gene in multiple species
Nhi Hin
ABSTRACT
The Mix family of paired-like homeobox genes is highly conserved throughout evolution owing to its vital
roles in the formation and specification of mesoderm and endoderm during vertebrate gastrulation. The
homeodomain motif is essential for the DNA-binding activity of the transcription factor products of these
genes and has been shown to be highly conserved amongst a diverse range of species despite some variation in
homeobox nucleotide sequence. In the present study, the Mixl1 homeodomain sequences in the mouse,
zebrafish and platypus are isolated, sequenced and compared to gain insight into their evolutionary history.
The Mixl1 homeobox sequences determined are consistent with the reported literature. Despite some
variation in nucleotide sequence, active-site amino acid residues in the N-terminal arm and recognition helix
were found to be particularly conserved across all species, while nucleotides corresponding to
non-active-site amino acids displayed greater variation.
INTRODUCTION
Hox genes, a subset of homeobox genes, are crucial
in the pattern formation of many vertebrates during
embryogenesis. The genes are clustered in the
genome and encode transcription factors called
homeoproteins that specify segmental identity and
positional information along the anterior-posterior
axis. The organisation of Hox genes in the
chromosome corresponds to the order of their
spatial and temporal expression along the
anterior-posterior body axis, a phenomenon referred to as
“collinearity” (Fig. 1A). Additionally, Hox genes
contain a highly conserved DNA sequence known as
the homeobox, which encodes a 60-amino acid
protein structure called the homeodomain (Fig. 1B).
The homeodomain has a helix-turn-helix motif that
allows homeoproteins to bind to specific DNA
sequences; the N-terminal arm contacts the minor
DNA groove while the third helix interacts with the
major DNA groove (Fig. 1C) (Burke et al. 1995;
Gehring et al. 1994).
The Mix family of paired-like homeobox genes has
been highly conserved throughout vertebrate
evolution and is involved in the establishment
and specification of mesoderm and endoderm germ
layers during gastrulation (Pereira et al. 2012).
Mesoderm-Inducing-Factor Inducible Homeobox
(Mix1) is predominantly expressed at the sites of
future endoderm and mesoderm development in the
blastula in many species including mice and humans
(Pereira et al. 2012). Zebrafish appear to have
multiple partially redundant Mix genes, including the
specific paired-like-homeobox gene called Mxtx1,
which is expressed in an analogous location to the
primitive endoderm in mammalian embryos (Hirata
et al. 2000).
The conservation of Mix-family genes across many
species makes them useful for investigating the
evolutionary history of these species. In the present
study, sequences corresponding to the Mixl1 gene in
mice and platypus and the Mxtx1 gene in zebrafish will be
extracted and sequenced. The gene structures for
mouse and platypus Mixl1 along with zebrafish Mxtx1
are shown in Figure 2. Note that in each species the
homeobox is split by an intronic sequence of
variable length. Primers have been
designed to extract sequences containing the
homeobox regions. The extent of conservation of
nucleotide sequence and amino acid sequence will
be determined and the evolutionary history of these
species briefly discussed.
Figure 1. Illustrations of Hox gene properties. (A) Collinear expression of Hox genes in Drosophila. The order of
Hox genes in the HOM-C cluster in the genome corresponds to spatial and temporal expression across the anterior-
posterior body axis. Image: (Mark et al. 1997). (B) General schematic representation of a Hox gene, homeoprotein and
homeodomain. The 180 bp homeobox on the Hox gene encodes the homeodomain on the homeoprotein. The
homeodomain has three helices. Image: (Lappin et al. 2006). (C) Binding of homeodomain motif to DNA. The N-terminal
arm contacts the minor groove while the third helix interacts with the major groove. Image: (Hueber 2007).
Figure 2. Schematic gene diagrams of (A) mouse Mixl1, (B) platypus Mixl1 and (C) zebrafish Mxtx1 (Daish 2016b).
The legend marks the primers and the homeobox location.
RESULTS & DISCUSSION
Extraction & Purification of Genomic DNA
for Sequencing Reactions.
Figure 3 shows that most genomic DNA samples
(Lanes 1-5) were successfully extracted, with distinct
intense bands in most lanes indicating sufficient
amounts of gDNA at least 23 kbp in size. This
suggests excessive degradation has not occurred for
DNA in Lanes 1-5 and integrity of extracted DNA is
high. In contrast, the diffuse band in Lane 6 is less
than 0.56 kbp, suggesting only very small fragments
of DNA are present. Possible causes of this include
contamination by nucleases that degrade DNA,
excess salt in the sample, leaving the sample at high
temperature for too long, or using an incorrect
buffer solution (ThermoFisher 2016). Issues with the
reagents themselves are unlikely as the same
reagents were used to prepare all samples.
Streaking of bands is prevalent, signifying various
sizes of DNA in the samples, although the intense
bands near 23 kbp indicate most DNA is still intact
in large genomic fragments. Intense staining at the
wells of samples in Lanes 2, 3 and 5 suggests
particularly high concentrations of genomic DNA;
these high concentrations would block the pores of
the gel matrix, inhibiting DNA movement through
the gel. Consequently, smearing of the bands occurs
as DNA bleeds into the gel slowly, producing the
streaking observed. Band distortion in Lanes 2, 3 and
5 may have been caused by air bubbles introduced when
loading the sample or by uneven heating of the gel,
which would cause local changes in buffer
conductivity (Qiagen 2015).
Sample DNA concentration is estimated through
comparing intensity of sample bands with the
intensity of fragments in the 0.5μg of Lambda/HindIII
molecular weight marker in Lane 7. Box 1 explains
how the concentration of the zebrafish genomic
DNA in Lane 4 was estimated to be 13.2ng/μL. Bands
in other lanes (e.g. 3, 5) in Figure 3 are more intense,
indicating higher concentrations of genomic DNA.
However, it is not critical that the gDNA sample
concentration is high. The recommended
concentration of gDNA for amplification via PCR
ranges from 25-100ng/μL, so a lower
concentration simply means that a greater volume
of sample should be used in the PCR (see Appendix A).
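The marker-based estimate summarised in Box 1 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming that each Lambda/HindIII fragment carries a share of the 0.5μg marker mass proportional to its length; the function names are hypothetical and are not part of the practical manual.

```python
# Lambda/HindIII fragment sizes in kbp, as listed in Table 1 of Box 1.
MARKER_FRAGMENTS_KBP = [23, 9.6, 6.6, 4.4, 2.2, 2.0, 0.56]


def fragment_mass_ng(fragment_kbp, marker_mass_ng=500.0):
    """Mass of one marker fragment, assuming mass is proportional to length."""
    return marker_mass_ng * fragment_kbp / sum(MARKER_FRAGMENTS_KBP)


def sample_concentration_ng_per_ul(matched_fragment_kbp, loaded_volume_ul,
                                   marker_mass_ng=500.0):
    """Estimate sample concentration from the marker band of matching intensity."""
    mass_ng = fragment_mass_ng(matched_fragment_kbp, marker_mass_ng)
    return mass_ng / loaded_volume_ul


# Lane 4 matched Fragment 2 (9.6 kbp); 7.5 uL of sample was loaded.
mass = fragment_mass_ng(9.6)
conc = sample_concentration_ng_per_ul(9.6, 7.5)
print(f"{mass:.2f} ng in band, {conc:.1f} ng/uL")
```

With the values from Box 1 this reproduces the 99.26ng and 13.2ng/μL figures quoted for Lane 4. Because band intensity is judged by eye, the result should be read as a rough estimate only.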
Isolation of DNA fragments containing mixl1
homeobox structure using PCR.
The PCR products are shown in a gel
electrophoresis image in Figure 4. Most PCR
product sizes are consistent with expected amplicon
sizes, indicating primers were highly specific to the
target sequence and amplification was successful.
The expected amplicon size for the zebrafish ZF1 and
ZR1 primers is 653 bp (Daish 2016b). This is
consistent with the gel electrophoresis in Figure 4,
which shows a PCR product of approximately
653 bp in Lane 5. Meanwhile, Lanes 2, 3 and 7 in
Figure 4 show bands corresponding to the expected
685 bp amplicon for the platypus PF2 and PR2
primers (Daish 2016b). However, Lane 7 shows an
additional band of approximately 360 bp. It is
possible that this second PCR product arose from
the primers binding to and amplifying another
region of the template DNA.
Figure 3. Electrophoresis of extracted
genomic DNA from various species on
1.5% agarose gel.
Lanes 1-6 contain 7.5μL samples of genomic
DNA; 1 = Mouse liver, 2 = Mouse liver, 3 =
Zebrafish, 4 = Zebrafish, 5 = Platypus, 6 =
Mouse liver, 7 = 0.5μg Lambda/HindIII molecular
weight marker.
In this case, gel purifying the 685 bp PCR product
would help ensure that the purity is sufficient for a
successful sequencing reaction. Lanes 8 and 9 used
the MF1/MR1 and MF2/MR2 primer sets
respectively. Lane 8 has one band of 453 bp while
Lane 9 has one band of 391 bp. These sizes are
consistent with expected amplicon sizes and the
lack of other bands indicates the PCR products are
sufficiently pure. However, the faintness of these
bands suggests a low concentration of PCR product.
This could be due to non-optimal PCR conditions. For
example, the denaturation temperature may have
been too low, resulting in incomplete denaturation;
the DNA template may have had insufficient
integrity; or the denaturation time may have been too
long, leading to degradation (Bio-Rad 2016).
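The band-size checks described for the PCR lanes can be sketched as a simple lookup against the expected amplicon sizes quoted in the text. This is a minimal sketch, not part of the original analysis: the function name and the 5% tolerance for reading sizes off a gel are assumptions made for illustration.

```python
# Expected amplicon sizes (bp) for each primer pair, as quoted in the text.
EXPECTED_BP = {"ZF1/ZR1": 653, "PF2/PR2": 685, "MF1/MR1": 453, "MF2/MR2": 391}


def classify_bands(primer_pair, observed_bp, tolerance=0.05):
    """Label each observed band as 'expected' or 'unexpected'.

    tolerance is a hypothetical relative margin for sizes estimated from a gel.
    """
    target = EXPECTED_BP[primer_pair]
    return [
        (size, "expected" if abs(size - target) <= tolerance * target else "unexpected")
        for size in observed_bp
    ]


# Lane 7: platypus PF2/PR2 primers, with bands at about 685 bp and 360 bp.
print(classify_bands("PF2/PR2", [685, 360]))
```

A lane that returns any "unexpected" band, as Lane 7 does here, is a candidate for gel purification before sequencing.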
The product in Lane 6 failed to amplify. Running the
following controls in the same gel would help
determine the cause of failure along with ensuring
the correct target sequence was amplified:
- Negative control with water to ensure no DNA
  contamination was in the water.
- Negative control with all PCR reagents except
  for DNA template to ensure that reagents are
  not contaminated, and there is no non-specific
  amplification in the reaction. Detection of a
  positive signal in this control would indicate the
  presence of contaminating nucleic acids.
- Positive control using template and primers
  known to amplify correctly and produce distinct
  bands under the PCR conditions. This control
  should contain the same PCR reagents as the
  samples and should be easily distinguished from
  the target DNA (e.g. by a different size).
- Preparing several samples of different
  concentration may help determine if smearing is
  due to using too high a DNA concentration.
  This is particularly important if the
  concentration of the genomic DNA was only
  estimated.
Box 1. Estimation of the concentration of
zebrafish genomic DNA in Lane 4 of Figure 3,
with a brief explanation of the reasoning.

Table 1. Known sizes of DNA fragments
from the Lambda/HindIII molecular weight
marker.

Fragment      Size (kbp)
1             23
2             9.6
3             6.6
4             4.4
5             2.2
6             2.0
7             0.56
Total size    48.36

Source: Genetics III Practical Manual (University of
Adelaide, 2016).

The total size of the fragments in the DNA
marker is 48.36 kbp. Since 0.5μg of
molecular weight marker was used, the
corresponding ratio is:

48.36kbp / 0.5μg = 48.36kbp / 500ng = 1kbp / 10.339ng

The sample in Lane 4 has comparable
intensity to Fragment 2 of the molecular
weight marker, corresponding to a size of
9.6 kbp (Table 1). Scaling the ratio to 9.6 kbp:

1kbp / 10.339ng = 9.6kbp / 99.26ng

i.e. there is 99.26ng of genomic DNA in
Lane 4. Because 7.5μL of genomic DNA had
been loaded onto the gel, the concentration
of genomic DNA in Lane 4 is estimated to
be:

99.26ng / 7.5μL = 13.2ng/μL

Lane 4 has one smeared band where the majority of
DNA has remained in the well. A smear instead of a
single band indicates DNA fragments of varying sizes.
The expected products of a successful PCR reaction
should have the same sequence and same size,
assuming PCR conditions are optimal and the
primers are specific. Hence a range of fragment sizes
indicates that the PCR reaction was not specific
enough in amplifying the target DNA. Possible
causes include:
- Non-specific primers, which bind to
  other parts of the template DNA that are then
  also amplified.
- Non-optimal cycling conditions: for example, an
  excessive number of cycles, excessive extension
  time, excessive annealing time, or an insufficiently
  high annealing temperature all increase the
  opportunity for non-specific amplification
  (Bio-Rad 2016).
- Too high a concentration of template DNA. This
  can impair amplification through inhibitors
  carried over with the template or through
  inefficient denaturation (Qiagen 2015).
- Genomic DNA of poor quality (e.g.
  sheared).
Using such a PCR product in a sequencing reaction
would likely result in many nucleotides which cannot
be accurately identified (appear as “N” in the
sequence). The presence of multiple DNA
sequences (from the non-specific PCR products)
means that the sequence of the desired PCR
product cannot be distinguished from the
contaminating sequences. The PCR products in
Lanes 5 and 7 in Figure 4 are suitable for sequencing
due to single discrete PCR bands indicating
amplicons of expected size. The concentration of
the zebrafish template DNA used in Lane 5 was
measured to be 20.28 μg/mL using a
spectrophotometer.
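The spectrophotometer reading can be related to the amount of template loaded with a short calculation. The sketch below is illustrative only: it assumes the 25-100ng template range stated in the Figure 4 caption as the target mass, and the function names are hypothetical.

```python
def ug_per_ml_to_ng_per_ul(conc_ug_per_ml):
    """Convert ug/mL to ng/uL; both units scale by 1000, so the value is unchanged."""
    return conc_ug_per_ml


def volume_for_mass_ul(target_mass_ng, conc_ng_per_ul):
    """Volume of template (uL) needed to deliver a target mass of DNA (ng)."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_mass_ng / conc_ng_per_ul


conc = ug_per_ml_to_ng_per_ul(20.28)  # zebrafish template, Lane 5
low = volume_for_mass_ul(25, conc)    # lower end of the assumed template range
high = volume_for_mass_ul(100, conc)  # upper end of the assumed template range
print(f"{low:.2f} to {high:.2f} uL of template")
```

For a 20.28 μg/mL stock this gives roughly 1.23 to 4.93 μL of template, which shows why a more dilute sample needs a larger volume in the reaction.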
Figure 4. Gel electrophoresis image of PCR products of various species. Lanes 2-9 were loaded
with 25-100ng DNA template. 1 = SPP1 Molecular Markers; 2 = Platypus, PF2 and PR2 primers; 3 =
Platypus, PF2 and PR2 primers; 4 = Mouse, MF1 and MR1 primers; 5 = Zebrafish, ZF1 and ZR1 primers; 6 =
Zebrafish, ZF2 and ZR2 primers; 7 = Platypus, PF2 and PR2 primers; 8 = Mouse, MF1 and MR1 primers; 9 =
Mouse, MF2 and MR2 primers. Approximate size of successful amplicons marked above PCR bands.
Sequencing of Zebrafish PCR Product & Identification of mxtx1 homeobox sequence.
Figure 6. Comparison of sequenced zebrafish and cavefish Mxtx1 homeobox sequences, and
corresponding amino acid sequences.
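The nucleotide comparison reported in this section (36 differences across the 180 bp homeobox, or 20%) amounts to counting mismatched positions between two aligned sequences. The sketch below shows that calculation under stated assumptions: the two strings are synthetic placeholders, not the real zebrafish or cavefish sequences, and the function names are hypothetical.

```python
def count_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def percent_difference(seq_a, seq_b):
    """Mismatches as a percentage of the aligned length."""
    return 100.0 * count_differences(seq_a, seq_b) / len(seq_a)


# Synthetic 180 bp placeholders chosen to differ at exactly 36 positions.
toy_a = "A" * 180
toy_b = "C" * 36 + "A" * 144
print(count_differences(toy_a, toy_b), percent_difference(toy_a, toy_b))
```

Applied to the real alignment, the same count could be repeated on the translated amino acid sequences to see how many nucleotide changes alter the encoded residue.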
Figure 5 shows a sequence alignment of the zebrafish
amplicon amplified with ZF1 and ZR1 primers and
sequenced using ZF1 primer; zebrafish amplicon
amplified with ZF2 and ZR2 primers and sequenced
using ZF2 primer (obtained from demonstrators);
and the cavefish mxtx1 homeobox sequence. Both
the ZF1 and ZF2 zebrafish sequences were required
to determine the full 180 bp zebrafish homeobox,
because the ZF1 amplicon contains most of the
homeobox sequence while the ZF2 amplicon
contains the remaining small part. The zebrafish and
cavefish sequences are very similar, which is expected as
homeobox sequences tend to be highly conserved
through evolution, and the cavefish and zebrafish
share a common ancestor. There are several
nucleotide differences (36/180 = 20%), indicating
that point mutations have occurred since the
zebrafish and cavefish diverged from their common
ancestor. However, some of these may instead reflect
errors in the zebrafish sequences used. It