
Please take a ticket… and sit in your group (7 different groups). This is for a larger exercise in the afternoon.

Dec 25, 2015


Audra Gibson

Please take a ticket

• …and sit in your group (7 different groups)

• This is for a larger exercise in the afternoon


Today

• Homework 1!

• Tiling arrays and ChIP

• Cross-species conservation

• (possibly) The ENCODE project

• Tying it all together (larger group exercise in interpretation)


From last time: Let’s look at these things

• 5 minutes with the person next to you:

• Look at the RPS9 gene, and turn on RefSeqs, UCSC genes, human mRNAs, ESTs, CpG islands and repeats

• How well do RefSeqs, ESTs and Known Genes correlate?

• Are there any CpG islands or repeats? Where are they located? What types of repeats are there?


Annotating genomes by hybridization approaches

• A set of genome annotation methods is based on the ability of DNA and/or RNA to hybridize into double strands
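To make the hybridization idea concrete, here is a minimal Python sketch, assuming perfect Watson-Crick base pairing and ignoring real-world factors such as melting temperature and partial matches. The function names (`reverse_complement`, `probe_for`, `hybridizes`) and the example sequence are made up for illustration.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def probe_for(rna):
    """Design a DNA probe that is perfectly complementary to an RNA."""
    dna_version = rna.upper().replace("U", "T")  # write the RNA in the DNA alphabet
    return reverse_complement(dna_version)


def hybridizes(probe, rna):
    """True if the probe pairs perfectly with the RNA (a strong simplification)."""
    return probe.upper() == probe_for(rna)


rna = "AUGGCCAUU"           # hypothetical RNA fragment
probe = probe_for(rna)      # "AATGGCCAT"
print(probe, hybridizes(probe, rna))
```

In a real experiment hybridization is not all-or-nothing: near-matches can also bind, which is one source of cross-hybridization.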


Fundamental idea: make DNA probes that are complementary to RNAs that exist in the cell

Put these on a glass slide

Measure hybridization by attaching a color molecule at the RNA end

[Figure: RNA hybridizing to DNA probes]


[Figure: RNA hybridizing to DNA probes]

If the RNA that we designed the probe for is present, the probe will light up = the gene is expressed


What do we make probes for?

• Genes?

Probe specific for the HCFC1 gene


• Say that we make one probe per gene and put them on a glass slide

• That way we could potentially detect all genes if they are expressed

• This is how microarrays work (next part of the course)

• What are the limitations of this?

• (2 minutes)


Albin’s take

• Even if it works perfectly, we will only get values for genes we know (that we make probes for)

• We get no information on what the transcript looks like (which isoform) – we just know whether it is there or not

• We get only expression information, no location information

• How can we resolve these issues?


Answer: make “all possible probes”

• How do we know which probes are possible?

• We have the genome!

• Make probes tiling the genome

• Which probes will light up if only the HCFC1 gene is expressed?


• Only the exonic parts should light up, because we hybridize mature mRNAs

We call these probe sets on glass slides “tiling arrays”
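The tiling idea can be sketched in a few lines of Python. This is an illustration only: the probe length (25), the step between probe starts (35), the coordinates and the exon positions are all made-up assumptions, and a probe is counted as "lit" only if it lies entirely inside an exon.

```python
def tile_probes(region_start, region_end, probe_len=25, step=35):
    """Return half-open (start, end) probe intervals tiled across a region.

    probe_len and step are illustrative assumptions; step > probe_len
    leaves a gap between neighbouring probes.
    """
    probes = []
    pos = region_start
    while pos + probe_len <= region_end:
        probes.append((pos, pos + probe_len))
        pos += step
    return probes


def lit_probes(probes, exons):
    """Probes that lie entirely within an exon of an expressed mature mRNA."""
    return [
        (p_start, p_end)
        for p_start, p_end in probes
        if any(e_start <= p_start and p_end <= e_end for e_start, e_end in exons)
    ]


probes = tile_probes(0, 200)
exons = [(30, 100), (140, 200)]  # hypothetical exon coordinates
print(lit_probes(probes, exons))
# [(35, 60), (70, 95), (140, 165), (175, 200)]
```

The probes at (0, 25) and (105, 130) fall outside the exons (upstream and in the intron), so they stay dark.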


Tiling arrays

Tiling arrays are multi-purpose tools that can be used for many things, for example to see:

• expressed RNA

• DNA regions bound by transcription factors

• accessible DNA

The tiling array is composed of probes that “tile” the genome, meant to cover most of it.

This is only partially true:

1) Probes are often separated by spacers

2) There are regions that are hard to make meaningful probes for – what regions would those be?


Finding active RNA locations


Let’s look at this in the browser - still the RPS9 gene

Slightly more tricky, due to how the data looks. This is non-standard data, so we need to use a specific assembly: Human May 2004 (hg17)

Use the Affy Txn… track in the Expression group, and click on it to set options
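If you prefer to jump straight to the right assembly and gene, the browser can also be opened through a URL. The sketch below only builds the link, using the `db` and `position` parameters of the UCSC `hgTracks` page; the helper name `browser_url` is made up, and whether hg17 is still served on the main site is not checked here. Track options still have to be set by hand as described above.

```python
from urllib.parse import urlencode

# Base address of the UCSC genome browser track display.
HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(db, position):
    """Build a genome browser link for an assembly and a gene name or region."""
    return HGTRACKS + "?" + urlencode({"db": db, "position": position})


print(browser_url("hg17", "RPS9"))
# https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg17&position=RPS9
```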


Set the options like this for now, and submit

Try to interpret the many tracks compared to the cDNA tracks: what are we looking at?


Transfrags and signals

• Tiling array probes give a signal - the stronger it is, the more “expression”.

• Tiling array probes follow one another along the genome, so it makes sense to group adjacent probes that have “significant” signal into a larger block

• This is done by specific statistics packages (not part of the course, but can be downloaded) - the methodology is slightly arbitrary

• Affymetrix calls these blocks “transfrags”

• They should be viewed as a simplification of the data that is often helpful
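The grouping step can be illustrated with a deliberately simple Python sketch. This is not the Affymetrix method: it just merges runs of consecutive probes whose signal is at or above a fixed threshold, and drops runs shorter than `min_probes`. The threshold, the minimum run length and the example signals are all assumptions chosen for illustration.

```python
def call_transfrags(probes, threshold, min_probes=2):
    """Merge runs of consecutive above-threshold probes into blocks.

    probes: list of (start, end, signal), sorted by genomic position.
    Returns a list of (block_start, block_end).
    """
    blocks = []
    run = []  # probes in the current above-threshold run

    def close_run():
        # Keep the run only if it contains enough probes.
        if len(run) >= min_probes:
            blocks.append((run[0][0], run[-1][1]))
        run.clear()

    for start, end, signal in probes:
        if signal >= threshold:
            run.append((start, end))
        else:
            close_run()
    close_run()
    return blocks


signals = [
    (0, 25, 0.2), (35, 60, 3.1), (70, 95, 2.8),
    (105, 130, 0.1), (140, 165, 4.0), (175, 200, 0.3),
]
print(call_transfrags(signals, threshold=2.0))
# [(35, 95)]
```

The single strong probe at (140, 165) is discarded because it has no significant neighbour. Changing `threshold` or `min_probes` changes the blocks, which is one way to see why the methodology is somewhat arbitrary.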


Let’s change the options to also look at transfrags


Again, let’s look at the region

Live Demo

To consider: What is the advantage of tiling arrays compared to cDNA sequencing, and vice versa?


cDNA sequencing

• Connectivity between exons

• Full-length transcripts

• Can be a blind method, but is often used in a targeted way: it is common to make cDNAs using targeted PCR

Tiling arrays

• No connectivity, just signals where probes are

• Issues with cross-hybridization

• Hard to get transcript edges right

• An almost blind method


Blind methods

• …are not targeting any specific gene or region

• Independent of annotation!

• Some call this “unbiased”, but… no method is unbiased

• Valuable, because molecular biology is very affected by “ascertainment bias”

• Blind methods are not affected by this, and therefore can give totally new insights


Tiling arrays used to find transcription factor binding sites - ChIP-on-chip

Chromatin immunoprecipitation (ChIP) is a classical molecular biology technique - used to capture DNA-bound proteins and the corresponding DNA sequence


Chromatin immunoprecipitation (ChIP):

1. Fix everything that is bound to DNA with formaldehyde

2. Shear the DNA

3. Fish out the protein of interest with an antibody

4. Then isolate the bound DNA and sequence it

[Figure: isolation of bound DNA, followed by sequencing]


ChIP-chip

With a tiling array, we can see all sites in the genome by instead putting the bound DNA on the tiling array!


Assessing ChIP-chip data in the browser

• Data-wise, the same thing as tiling arrays with RNA: probes will get signal, and we lump probes together like transfrags

• Usually looks very different: far fewer regions, and smaller regions

• Live Demo: Using the “Affy sites” track in the ENCODE Chromatin Immunoprecipitation track: set this to “full”.

• To consider: what are the pros and cons of this technique compared to predicting transcription factor binding sites computationally?


Recent developments

• Tiling arrays were very hot in 2004-2007

• They are now being made obsolete by new sequencing methods (!)

• More in the last part of the course


Genomic conservation between species

As we have the genomic sequence of multiple species, it makes sense to compare these at the genome level

In the UCSC browser, this is done in many steps, and we can choose which step we want to look at

Central to the analysis is that we compare two genomes to each other at a time, and later combine the results


Big-chunk changes

• Whole chromosomes do not correspond to each other in terms of content between species - a lot of moving-around has occurred


So, it makes sense to try to find smaller pieces that are very similar, and then connect these to each other

Which parts of the mouse genome match where in human?


Mouse genome regions with high similarity are shown as blocks, colored by the chromosome they come from

Matches to different chromosomes at the same place might mean duplicated genes, or even duplicated parts of genes

We try to make chains that link these blocks:

Chains must have a logical order (blocks follow each other) in BOTH species, but can skip segments

[Figure: The human ACTN3 gene aligned to mouse]
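The ordering rule for chains can be sketched as a small dynamic program. This is a simplified illustration, not the actual UCSC chaining program: each block is reduced to a hypothetical (human_pos, mouse_pos, score) triple on one chromosome pair, strand and gap penalties are ignored, and the best chain is the highest-scoring set of blocks whose positions increase in both species.

```python
def best_chain(blocks):
    """Find the highest-scoring chain of blocks that is ordered in both species.

    blocks: list of (human_pos, mouse_pos, score).
    Returns (total_score, chain), where chain is a list of blocks.
    """
    blocks = sorted(blocks)  # order by human position
    if not blocks:
        return 0, []
    best = [b[2] for b in blocks]   # best chain score ending at each block
    prev = [-1] * len(blocks)       # previous block in that chain
    for i, (h_i, m_i, s_i) in enumerate(blocks):
        for j in range(i):
            h_j, m_j, _ = blocks[j]
            # Block j may precede block i only if it comes first in BOTH genomes.
            if h_j < h_i and m_j < m_i and best[j] + s_i > best[i]:
                best[i] = best[j] + s_i
                prev[i] = j
    end = max(range(len(blocks)), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(blocks[end])
        end = prev[end]
    chain.reverse()
    return max(best), chain


# Hypothetical blocks: the third one is out of order in mouse and gets skipped.
blocks = [(100, 500, 10), (200, 600, 8), (300, 50, 6), (400, 700, 5)]
print(best_chain(blocks))
# (23, [(100, 500, 10), (200, 600, 8), (400, 700, 5)])
```

The out-of-order block is not lost: in the browser it would simply end up in a different, lower-scoring chain.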


Now, most chromosomes just have one longer chain.

It seems clear that the pink chain is the best, and that the yellow and green represent gene variants on different chromosomes

What if we want the “best” chain?

The NET algorithm tries to find the best chain, and throws out other chains that overlap it. Potentially you can get many NETs, though
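A rough way to picture netting is a greedy selection: take chains from best to worst score and keep a chain only if it does not overlap one that has already been kept. This sketch is a simplification for illustration and not the real UCSC netting procedure; the chain names, coordinates and scores are invented (the "blue" chain is hypothetical and not on the slide).

```python
def net_chains(chains):
    """Greedily keep the best-scoring chains that do not overlap each other.

    chains: list of (name, start, end, score) in human coordinates,
    with half-open intervals. Returns kept chains sorted by position.
    """
    kept = []
    for chain in sorted(chains, key=lambda c: c[3], reverse=True):
        _, start, end, _ = chain
        # Keep the chain only if it is disjoint from every chain kept so far.
        if all(end <= k_start or start >= k_end for _, k_start, k_end, _ in kept):
            kept.append(chain)
    return sorted(kept, key=lambda c: c[1])


chains = [
    ("pink", 1000, 9000, 950),
    ("yellow", 2000, 4000, 300),
    ("green", 5000, 7000, 250),
    ("blue", 9500, 12000, 400),
]
print([name for name, *_ in net_chains(chains)])
# ['pink', 'blue']
```

Yellow and green are thrown out because they overlap pink, while blue survives because it covers a different stretch: this is how more than one net can remain in a region.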


In this case, there is only one solution


Walking between genomes

By clicking on a chain or a “netted” chain from another species, you get to a page where you can open a new browser with that position in the other species

(live demo)


The “Conservation” track

• The conservation track is a multiple alignment (“all” species), built from pairwise net chains

• It is both an alignment and a “score”, which is -log(P(nucleotide is neutrally evolving))

• Alignments are shown as letters if we zoom in deep enough

(live demo).

In the newest assemblies, there are two conservation tracks, made by slightly different methods. They are basically two sides of the same coin - same basic idea.
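To get a feeling for the score described above, the following sketch converts a probability of neutral evolution into a -log score. The base of the logarithm is not stated on the slide; base 10 is assumed here purely for readability, and the function name is made up.

```python
import math


def conservation_score(p_neutral):
    """-log10 of the probability that a nucleotide is neutrally evolving.

    Base 10 is an assumption made for this illustration.
    """
    return -math.log10(p_neutral)


# The less likely neutral evolution is, the higher the score.
for p in (0.5, 0.01, 0.001):
    print(p, round(conservation_score(p), 2))
```

A position with a 1 in 1000 chance of being neutral scores about 3, while a position that is as likely neutral as not scores about 0.3.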


Challenge

• Turn on the mouse chain and net tracks for the RPS9 gene

• Is there one chain or many?

• If you use these chains to go over to the mouse genome, what do you hit (use all annotation that is relevant)?


Variation within species on single nucleotides

• Single nucleotide polymorphisms (SNPs, pronounced “snips”) - a whole research field

• A SNP is a single-nucleotide change that occurs in >1% of the population.

• The UCSC browser shows SNPs from dbSNP (database at NIH). It colors SNPs by their potential functional annotation
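The >1% rule can be written as a small check. This is a sketch under simple assumptions: allele counts at one position come from a hypothetical sample, and the position counts as a SNP if the second most common allele has a frequency above the cutoff.

```python
def is_snp(allele_counts, min_freq=0.01):
    """True if the second most common allele exceeds min_freq at this position.

    allele_counts: dict mapping allele (e.g. "A") to its count in the sample.
    """
    total = sum(allele_counts.values())
    if total == 0 or len(allele_counts) < 2:
        return False
    freqs = sorted((count / total for count in allele_counts.values()), reverse=True)
    return freqs[1] > min_freq


print(is_snp({"A": 950, "G": 50}))  # True: G is seen in 5% of the sample
print(is_snp({"A": 999, "G": 1}))   # False: G is seen in only 0.1%
```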


Challenge

Technical:

• In the RPS9 gene, turn on the SNP track to “full”

• At one part of the gene, the SNPs are colored. What annotations do the colors stand for?

Biological/philosophical:

• Find arguments and counter-arguments on the following statement

• “SNPs are what makes us humans different from each other”


Larger exercise in larger groups: interpreting genes

• I will divide you into 6 groups - look at your lottery ticket

• Each group will get two gene IDs

– You will investigate both of these in the browser (mm8 assembly), for instance:

• what is special about it

• what is the gene product (annotation)

• what kind of track features are particularly interesting/strange in this case, etc. Use all tracks we have talked about if available and relevant, or even other tracks

• Do not forget to look at the neighborhood of the gene - zoom out!

• You will then show the main findings for one of the genes to the other groups (say 3-5 minutes) - and we can discuss and interpret.

• The other gene, another group will present. Your group will be “opponents” for this gene: try to find things that the other group has not thought about.

• Presentation and opponent genes will be distributed randomly – you won’t know until you present.


Some cutting-edge science: The ENCODE project

• Encyclopedia Of DNA Elements

• Aims for a targeted and coordinated elucidation of the (whole) human genome, using multiple systems

• The pilot phase has recently ended: analyze 1% (30 Mbp) of the genome deeply

• 44 regions

• 14 regions chosen by function: for instance the HOXD gene cluster – 0.5 to 2 Mbp

• 30 regions chosen “randomly” - 500 kb

• Should be viewed as a pilot project for the rest of the genome – so both technology and biology are driving

• A large number of labs involved in both data production and analysis


Some impressive numbers

• 400 million data points, excluding sequencing of other genomes (adds another 250 million)!

• Tiling arrays from 11 different cell sources

• 96 ChIP-chip experiments

• Tag sequencing data to identify promoters (covered later in the course)

• In-depth cDNA annotation (GENCODE)

• Sequencing of orthologous regions in a wide array of species

• …and a lot more


…and these are not all data tracks!


Why introduce these things now in the course?

• Get you used to looking at tons of data at once

• This is a fantastic data resource, which is under-used

• Make you realize that to analyze such data, you will have to understand the underlying method/biology


Highlights of the ENCODE paper

Ewan Birney*,1, John A. Stamatoyannopoulos*,2, Anindya Dutta*,3, Roderic Guigó*,4, 5, Thomas R. Gingeras*,6, Elliott H. Margulies*,7, Zhiping Weng*,8, 9,

Michael Snyder*,10, 11, Emmanouil T. Dermitzakis*,12;John A. Stamatoyannopoulos*,2, Robert E. Thurman2, 13, Michael S. Kuehn2, 13, Christopher M. Taylor3, Shane Neph2, Christoph M. Koch12, Saurabh Asthana14, Ankit Malhotra3, Ivan Adzhubei14, Jason A. Greenbaum15, Robert M. Andrews12, Paul Flicek1, Patrick J. Boyle3, Hua Cao13, Nigel P. Carter12, Gayle K. Clelland12, Sean Davis16, Nathan Day2, Pawandeep Dhami12, Shane C. Dillon12, Michael O. Dorschner2, Heike Fiegler12, Paul G. Giresi17, Jeff Goldy2, Michael Hawrylycz18, Andrew Haydock2, Richard Humbert2, Keith D. James12, Brett E. Johnson13, Sarah M. Johnson13, Neerja Karnani3, Kristin Lee2, Gregory C. Lefebvre12, Patrick A. Navas13, Fidencio Neri2, Stephen C. J. Parker15, Peter J. Sabo2, Richard Sandstrom2, Anthony Shafer2, David Vetrie12, Molly Weaver2, Sarah Wilcox12, Man Yu13, Francis S. Collins7, Job Dekker19, Jason D. Lieb17, Thomas D. Tullius15, Gregory E. Crawford20, Shamil Sunayev14, William S. Noble2, Ian Dunham12, Anindya Dutta*,3;Roderic Guigó*,4, 5, France Denoeud5, Alexandre Reymond21, 22, Philipp Kapranov6, Joel Rozowsky11, Deyou Zheng11, Robert Castelo5, Adam Frankish12, Jennifer Harrow12, Srinka Ghosh6, Albin Sandelin23, Ivo L. Hofacker24, Robert Baertsch25, 26, Damian Keefe1, Paul Flicek1, Sujit Dike6, Jill Cheng6, Heather A. Hirsch27, Edward A. Sekinger27, Julien Lagarde5, Josep F. Abril5, 28, Atif Shahab29, Christoph Flamm24, 30, Claudia Fried30, Jörg Hackermüller31, Jana Hertel30, Manja Lindemeyer30, Kristin Missal30, 32, Andrea Tanzer24, 30, Stefan Washietl24, Jan Korbel11, Olof Emanuelsson11, Jakob S. Pedersen26, Nancy Holroyd12, Ruth Taylor12, David Swarbreck12, Nicholas Matthews12, Mark C. Dickson33, Daryl J. Thomas25, 26, Matthew T. Weirauch25, James Gilbert12, Jorg Drenkow6, Ian Bell6, XiaoDong Zhao34, K.G. Srinivasan34, Wing-Kin Sung34, Hong Sain Ooi34, Kuo Ping Chiu34, Sylvain Foissac4, Tyler Alioto4, Michael Brent35, Lior Pachter36, Michael L. 
Tress37, Alfonso Valencia37, Siew Woh Choo34, Chiou Yu Choo34, Catherine Ucla22, Caroline Manzano22, Carine Wyss22, Evelyn Cheung6, Taane G. Clark38, James B. Brown39, Madhavan Ganesh6, Sandeep Patel6, Hari Tammana6, Jacqueline Chrast21, Charlotte N. Henrichsen21, Chikatoshi Kai23, Jun Kawai23, 40, Ugrappa Nagalakshmi10, Jiaqian Wu10, Zheng Lian41, Jin Lian41, Peter Newburger42, Xueqing Zhang42, Peter Bickel43, John S. Mattick44, Piero Carninci40,Yoshihide Hayashizaki23, 40, Sherman Weissman41, Emmanouil T. Dermitzakis*,12, Elliott H. Margulies*,7, Tim Hubbard12, Richard M. Myers33, Jane Rogers12, Peter F. Stadler24, 30, 45, Todd M. Lowe25, Chia-Lin Wei34, Yijun Ruan34, Michael Snyder*,10, 11, Ewan Birney*,1, Kevin Struhl27, Mark Gerstein11, 46, 47, Stylianos E. Antonarakis22, Thomas R. Gingeras*,6;James B. Brown39, Paul Flicek1, Yutao Fu8, Damian Keefe1, Ewan Birney*,1, France Denoeud5, Mark Gerstein11, 46, 47, Eric D. Green7, 48, Philipp Kapranov6, Ulaş Karaöz8, Richard M. Myers33, William S. Noble2, Alexandre Reymond21, 22, Joel Rozowsky11, Kevin Struhl27, Adam Siepel25, 26, $, John A. Stamatoyannopoulos*,2, Christopher M. Taylor3, James Taylor49, 50, Robert E. Thurman2, 13, Thomas D. Tullius15, Stefan Washietl24, Deyou Zheng11;Laura Liefer51, Kris A. Wetterstrand51, Peter J. Good51, Elise A. Feingold51, Mark S. Guyer51, Francis S. Collins52;Elliott H. Margulies*,7, Gregory M. Cooper33,%, George Asimenos53, Daryl J. Thomas25, 26, Colin N. Dewey54, Adam 62;Gerard G. Bouffard7, 48, Xiaobin Guan48, Nancy F. Hansen48, Jacquelyn R. Idol7, Valerie V.B. Maduro7, Baishali Maskeri48, Jennifer C. McDowell48, Morgan Park48, Pamela J. Thomas48, Alice C. Young48, and Robert W. Blakesley7, 48;Donna M. Muzny63, Erica Sodergren63, David A. Wheeler63, Kim C. Worley63, Huaiyang Jiang63, George M. Weinstock63, and Richard A. Gibbs63;Tina Graves64, Robert Fulton64, Elaine R. Mardis64, and Richard K. Wilson64;Michele Clamp65, James Cuff65, Sante Gnerre65, David B. Jaffe65, Jean L. 
Chang65, Kerstin Lindblad-Toh65, and Eric S. Lander65, 66;Maxim Koriabine67, Mikhail Nefedov67, Kazutoyo Osoegawa67, Yuko Yoshinaga67, Baoli Zhu67, and Pieter J. de Jong67;Zhiping Weng*,8, 9, Nathan D. Trinklein33,#, Yutao Fu8, Zhengdong D. Zhang11, Ulaş Karaöz8, Leah Barrera68, Rhona Stuart68, Deyou Zheng11, Srinka Ghosh6, Paul Flicek1, David C. King50, 59, James Taylor49, 50, Adam Ameur69, Stefan Enroth69, Mark C. Bieda70, Christoph M. Koch12, Heather A. Hirsch27, Chia-Lin Wei34, Jill Cheng6, Jonghwan Kim71, Akshay A. Bhinge71, Paul G. Giresi17, Nan Jiang72, Jun Liu34, Fei Yao34, Wing-Kin Sung34, Kuo Ping Chiu34, Vinsensius B. Vega34, Charlie W.H Lee34, Patrick Ng34, Atif Shahab29, Edward A. Sekinger27, Annie Yang27, Zarmik Moqtaderi27, Zhou Zhu27, Xiaoqin Xu70, Sharon Squazzo70, Matthew J. Oberley73, David Inman73, Michael A. Singer72, Todd A. Richmond72, Kyle J. Munn72, 74, Alvaro Rada-Iglesias74, Ola Wallerman74, Jan Komorowski69, Gayle K. Clelland12, Sarah Wilcox12, Shane C. Dillon12, Robert M. Andrews12, Joanna C. Fowler12, Phillippe Couttet12, Keith D. James12, Gregory C. Lefebvre12, Alexander W. Bruce12, Oliver M. Dovey12, Peter D. Ellis12, Pawandeep Dhami12, Cordelia F. Langford12, Nigel P. Carter12, David Vetrie12, Philipp Kapranov6, David A. Nix6, Ian Bell6, Sandeep Patel6, Joel Rozowsky11, Ghia Euskirchen10, Stephen Hartman10, Jin Lian41, Jiaqian Wu10, Alexander E. Urban10, Peter Kraus10, Sara Van Calcar68, Nate Heintzman68, Tae Hoon Kim68, Kun Wang68, Chunxu Qu68, Gary Hon68, Rosa Luna75, Christopher K. Glass75, M. Geoff Rosenfeld75, Shelley Force Aldred33,#, Sara J. Cooper33, Anason Halees8, Jane M. Lin9, Hennady P. Shulha9, Xiaoling Zhang8, Mousheng Xu8, Jaafar N. S. Haidar9, Yong Yu9, Ewan Birney*,1, Sherman Weissman41, Yijun Ruan34, Jason D. Lieb17, Vishwanath R. Iyer71, Roland D. Green72, Thomas R. Gingeras*,6, Claes Wadelius74, Ian Dunham12, Kevin Struhl27, Ross C. Hardison50, 59, Mark Gerstein11, 46, 47, Peggy J. Farnham70, Richard M. 
Myers33, Bing Ren68, Michael Snyder*,10, 11;Daryl J. Thomas25, 26, Kate Rosenbloom26, Rachel A. Harte26, Angie S. Hinrichs26, Heather Trumbower26, Hiram Clawson26, Jennifer Hillman-Jackson26, Ann S. Zweig26, Kayla Smith26, Archana Thakkapallayil26, Galt Barber26, Robert M. Kuhn26, Donna Karolchik26, David Haussler25, 26, 60, W. James Kent25, 26;Emmanouil T. Dermitzakis*,12, Lluis Armengol76, Christine P. Bird12, Taane G. Clark38, Gregory M. Cooper33,%, Paul I. W. de Bakker77, Andrew D. Kern26, Nuria Lopez-Bigas5, Joel D. Martin50, 59, Barbara E. Stranger12, Daryl J. Thomas25, 26, Abigail Woodroffe78, Serafim Batzoglou53, Eugene Davydov53, Antigone Dimas12, Eduardo Eyras5, Ingileif B. Hallgrímsdóttir79, Ross C. Hardison50, 59, Julian Huppert12, Arend Sidow33, 62, James Taylor49, 50, Heather Trumbower26, Michael C. Zody77, Roderic Guigó*,4, 5, James C. Mullikin7, Gonçalo R. Abecasis78, Xavier Estivill76, 80 and Ewan Birney*,1.


The genome is pervasively transcribed

The majority of nucleotides in the ENCODE regions are part of at least one primary transcript

GENCODE annotation, RACE-array experiments (RxFrags), and PET tags


Regulatory elements are distributed around TSSs (not upstream in particular)


Distal TSSs

• RACE extension (validated by PCR) of exons detected by tiling arrays shows that many genes can have distal TSSs within other genes (!)

[Figure label: 330 kb]


Around 5% of the bases are under selective pressure

• However, not all functional regions are conserved: biologically active elements with neutral benefits?

• Or, is our measure of conservation capturing what we want?


Back to the UCSC browser

Two ways to use ENCODE data

1. The ENCODE tracks (we have already used some). Danger: they only cover 1% of the genome!

2. The ENCODE version of the UCSC browser: http://genome.ucsc.edu/ENCODE/ only shows the 1% regions. Same data as above, though


Issues

• ENCODE data is complex and hard to interpret due to

– New technology: what is noise and what is signal?

– What does the signal mean, even if it is real?

– Very many technologies - no-one knows them all

– Messy biology

– Use of different cell lines in different experiments (sigh)


Larger challenge

Use the ENCODE browser and again look at the RPS9 gene

What additional tracks are available?

Do we see anything more than we have already seen?

(I am NOT expecting you to understand and look at every track :) )