Micro Array Data Analysis 06

Apr 06, 2018

chiemera
  • 8/3/2019 Micro Array Data Analysis 06

    1/55

    Microarray Data Analysis

    The Bioinformatics side of the bench

    The anatomy of your data files from Affymetrix array analysis

    .DAT = image file (~10^7 pixels)

    .CEL = measured cell intensities

    .CDF = cell description file (identifies probe sets and probe pairs)

    .CHP = calculated probe set data

    .RPT = report generated from .CHP

    Quality Control (QC) of the chip: visual inspection

    Look at the .DAT file or the .CHP file image

    Scratches? Spots?

    Corners and outside border: checkerboard appearance (B2 oligo)

    Positive hybridization control

    Used by software to place grid over image

    Array name is written out in oligos!

    Chip defects

    Internal controls

    B. subtilis genes (added poly-A tails)

    Assessment of quality of sample preparation

    Also as hybridization controls

    Hybridization controls (bioB, bioC, bioD, cre)

    E. coli and P1 bacteriophage biotin-labeled cRNAs

    Spiked into the hybridization cocktail

    Assess hybridization efficiency

    Actin and GAPDH assess RNA sample/assay quality

    Compare signal values from the 3' end to signal values from the 5' end

    The 3'/5' ratio generally should not exceed 3

    Percent genes present (%P)

    Replicate samples - similar %P values
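
The two numeric checks above (3'/5' ratio and %P) can be sketched in a few lines. This is only an illustration: the function names and all signal values and calls are hypothetical, and the cutoff of 3 is the one quoted on the slide.

```python
def three_prime_five_prime_ratio(signal_3p, signal_5p):
    """Ratio of the 3' probe set signal to the 5' probe set signal."""
    return signal_3p / signal_5p


def percent_present(calls):
    """Percentage of probe sets whose detection call is Present ('P')."""
    return 100.0 * sum(1 for c in calls if c == "P") / len(calls)


# Hypothetical GAPDH signals for one array (not real data).
gapdh_ratio = three_prime_five_prime_ratio(signal_3p=5200.0, signal_5p=4100.0)
ratio_ok = gapdh_ratio <= 3.0  # cutoff taken from the slide

# Hypothetical detection calls: P = Present, M = Marginal, A = Absent.
calls = ["P", "P", "A", "M", "P", "A", "P", "P"]
pct_p = percent_present(calls)

print(f"3'/5' ratio = {gapdh_ratio:.2f}, within limit: {ratio_ok}")
print(f"%P = {pct_p:.1f}")
```

For replicate arrays, the same %P calculation would be run on each array and the values compared side by side.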

    Microarray Data Process/Outline

    1. Experimental Design

    2. Image Analysis: scan to intensity measures (raw data)

    3. Normalization: clean data

    4. More low-level analysis: fold change, ANOVA, data filtering

    5. Data mining: how to interpret > 6000 measures

    Databases

    Software

    Techniques: clustering, pattern recognition, etc.

    Comparing to prior studies, across platforms?

    6. Validation

    Experimental Design

    A good microarray design has 4 elements

    1. A clearly defined biological question or hypothesis

    2. Treatment, perturbation and observation of biological materials should minimize systematic bias

    3. Simple and statistically sound arrangement that minimizes cost and gains maximal information

    4. Compliance with MIAME (Minimum Information About a Microarray Experiment)

    The goal of statistics is to find signals in a sea of noise

    The goal of experimental design is to reduce the noise so signals can be found with as small a sample size as possible

    Observational Study vs. Designed Experiment

    Observational study:

    Investigator is a passive observer who measures variables of interest, but does not attempt to influence the responses

    Designed experiment:

    Investigator intervenes in the natural course of events

    What type is our DMSO experiment?

    Experimental Replicates

    Why? In any experimental system there is a certain amount of noise, so even 2 identical processes yield slightly different results

    Sources?

    To understand how much variation there is, it is necessary to repeat an experiment a number of independent times

    Replicates allow us to use statistical tests to ascertain if the differences we see are real

    Technical vs. Biological Replicates

    As we progress from the starting material to the scanned image, we move from a system dominated by biological effects to one dominated by chemistry and physics noise

    Within the Affy platform the dominant variation is usually of a biological nature, so the best strategy is to produce replicates as high up the experimental tree as possible

    Low level data analysis / pre-processing

    Varying biological or cellular composition among sample types.

    Differences in sample preparation, labeling or hybridization.

    Non-specific cross-hybridization of target to probes.

    These lead to systematic differences between individual arrays

    Raw Data Quality Control

    Scaling

    Normalization and filtering.

    Image Analysis - Raw Data

    From probe level signals to gene abundance estimates

    The job of the expression summary algorithm is to take a set of Perfect Match (PM) and Mis-Match (MM) probes, and use these to generate a single value representing the estimated amount of transcript in solution, as measured by that probeset.

    To do this, .DAT files containing array images are first processed to produce a .CEL file, which contains measured intensities for each probe on the array.

    It is the .CEL files that are analyzed by the expression calling algorithm.

    http://bioinformatics.picr.man.ac.uk/mbcf/example_ma.jsp
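
To make "generate a single value" concrete, here is a minimal sketch of one robust summary, a one-step Tukey biweight average of log2 probe values, which is the kind of estimator MAS 5.0 is documented as using. The tuning constants (c = 5, epsilon = 0.0001) and the probe values are assumptions for illustration, and this is not the full MAS 5.0 signal algorithm.

```python
import statistics


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights outliers.

    c and epsilon are assumed tuning constants for this sketch.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    weights = []
    for v in values:
        u = (v - med) / (c * mad + epsilon)
        # Values far from the median (|u| >= 1) get zero weight.
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical log2 probe-level values for one probe set; the last is an outlier.
log2_values = [8.1, 8.3, 7.9, 8.0, 8.2, 12.5]

summary_log2 = tukey_biweight(log2_values)
plain_mean = statistics.mean(log2_values)

print(f"robust summary (log2): {summary_log2:.2f}")
print(f"plain mean (log2):     {plain_mean:.2f}")
```

The outlying probe pulls the plain mean upward but receives zero weight in the robust summary, which is why summaries of this kind are preferred over a simple average.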

    MAS 5.0 output files

    For each transcript (gene) on the chip:

    signal intensity

    a present or absent call (presence call)

    p-value (significance value) for making that call

    Each gene is associated with a GenBank accession number (NCBI database)

    How are transcripts determined to be present or absent?

    Probe pair (PM vs. MM) intensities generate a detection p-value

    The p-value is used to assign a Present, Absent, or Marginal call for the transcript

    Every probe pair in a probe SET has a potential vote for the presence call
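
The voting idea can be sketched as follows. Each probe pair contributes a discrimination score R = (PM - MM) / (PM + MM), and a one-sided signed-rank test asks whether the scores tend to exceed a small threshold tau. The values tau = 0.015, alpha1 = 0.04 and alpha2 = 0.06 are assumed defaults, the normal approximation is a simplification, and all intensities are hypothetical; treat this as a sketch of the approach rather than the exact MAS 5.0 code.

```python
import math


def discrimination_scores(pm, mm):
    """Per-pair score R = (PM - MM) / (PM + MM); near 0 when PM and MM are similar."""
    return [(p - m) / (p + m) for p, m in zip(pm, mm)]


def detection_p_value(scores, tau=0.015):
    """One-sided Wilcoxon signed-rank p-value for 'scores tend to exceed tau'.

    Uses the normal approximation (a simplification for this sketch).
    """
    diffs = [s - tau for s in scores if s != tau]
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Group tied absolute differences and give them the average rank.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def detection_call(p_value, alpha1=0.04, alpha2=0.06):
    """Map a detection p-value to Present (P), Marginal (M) or Absent (A)."""
    if p_value < alpha1:
        return "P"
    if p_value < alpha2:
        return "M"
    return "A"


# Hypothetical probe set where PM is clearly brighter than MM.
pm_bright = [1200, 980, 1500, 1100, 1300, 900, 1250, 1400, 1050, 1150, 1000]
mm_bright = [300, 310, 280, 350, 290, 305, 320, 330, 315, 295, 300]

# Hypothetical probe set where PM is no brighter than MM.
pm_dim = [310, 290, 305, 300, 280, 320, 295, 315, 300, 285, 310]
mm_dim = [320, 300, 310, 310, 300, 325, 305, 320, 310, 300, 315]

call_bright = detection_call(detection_p_value(discrimination_scores(pm_bright, mm_bright)))
call_dim = detection_call(detection_p_value(discrimination_scores(pm_dim, mm_dim)))
print(call_bright, call_dim)
```

Because the test is rank-based, one very bright probe pair cannot outvote the rest of the probe set.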

    PM and MM Probes

    The purpose of each MM probe is to provide a direct measure of background and stray signal (perhaps due to cross-hybridization) for its perfect-match partner. In most situations the signal from each probe pair is simply the difference PM - MM.

    For some probe pairs, however, the MM signal is greater than the PM value; we have an apparently impossible measure of background.
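
A small sketch of the problem, using hypothetical intensities: plain PM - MM subtraction goes negative whenever MM exceeds PM. The code only flags those pairs; it does not implement the correction that MAS 5.0 applies to them.

```python
# Hypothetical PM and MM intensities for four probe pairs (not real data).
pm = [1200.0, 450.0, 980.0, 310.0]
mm = [300.0, 500.0, 260.0, 340.0]

# Naive background correction: PM minus MM for each pair.
differences = [p - m for p, m in zip(pm, mm)]

# Pairs where MM > PM give a negative, physically meaningless signal.
negative_pairs = [i for i, d in enumerate(differences) if d < 0]

print(differences)
print("pairs with MM > PM:", negative_pairs)
```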

    Thank goodness for software!!!

    MAS 5.0 does these calculations for you and writes the results to a .CHP file

    Basic analysis is possible in MAS 5.0, but it won't handle replicates

    Import MAS 5.0 (.CHP) data into other software: GeneSifter, GCOS, Spotfire, and many others

    Signal Intensity

    Following these calculations, the MAS 5.0 algorithm now has a measure of the signal for each probe in a probeset.

    Other algorithms, e.g. RMA, GCRMA, dChip, PLIER and others, have been developed, largely by academic teams, to improve the precision and accuracy of this calculation

    In our experiment we will use RMA and GCRMA

    How do we want to analyze this data?

    Pairwise analysis is most appropriate: Control vs. DMSO

    List of genes that are upregulated or downregulated

    Determine fold up or down cutoffs

    What is significant? 1.5 fold up/down? 2 fold up/down? 10 fold up/down?
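
The effect of the cutoff choice can be seen in a short sketch. Fold change is computed on the log2 scale so that up and down changes are symmetric. The gene names and signal values are hypothetical, and the cutoffs are just the candidates listed above.

```python
import math


def classify(treated, control, fold_cutoff=2.0):
    """Label a gene 'up', 'down' or 'unchanged' for a given fold cutoff."""
    log2_fc = math.log2(treated / control)
    threshold = math.log2(fold_cutoff)
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical (DMSO, control) signal pairs; gene names are placeholders.
signals = {
    "GENE_A": (400.0, 100.0),  # 4-fold up
    "GENE_B": (90.0, 200.0),   # about 2.2-fold down
    "GENE_C": (170.0, 100.0),  # 1.7-fold up
}

calls_2x = {g: classify(t, c, fold_cutoff=2.0) for g, (t, c) in signals.items()}
calls_1_5x = {g: classify(t, c, fold_cutoff=1.5) for g, (t, c) in signals.items()}

print("2-fold cutoff:  ", calls_2x)
print("1.5-fold cutoff:", calls_1_5x)
```

GENE_C changes category between the two cutoffs, which is the reason the cutoff has to be chosen deliberately and reported with the gene list.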

    Normalization - clean data

    Normalizing data allows comparisons ACROSS different chips

    Intensity of fluorescent markers might be different from one batch to the other

    Normalization allows us to compare those chips without altering the interpretation of changes in GENE EXPRESSION

    Why Normalize Data?

    The experimental goal is to identify biological variation (expression changes between samples)

    Technical variation can hide the real data

    Unavoidable systematic bias should be recognized and corrected

    Normalization is necessary to effectively make comparisons between chips, and sometimes within a single chip.

    There are different methods of normalization; the assumptions about where variation exists will determine the normalization techniques used.

    Always look at data before and after normalization

    Spike in controls can help show which method may be best
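
As one example of a normalization method, here is a minimal sketch of quantile normalization, the approach RMA applies at the probe level. It assumes that the overall intensity distribution should be the same on every chip. The two chips and their values are hypothetical, and ties are not given special treatment in this sketch.

```python
def quantile_normalize(arrays):
    """Force every array to share the same distribution of values.

    arrays: list of equal-length lists, one list per chip.
    Returns new lists; the inputs are left unchanged.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the k-th smallest value across all chips.
    rank_means = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities for the same four probes on two chips;
# chip_b was scanned slightly brighter overall.
chip_a = [2.0, 4.0, 6.0, 8.0]
chip_b = [3.0, 9.0, 5.0, 7.0]

norm_a, norm_b = quantile_normalize([chip_a, chip_b])
print(norm_a)
print(norm_b)
```

After normalization both chips contain the same set of values, but each probe keeps its rank within its own chip, so within-chip ordering is preserved.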

    Caveat

    There is NO standard way to analyze microarray data

    We are still figuring out how to get the best answers from microarray experiments

    Best to combine knowledge of biology, statistics, and computers to get answers

    Venn Diagrams

    [Figure: Venn diagrams comparing results from MAS 5.0, RMA, and GCRMA]
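
The regions of such a Venn diagram are set operations on the gene lists produced by each algorithm. The sketch below uses placeholder gene names and made-up lists, purely to show how the overlaps would be computed.

```python
# Hypothetical lists of changed genes from each algorithm (placeholder names).
mas5 = {"GENE1", "GENE2", "GENE3", "GENE4"}
rma = {"GENE2", "GENE3", "GENE5"}
gcrma = {"GENE3", "GENE4", "GENE5", "GENE6"}

# Centre of the diagram: genes found by all three methods.
all_three = mas5 & rma & gcrma

# Genes reported by one method only.
mas5_only = mas5 - rma - gcrma

# Genes shared by RMA and GCRMA but missed by MAS 5.0.
rma_gcrma_only = (rma & gcrma) - mas5

print("all three:", sorted(all_three))
print("MAS 5.0 only:", sorted(mas5_only))
print("RMA and GCRMA only:", sorted(rma_gcrma_only))
```

Genes in the central overlap are usually the safest candidates for follow-up, since they do not depend on the choice of algorithm.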

    Data processing is completed: now what?

    Fold change, ANOVA, Data filtering

    Where are we now?

    Ran analysis, output is a GENELIST

    List indicates what genes are up or down regulated

    p values for t-test

    Graphs of signal levels

    Absolute numbers not as important here as the trends you see
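
For reference, the statistic behind a two-group t-test on replicates can be sketched as below. The slide does not say which variant is used; this sketch assumes Welch's unequal-variance form, and the log2 signal values are hypothetical. Turning the statistic into a p-value needs the t distribution, which a statistics library provides (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    n_a, n_b = len(group_a), len(group_b)
    se_squared = var_a / n_a + var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se_squared)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se_squared ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical log2 signals for one gene, three replicate chips per condition.
control = [7.9, 8.1, 8.0]
dmso = [9.0, 9.3, 9.1]

t_stat, dof = welch_t(dmso, control)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

A large t arises when the difference between group means is large relative to the replicate-to-replicate scatter, which is why replicates are needed before any p-value can be computed.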

    Now what????

    What is the first set of genes on our chips that will be filtered out?

    Follow the links

    Click on a gene

    Find links to other databases

    Follow links to discover what the protein does

    Now the fun part begins.

    Back to Biology

    Do the changes you see in gene expression make sense BIOLOGICALLY?

    If they don't make sense, can you hypothesize as to why those genes might be changing?

    Leads to many, many more experiments

    The Gene Ontologies

    A Common Language for Annotation of Genes from Yeast, Flies and Mice

    and Plants and Worms

    and Humans

    and anything else!

    Gene Ontology

    Objectives

    GO represents concepts used to classify specific parts of our biological knowledge:

    Biological Process

    Molecular Function

    Cellular Component

    GO develops a common language applicable to any organism

    GO terms can be used to annotate gene products from any species, allowing comparison of information across species

    Srinija Srinivasan, Chief Ontologist, Yahoo!

    "The ontology. Dividing human knowledge into a clean set of categories is a lot like trying to figure out where to find that suspenseful black comedy at your corner video store. Questions inevitably come up, like are Movies part of Art or Entertainment? (Yahoo! lists them under the latter.)" - Wired Magazine, May 1996

    The 3 Gene Ontologies

    Molecular Function = elemental activity/task

    the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

    Biological Process = biological goal or objective

    broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

    Cellular Component = location or complex

    subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

    Example: Gene Product = hammer

    Function (what)            Process (why)

    Drive nail (into wood)     Carpentry

    Drive stake (into soil)    Gardening

    Smash roach                Pest Control

    Clown's juggling object    Entertainment

    Biological Examples

    Molecular Function, Biological Process, Cellular Component

    Validation

    Not enough to just do microarrays

    Usually validate microarray results

    via some other technique

    rt-PCR

    TaqMan

    Northern analysis

    Protein level analysis

    No technique is perfect

    Dynamic Nature of Yeast Genome

    eORF= essential

    kORF= known

    hORF= homology identified

    shORF= short

    tORF= transposon identified

    qORF= questionable

    dORF= disabled

    The first published sequence claimed 6274 genes, a number that has been revised many times. Why?

    The Affy detection oligonucleotide sequences are frozen at the time of synthesis. How does this impact downstream data analysis?

    [Figure values: 6603, 4373, 1410, 820]

    Terms, Definitions, IDs

    term: MAPKKK cascade (mating sensu Saccharomyces)

    goid: GO:0007244

    definition: MAPKKK cascade involved in transduction of mating pheromone signal, as described in Saccharomyces

    definition_reference: PMID:9561267
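
The fields above map naturally onto a small record type. The sketch below is one possible representation, not the format used by GO or SGD: the class name, the `ontology` field, its value for this term, the gene name and the lookup helper are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOTerm:
    """One GO term, using the field names shown on the slide."""
    goid: str
    term: str
    definition: str
    definition_reference: str
    ontology: str  # assumed extra field: which of the 3 ontologies the term belongs to


mapkkk = GOTerm(
    goid="GO:0007244",
    term="MAPKKK cascade (mating sensu Saccharomyces)",
    definition=(
        "MAPKKK cascade involved in transduction of mating pheromone "
        "signal, as described in Saccharomyces"
    ),
    definition_reference="PMID:9561267",
    ontology="biological_process",  # assumption: a signalling cascade is a process
)

# Hypothetical annotation table: gene name -> list of GO terms.
annotations = {"HYPOTHETICAL_GENE": [mapkkk]}


def terms_for(gene, ontology):
    """Return the GO terms of one ontology annotated to a gene."""
    return [t for t in annotations.get(gene, []) if t.ontology == ontology]


process_terms = terms_for("HYPOTHETICAL_GENE", "biological_process")
print([t.goid for t in process_terms])
```

Keeping the stable ID separate from the human-readable term is what lets annotations survive when a term is renamed.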

    SGD

    SGD public microarray data sets available for public query

    Homework

    1. Go to http://www.yeastgenome.org/ and find 3 candidate genes of known function and one of undefined function that you might predict to be altered by DMSO treatment

    2. What GO biological processes and molecular functions are associated with your candidate genes?

    3. Where in the cell does the protein reside?

    4. What other proteins are known or inferred to interact with yours? How was this interaction determined? Is this a genetic or physical interaction?

    5. Find the expression of at least one of your known genes in another publicly deposited microarray data set.

        1. Name of the data set and how you found it?

        2. What is the largest fold change observed for this gene in the public study?

    6. Now that you are microarray technology experts, can you give me 3 reasons why the observed transcript level difference may not be confirmed through a second technology like RT-qPCR?

    Suggested Reading

    http://www.yeastgenome.org/