Introduction to Bioinformatics - Tutorial no. 12

Expression Data Analysis: Clustering, GEO, EPClust

Application of Microarrays

We know the function of only about 20% of the 30,000 genes in the human genome.

Microarrays make gene exploration faster and better.

Applications: evolution, behavior, cancer research

Microarray Analysis

Unsupervised Grouping: Clustering

Pattern discovery via grouping similarly expressed genes together

Three techniques most often used:
- k-Means Clustering
- Hierarchical Clustering
- Kohonen Self-Organizing Feature Maps

Hierarchical Agglomerative Clustering (Michael Eisen, 1998)

Programs: Cluster (algorithm), TreeView (visualization)

Hierarchical Agglomerative Clustering

Step 1: Compute a similarity score between all pairs of genes (Pearson correlation or Euclidean distance).

Step 2: Find the two most similar genes and replace them with a node that contains their average. This builds a tree of genes.

Step 3: Repeat until all genes are joined in a single tree.
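The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the Cluster program itself: it assumes Pearson correlation as the similarity score, represents the tree as nested tuples, and uses made-up gene names and expression values. A gene with a constant profile would cause a division by zero in this simple version.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def agglomerate(profiles):
    """Merge the two most correlated nodes until one tree remains.

    profiles: dict mapping gene name -> list of expression values.
    Returns a nested tuple such as (("g1", "g2"), "g3").
    """
    nodes = dict(profiles)  # key: gene name or subtree, value: profile
    while len(nodes) > 1:
        # Steps 1 and 2: score all pairs and pick the most similar one.
        a, b = max(combinations(nodes, 2),
                   key=lambda pair: pearson(nodes[pair[0]], nodes[pair[1]]))
        pa, pb = nodes.pop(a), nodes.pop(b)
        # Replace the pair with a node that holds the average profile.
        nodes[(a, b)] = [(u + v) / 2 for u, v in zip(pa, pb)]
        # Step 3: the loop repeats on the reduced set of nodes.
    (tree,) = nodes
    return tree


# Hypothetical expression profiles (four conditions per gene).
genes = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [4.2, 2.9, 2.2, 0.9],
}
tree = agglomerate(genes)
print(tree)
```

Here the rising genes g1 and g2 are joined first, then the falling genes g3 and g4, and the two pairs are joined last.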


Agglomerative Hierarchical Clustering

Distance between joined clusters

Need to define the distance between the new cluster and the other clusters.

Single Linkage: distance between the closest pair.

Complete Linkage: distance between the farthest pair.

Average Linkage: average distance between all pairs, or distance between cluster centers.
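The linkage rules differ only in how the pairwise distances between two clusters are summarized. The sketch below assumes Euclidean distance and uses two small made-up clusters of 2-D points; the function names are illustrative only.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def single_linkage(c1, c2):
    """Distance between the closest pair of points."""
    return min(dist(p, q) for p in c1 for q in c2)


def complete_linkage(c1, c2):
    """Distance between the farthest pair of points."""
    return max(dist(p, q) for p in c1 for q in c2)


def average_linkage(c1, c2):
    """Average distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))


def centroid_linkage(c1, c2):
    """Distance between the cluster centers."""
    def center(c):
        return [sum(col) / len(c) for col in zip(*c)]
    return dist(center(c1), center(c2))


# Hypothetical clusters.
c1 = [(0.0, 0.0), (0.0, 2.0)]
c2 = [(3.0, 0.0), (3.0, 2.0)]
for rule in (single_linkage, complete_linkage, average_linkage, centroid_linkage):
    print(rule.__name__, round(rule(c1, c2), 3))
```

For these points single linkage and the center-based distance both give 3, complete linkage gives the diagonal distance (the square root of 13), and average linkage falls in between.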

Dendrogram

The dendrogram induces a linear ordering of the data points
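Reading the leaves of the tree from left to right gives that ordering. A minimal sketch, assuming the dendrogram is stored as nested tuples with gene names at the leaves (the tree below is hypothetical):

```python
def leaf_order(tree):
    """Return the leaves of a nested-tuple dendrogram from left to right."""
    if not isinstance(tree, tuple):
        return [tree]  # a single gene
    left, right = tree
    return leaf_order(left) + leaf_order(right)


# Hypothetical dendrogram over five genes.
tree = (("g5", "g2"), ("g4", ("g1", "g3")))
order = leaf_order(tree)
print(order)
```

The ordering is not unique: swapping the two children of any internal node yields the same tree but a different left-to-right order.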

Results of Clustering Gene Expression

CLUSTER is simple and easy to use

De facto standard for microarray analysis

Limitations:
- Hierarchical clustering in general is not robust
- Genes may belong to more than one cluster

K-Means Clustering Algorithm

Randomly initialize k cluster means
Iterate:
  Assign each gene to the nearest cluster mean
  Recompute cluster means
Stop when clustering converges

Notes:
- Really fast
- Genes are partitioned into clusters
- How do we select k?
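The loop above can be sketched as follows. Several choices here are assumptions rather than part of the slides: the means are initialized at k randomly chosen data points, distance is Euclidean, convergence means the assignments stopped changing, and the data points are made up.

```python
import random
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, k, seed=0, max_iter=100):
    """Return (labels, means) for a list of equal-length tuples."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # assumption: start at k random data points
    labels = None
    for _ in range(max_iter):
        # Assign each point to the nearest cluster mean.
        new_labels = [min(range(k), key=lambda j: dist(p, means[j]))
                      for p in points]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Recompute each cluster mean from its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = tuple(sum(col) / len(members)
                                 for col in zip(*members))
    return labels, means


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, means = kmeans(points, k=2)
print(labels)
print(means)
```

Every point ends up in exactly one cluster, which is the partitioning noted above. The value of k has to be supplied by the caller.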

K-Means Algorithm (illustrated step by step)

1. Randomly initialize clusters
2. Assign data points to nearest clusters
3. Recalculate clusters
4. Repeat until convergence

EPClust Input (1)

Expression data matrix

Extra annotation for gene rows

Method of tabulation

Name for further analysis

EPClust Input (2)

Method of measuring distance between gene rows

Cluster hierarchically

Number k of means

Cluster into k means

GEO: Gene Expression Omnibus

NCBI database for gene expression data

Founded at the end of 2000

Querying GEO

Browse records

Search for entries containing a gene

Search for experiments

Search with Entrez

SGD – Expression database

http://db.yeastgenome.org/cgi-bin/expression/expressionConnection.pl

SGD – Expression database

Two labs are running experiments on the APO1 gene. Suggest a method that would allow them to compare their results.

Gene grouping

Relative values

Explain how microarrays can be used as a basis for diagnostics

        Sample 1  Sample 2  Sample 3  Sample 4  Sample 5
Gen1       +         -         -         +         +
Gen2       +         +         -         +         -
Gen3       -         +         +         +         -
Gen4       +         +         +         -         -
Gen5       -         -         +         -         +

Explain how microarrays can be used as a basis for diagnostics (columns reordered)

        Sample 1  Sample 2  Sample 4  Sample 3  Sample 5
Gen1       +         -         +         -         +
Gen2       +         +         +         -         -
Gen3       -         +         +         +         -
Gen4       +         +         -         +         -
Gen5       -         -         -         +         +
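The second table holds the same +/- calls as the first, with the Sample 3 and Sample 4 columns exchanged. The sketch below reproduces that reordering and counts the genes on which two samples disagree. The mismatch count is only one possible way to compare samples and is an assumption here, not something the slides prescribe.

```python
# Calls from the first table, one string per gene, one character per sample.
calls = {
    "Gen1": "+--++",
    "Gen2": "++-+-",
    "Gen3": "-+++-",
    "Gen4": "+++--",
    "Gen5": "--+-+",
}
samples = ["Sample 1", "Sample 2", "Sample 3", "Sample 4", "Sample 5"]
new_order = ["Sample 1", "Sample 2", "Sample 4", "Sample 3", "Sample 5"]


def reorder(calls, samples, new_order):
    """Rearrange the sample columns of every gene row."""
    idx = [samples.index(s) for s in new_order]
    return {gene: "".join(row[i] for i in idx) for gene, row in calls.items()}


def mismatches(calls, i, j):
    """Number of genes on which samples i and j (0-based) disagree."""
    return sum(row[i] != row[j] for row in calls.values())


reordered = reorder(calls, samples, new_order)
for gene, row in reordered.items():
    print(gene, row)
print(mismatches(calls, 0, 1), mismatches(calls, 1, 4))
```

Samples with few mismatches have similar expression patterns. Comparing a new sample's pattern with patterns from samples of known condition is the kind of grouping a diagnostic use would build on.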
