Expression Profile = the pattern of signal values for one gene over several chips.
Expression Profile Clustering = the clustering of similar profiles.
Why? Similar expression profiles suggest:
  regulation (by a shared factor or a cluster member)
  related function
ALSO: opposite expression profiles suggest regulation (e.g. inhibition).
Software: EpClust, J-Express, etc.
Expression Profile Clustering
Slide 2 Guided Tour of EpClust
Slide 3 Data file format
Can enter data in a variety of formats. See EpClust's data upload page.
One simple way:
  Download tab-delimited data from NASC
  Open with (or paste into) Excel
  Delete all but one name column and the Signal value columns.
  Do not include any hyphens! (I try to avoid all punctuation)
  Save as tab-delimited text
Annotation can be added as either:
  A second file with the same first column of names
  A single second column in the signal file (if specified)
Gene_ID    Chip1 Chip2 Chip3
AT1G01010  404865
AT1G01030  241
AT1G01040  798835
AT1G01050  713744671
AT1G01060  5680111
AT1G01070  7510994
AT1G01080  3933
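The clean-up steps listed under Data file format can also be scripted instead of done by hand in Excel. The following is a minimal sketch, not part of EpClust or NASC: the function name `clean_for_upload`, the assumption that signal columns contain the word "Signal" in their header, and the toy input values are all hypothetical. It keeps one name column plus the signal columns, removes hyphens from the header and the names, and returns tab-delimited text.

```python
import csv
import io


def clean_for_upload(raw_text, name_column, signal_marker="Signal"):
    """Keep one name column plus the signal columns, as tab-delimited text.

    Hyphens are removed from the header and from the name column only;
    the signal values themselves are passed through unchanged.
    """
    reader = csv.reader(io.StringIO(raw_text), delimiter="\t")
    header = next(reader)

    # Columns to keep: the chosen name column first, then every column whose
    # header contains the (assumed) signal marker.
    name_idx = header.index(name_column)
    signal_idx = [i for i, h in enumerate(header)
                  if signal_marker in h and i != name_idx]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([header[i].replace("-", "") for i in [name_idx] + signal_idx])
    for row in reader:
        if not row:
            continue  # skip blank lines
        name = row[name_idx].replace("-", "")
        writer.writerow([name] + [row[i] for i in signal_idx])
    return out.getvalue()


# Made-up export with extra columns and hyphens, for illustration only.
raw = (
    "Gene_ID\tDescription\tChip-1 Signal\tChip-1 Detection\tChip-2 Signal\n"
    "AT1G01010\tsome annotation\t10\tP\t20\n"
    "GENE-X\tother annotation\t30\tA\t40\n"
)
cleaned = clean_for_upload(raw, "Gene_ID")
print(cleaned)
```

The result has one name column followed by the signal columns only, which matches the layout of the example table.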
Slide 4 To Upload Data
Slide 5 To Select Uploaded Data
Slide 6 Select Specific Experiments within Input File
Slide 7 To Remove Unreliable Data
Slide 8 To Choose Algorithm Type
Hierarchical
  Measures the distance between each profile (i.e. gene) and every other profile
  So time increases roughly with the square of the number of genes
  Then clusters the closest genes together, followed by increasingly distant ones
  Into a tree of clusters within clusters
  The tree can be visualised and finally split at a chosen distance, with knowledge of the tree
K-means
  Initially choose a set number (K) of clusters
  1) Chooses the K most different profiles
  2) Clusters each remaining profile with one of the K
  So, much less time than Hierarchical (if many genes)
  Cluster size can vary, so it can return too many or too few genes, with no way to select.
  Then must repeat with different K values.
Slide 9 So Which is Better?
K-Means is better able to manage large data-sets.
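To make the two algorithm types from Slide 8 concrete, here is a minimal pure-Python sketch on made-up profiles. It is not EpClust's implementation: Euclidean distance, single linkage, and the way the "K most different" seeds are picked are all assumptions chosen for brevity. The second function follows the two steps as worded on the slide; standard K-means would additionally recompute cluster centres and repeat the assignment until it stops changing.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)

# Toy profiles (made-up values): gene name -> signal on three chips.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],
    "geneC": [9.0, 8.0, 7.0],
    "geneD": [9.2, 7.9, 7.1],
    "geneE": [5.0, 5.0, 5.0],
}


def pairwise_distances(profiles):
    """Distance between every pair of profiles: n*(n-1)/2 comparisons."""
    return {
        frozenset((a, b)): dist(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }


def hierarchical(profiles):
    """Single-linkage agglomerative clustering; returns the merge history."""
    d = pairwise_distances(profiles)
    clusters = [frozenset([g]) for g in profiles]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters whose closest members are nearest each other.
        x, y = min(
            combinations(clusters, 2),
            key=lambda pair: min(
                d[frozenset((a, b))] for a in pair[0] for b in pair[1]
            ),
        )
        clusters = [c for c in clusters if c not in (x, y)] + [x | y]
        merges.append((sorted(x), sorted(y)))
    return merges


def k_seed_assign(profiles, k):
    """The two steps on the slide: pick K distant seeds, then assign the rest."""
    names = list(profiles)
    # Step 1: start from the first profile, then repeatedly add the profile
    # farthest from the seeds chosen so far (one reading of "most different").
    seeds = [names[0]]
    while len(seeds) < k:
        seeds.append(max(
            (n for n in names if n not in seeds),
            key=lambda n: min(dist(profiles[n], profiles[s]) for s in seeds),
        ))
    # Step 2: every profile joins its nearest seed: n*K comparisons in total.
    clusters = {s: [] for s in seeds}
    for n in names:
        nearest = min(seeds, key=lambda s: dist(profiles[n], profiles[s]))
        clusters[nearest].append(n)
    return clusters


n_pairs = len(pairwise_distances(profiles))
merges = hierarchical(profiles)
groups = k_seed_assign(profiles, k=2)
print(n_pairs)
print(merges)
print(groups)
```

The merge history is the tree: early merges join near-identical profiles, later ones join whole clusters, and cutting the history at a chosen distance gives the final groups. The K-based version never builds the full distance table, which is why it copes better with many genes.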
Hierarchical seems a more objective approach
  In that you don't need to decide the cluster number at the start
But which is more biologically informative?
  Opinion divided. Both artificial. Don't prove, just suggest.
K-means gives more consistent results.
...and then there's SOTA too!
Slide 10 Alternative Distance Measurements
Slide 11 The most commonly used
How similar the SHAPES of the two profiles are
Based on the average of the values and the standard deviation
Rates from identical (1) to completely uncorrelated (0) to perfect opposites (-1)
Centred Test?
  If profiles have identical shape, but are offset from each other by a fixed value (or magnitude)
  Centred: Identical (1)
  Uncentred: Not (