
Survival-Time Classification of Breast Cancer Patients and Chemotherapy ISMP-2003 Copenhagen August 18-22, 2003 Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg.

Page 1:

Survival-Time Classification of Breast Cancer Patients and Chemotherapy

ISMP-2003

Copenhagen, August 18-22, 2003

Y.-J. Lee, O. L. Mangasarian & W.H. Wolberg

Data Mining Institute

University of Wisconsin - Madison

Page 2:

Breast Cancer Estimates: American Cancer Society & World Health Organization

Breast cancer is the most common cancer among women in the United States.

212,600 new cases of breast cancer will be diagnosed in the United States in 2003: 211,300 in women, 1,300 in men.

40,200 deaths will occur from breast cancer in the United States in 2003: 39,800 in women, 400 in men.

WHO estimates: More than 1.2 million people worldwide were diagnosed with breast cancer in 2001, and 0.5 million died from breast cancer in 2000.

Page 3:

Key Objective

Identify breast cancer patients for whom chemotherapy prolongs survival time.

Main Difficulty: Cannot carry out comparative tests on human subjects. Similar patients must be treated similarly.

Our Approach: Classify patients into Good, Intermediate & Poor groups such that:
  Good group does not need chemotherapy
  Intermediate group benefits from chemotherapy
  Poor group is not likely to benefit from chemotherapy

Page 4:

Outline

Tools used
  Support vector machines (Linear & Nonlinear SVMs): Feature selection & classification
  Clustering (k-Median algorithm, not k-Means)

Cluster into chemo & no-chemo groups
  Cluster chemo patients into 2 groups: good & poor
  Cluster no-chemo patients into 2 groups: good & poor

Merge into three final classes
  Good (No-chemo)
  Poor (Chemo)
  Intermediate: Remaining patients (chemo & no-chemo)

Generate survival curves for three classes

Use SSVM to classify new patients into one of above three classes

Data description

Page 5:

Cell Nuclei of a Fine Needle Aspirate

Page 6:

Thirty Cytological Features Collected at Diagnosis Time

Page 7:

Two Histological Features Collected at Surgery Time

Page 8:

Features Selected by Support Vector Machine

Page 9:

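1-Norm Support Vector Machines: Maximize the Margin between Bounding Planes

[Figure: point sets A+ and A- separated by the bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, with normal vector $w$ and margin $\frac{2}{\|w\|_1}$ between the planes.]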

Page 10:

Support Vector Machine: Algebra of 2-Category Linearly Separable Case

Given m points in n-dimensional space, represented by an m-by-n matrix A.

Membership of each point $A_i$ in class +1 or -1 is specified by an m-by-m diagonal matrix D with +1 & -1 entries.

Separate by two bounding planes, $x'w = \gamma \pm 1$:

$A_i w \ge \gamma + 1$, for $D_{ii} = +1$;
$A_i w \le \gamma - 1$, for $D_{ii} = -1$.

More succinctly:

$D(Aw - e\gamma) \ge e$,

where e is a vector of ones.
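The succinct condition $D(Aw - e\gamma) \ge e$ can be checked numerically on a small example. The sketch below is only an illustration: the data, the vector w and the offset gamma are made-up values, not taken from the talk, and the diagonal of D is stored as a plain list d.

```python
# Toy data (made up for illustration): 4 points in 2 dimensions.
A = [[3.0, 3.0], [4.0, 2.5], [0.0, 0.5], [1.0, 0.0]]
d = [1, 1, -1, -1]   # diagonal of D: class of each row of A
w = [1.0, 1.0]       # assumed normal to the bounding planes
gamma = 3.5          # assumed offset


def margins(A, d, w, gamma):
    """Return the vector D(Aw - e*gamma), one entry per point."""
    return [d_i * (sum(a * w_j for a, w_j in zip(row, w)) - gamma)
            for row, d_i in zip(A, d)]


def is_separated(A, d, w, gamma):
    """True when D(Aw - e*gamma) >= e holds componentwise."""
    return all(v >= 1.0 for v in margins(A, d, w, gamma))


print(margins(A, d, w, gamma))       # one value per point
print(is_separated(A, d, w, gamma))  # both bounding-plane conditions at once
```

Every entry of the margin vector being at least 1 is the same statement as the two separate inequalities for the +1 and -1 classes.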

Page 11:

Feature Selection Using 1-Norm Linear SVM

Classification Based on Lymph Node Status

Feature selection: 1-norm SVM ($\mathrm{SVM}_{\|\cdot\|_1}$):

$\min_{y \ge 0,\, w,\, \gamma} \; \nu e'y + \|w\|_1 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e$,

where $D_{ii} = \pm 1$ denotes Lymph node > 0 or Lymph node = 0.

Features selected: 6 out of 31 by above SVM:
  5 out of 30 cytological features that describe nuclear size, shape and texture from fine needle aspirate
  Tumor size from surgery
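To make the objective $\nu e'y + \|w\|_1$ concrete, the sketch below evaluates it for a fixed candidate (w, gamma) and reads off which features carry a nonzero weight. It does not solve the linear program; the data, the candidate plane and the value of nu are hypothetical, and the function names are introduced here only for illustration.

```python
def svm1_objective(A, d, w, gamma, nu):
    """Evaluate nu*e'y + ||w||_1 using the smallest feasible slack y.

    For fixed (w, gamma), the constraints D(Aw - e*gamma) + y >= e, y >= 0
    are met most cheaply by y_i = max(0, 1 - d_i*(A_i w - gamma)).
    """
    y = [max(0.0, 1.0 - d_i * (sum(a * wj for a, wj in zip(row, w)) - gamma))
         for row, d_i in zip(A, d)]
    return nu * sum(y) + sum(abs(wj) for wj in w), y


def selected_features(w, tol=1e-8):
    """Indices of features with nonzero weight, i.e. the 'selected' ones."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]


# Made-up data: 4 points, 2 features; only feature 0 separates the classes.
A = [[2.0, 5.0], [3.0, 1.0], [0.0, 4.0], [1.0, 2.0]]
d = [1, 1, -1, -1]
w = [2.0, 0.0]   # hypothetical candidate with a zero second component
gamma = 3.0

value, slack = svm1_objective(A, d, w, gamma, nu=1.0)
print(value, slack, selected_features(w))
```

A zero component of w means the corresponding feature plays no role in the separating plane, which is how a 1-norm penalty on w can act as a feature selector.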

Page 12:

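Nonlinear SVM for Classifying New Patients

Linear SVM (linear separating surface: $x'w = \gamma$):

(LP) $\min_{y \ge 0,\, w,\, \gamma} \; \nu e'y + \|w\|_1 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e$

By QP "duality": $w = A'Du$. Maximizing the margin in the "dual space" gives:

$\min_{y \ge 0,\, u,\, \gamma} \; \nu e'y + \|u\|_1 \quad \text{s.t.} \quad D(AA'Du - e\gamma) + y \ge e$

Replace $AA'$ by a nonlinear kernel $K(A, A')$:

$\min_{y \ge 0,\, u,\, \gamma} \; \nu e'y + \|u\|_1 \quad \text{s.t.} \quad D(K(A, A')Du - e\gamma) + y \ge e$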
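The substitution $w = A'Du$ can be checked on a small example: with the linear kernel $AA'$, the term $AA'Du$ in the dual-space constraint equals $Aw$ in the original one. The matrices below are made-up values chosen only to illustrate the identity.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(M, N):
    """Matrix-matrix product for list-of-rows matrices."""
    Nt = transpose(N)
    return [[sum(a * b for a, b in zip(row, col)) for col in Nt] for row in M]


# Hypothetical data: 3 points in 2 dimensions, labels d, dual variable u.
A = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]]
d = [1, -1, 1]
u = [0.5, 2.0, 0.25]

Du = [d_i * u_i for d_i, u_i in zip(d, u)]
w = matvec(transpose(A), Du)                   # w = A'Du
primal_term = matvec(A, w)                     # Aw
dual_term = matvec(matmul(A, transpose(A)), Du)  # AA'Du

print(w, primal_term, dual_term)
```

Once the constraint is written with $AA'$, swapping in another kernel matrix $K(A, A')$ changes only how that m-by-m matrix is computed.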

Page 13:

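The Nonlinear Classifier

The nonlinear classifier:

$K(x', A')Du = \gamma$

$K(A, A'): R^{m \times n} \times R^{n \times m} \to R^{m \times m}$

where K is a nonlinear kernel, e.g. the Gaussian (Radial Basis) kernel:

$K(A, A')_{ij} = \varepsilon^{-\mu \|A_i - A_j\|_2^2}, \quad i, j = 1, \ldots, m$,

where $\varepsilon$ is the base of the natural logarithm.

The ij-entry of $K(A, A')$ represents "similarity" between the data points $A_i$ and $A_j$.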
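A minimal sketch of the Gaussian kernel and of evaluating the classifier $K(x', A')Du = \gamma$ at a new point follows. The values of u, gamma and mu here are hypothetical placeholders; in the procedure described in the talk they would come from solving the SVM problem, which this sketch does not do.

```python
import math


def gaussian_kernel(x, z, mu):
    """exp(-mu * ||x - z||_2^2): close to 1 for nearby points, near 0 for distant ones."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-mu * sq_dist)


def kernel_matrix(A, mu):
    """m-by-m matrix whose ij-entry is the kernel of rows A_i and A_j."""
    return [[gaussian_kernel(a_i, a_j, mu) for a_j in A] for a_i in A]


def classify(x, A, d, u, gamma, mu):
    """Return +1 or -1 according to the sign of K(x', A')Du - gamma."""
    score = sum(gaussian_kernel(x, a_i, mu) * d_i * u_i
                for a_i, d_i, u_i in zip(A, d, u)) - gamma
    return 1 if score >= 0 else -1


# Made-up training points and placeholder (u, gamma); not fitted values.
A = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]]
d = [-1, -1, 1, 1]
u = [1.0, 1.0, 1.0, 1.0]
gamma, mu = 0.0, 0.5

K = kernel_matrix(A, mu)
print(K[0][1])
print(classify([3.0, 3.5], A, d, u, gamma, mu), classify([0.0, 0.5], A, d, u, gamma, mu))
```

A new point is pulled toward the class of the training points it is most similar to, since distant points contribute almost nothing to the sum.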

Page 14:

Clustering in Data Mining

General Objective

Given: A dataset of m points in n-dimensional real space

Problem: Extract hidden distinct properties by clustering the dataset into k clusters

Page 15:

Concave Minimization Formulation of 1-Norm Clustering Problem (k-Median)

Given: Set A of m points in $R^n$ represented by the matrix $A \in R^{m \times n}$, and a number k of desired clusters

Find: Cluster centers $C_1, C_2, \ldots, C_k$ that minimize the sum of 1-norm distances of each point $A_1, A_2, \ldots, A_m$ to its closest cluster center.

Objective Function: Sum of m minima of k linear functions, hence it is piecewise-linear concave

Difficulty: Minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard

Page 16:

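Clustering via Finite Concave Minimization

Minimize the sum of 1-norm distances between each data point $A_i$ and the closest cluster center $C_\ell$:

$\min_{C_\ell,\, D_{i\ell}} \; \sum_{i=1}^{m} \min_{\ell = 1, \ldots, k} \{ e'D_{i\ell} \}$

s.t. $-D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}, \quad i = 1, \ldots, m, \; \ell = 1, \ldots, k$

Equivalent bilinear reformulation:

$\min_{C_\ell,\, D_{i\ell} \in R^n,\, T_{i\ell} \in R} \; \sum_{i=1}^{m} \sum_{\ell=1}^{k} T_{i\ell}\, e'D_{i\ell}$

s.t. $-D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}, \quad \sum_{\ell=1}^{k} T_{i\ell} = 1, \quad T_{i\ell} \ge 0, \quad i = 1, \ldots, m, \; \ell = 1, \ldots, k$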
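The two objectives can be compared on a toy example. In the sketch below, `dist1` plays the role of $e'D_{i\ell}$ at its tightest value $|A_i' - C_\ell|$, and the weights T are either a one-hot choice of the closest center or a uniform split. The points and centers are made up for illustration.

```python
def dist1(p, c):
    """1-norm distance between a point and a center."""
    return sum(abs(a - b) for a, b in zip(p, c))


def kmedian_objective(points, centers):
    """Sum over points of the 1-norm distance to the closest center."""
    return sum(min(dist1(p, c) for c in centers) for p in points)


def bilinear_objective(points, centers, T):
    """Sum over i and l of T[i][l] times the 1-norm distance of point i to center l."""
    return sum(T[i][l] * dist1(p, c)
               for i, p in enumerate(points)
               for l, c in enumerate(centers))


def one_hot_assignment(points, centers):
    """T with a single 1 per row, placed at the closest center."""
    T = []
    for p in points:
        dists = [dist1(p, c) for c in centers]
        best = dists.index(min(dists))
        T.append([1.0 if l == best else 0.0 for l in range(len(centers))])
    return T


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0], [11.0, 10.0]]
centers = [[0.0, 0.0], [10.0, 10.0]]

T_best = one_hot_assignment(points, centers)
T_uniform = [[0.5, 0.5] for _ in points]
print(kmedian_objective(points, centers))
print(bilinear_objective(points, centers, T_best))
print(bilinear_objective(points, centers, T_uniform))
```

For fixed centers, the bilinear objective is smallest when each row of T puts all its weight on the closest center, which recovers the inner minimum of the first formulation.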

Page 17:

K-Median Clustering Algorithm: Finite Termination at Local Solution

Step 0 (Initialization): Pick 2 initial cluster centers, taken from the groups (L = 0 & T < 2) & (L ≥ 5 or T ≥ 4), where L is Lymph and T is Tumor

Step 1 (Cluster Assignment): Assign points to the cluster with the nearest cluster center in 1-norm

Step 2 (Center Update): Recompute location of center for each cluster as the cluster median (closest point to all cluster points in 1-norm)

Step 3 (Stopping Criterion): Stop if the cluster centers are unchanged, else go to Step 1
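The steps above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code: the function name `k_median`, the `max_iter` safeguard, and the choice to leave an empty cluster's center unchanged are assumptions added here. The center update uses the coordinate-wise median, which minimizes the sum of 1-norm distances to the cluster's points.

```python
from statistics import median


def dist1(p, c):
    """1-norm distance between a point and a center."""
    return sum(abs(a - b) for a, b in zip(p, c))


def k_median(points, centers, max_iter=100):
    """Alternate assignment and median updates until the centers stop changing."""
    centers = [list(c) for c in centers]          # Step 0: given initial centers
    labels = []
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest center in the 1-norm.
        labels = [min(range(len(centers)), key=lambda l: dist1(p, centers[l]))
                  for p in points]
        # Step 2: move each center to the coordinate-wise median of its cluster.
        new_centers = []
        for l, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == l]
            if members:
                new_centers.append([median(col) for col in zip(*members)])
            else:
                new_centers.append(old)           # assumption: keep an empty cluster's center
        # Step 3: stop when the centers are unchanged.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Made-up points forming two obvious groups, with rough initial centers.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
          [10.0, 10.0], [11.0, 10.0], [10.0, 12.0]]
final_centers, final_labels = k_median(points, [[1.0, 1.0], [9.0, 9.0]])
print(final_centers, final_labels)
```

Each pass cannot increase the sum of 1-norm distances, which is consistent with the finite termination at a local solution stated on the slide.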

Page 18:

Feature Selection & Initial Cluster Centers

6 out of 31 features selected by 1-norm SVM ($\mathrm{SVM}_{\|\cdot\|_1}$)
  SVM separating lymph node positive (Lymph > 0) from lymph node negative (Lymph = 0)

Perform k-Median algorithm in 6-dimensional input space

Initial cluster centers used: Medians of Good1 & Poor1
  Good1: Patients with Lymph = 0 AND Tumor < 2
  Poor1: Patients with Lymph > 4 OR Tumor ≥ 4 (typical indicator for chemotherapy)
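The Good1 and Poor1 rules and the median computation can be sketched as below. The patient records are invented for illustration, and each feature vector has only two entries to keep the example short, whereas the talk uses the 6 selected features; the record layout and function names are assumptions.

```python
from statistics import median

# Hypothetical records: lymph node count, tumor size, and a short feature vector.
patients = [
    {"lymph": 0, "tumor": 1.0, "x": [1.0, 2.0]},
    {"lymph": 0, "tumor": 1.5, "x": [3.0, 4.0]},
    {"lymph": 0, "tumor": 1.8, "x": [2.0, 6.0]},
    {"lymph": 6, "tumor": 2.0, "x": [9.0, 8.0]},
    {"lymph": 1, "tumor": 4.5, "x": [7.0, 9.0]},
    {"lymph": 2, "tumor": 3.0, "x": [5.0, 5.0]},  # in neither group
]


def is_good1(p):
    """Good1 rule: Lymph = 0 AND Tumor < 2."""
    return p["lymph"] == 0 and p["tumor"] < 2


def is_poor1(p):
    """Poor1 rule: Lymph > 4 OR Tumor >= 4."""
    return p["lymph"] > 4 or p["tumor"] >= 4


def group_median(rows):
    """Coordinate-wise median of a list of feature vectors."""
    return [median(col) for col in zip(*rows)]


good1_center = group_median([p["x"] for p in patients if is_good1(p)])
poor1_center = group_median([p["x"] for p in patients if is_poor1(p)])
print(good1_center, poor1_center)
```

The two medians then serve as the initial cluster centers for the k-Median runs.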

Page 19:

Overall Clustering Process

253 Patients (113 NoChemo, 140 Chemo)

Compute Initial Cluster Centers:
  Good1: Lymph=0 AND Tumor<2. Compute Median Using 6 Features
  Poor1: Lymph>=5 OR Tumor>=4. Compute Median Using 6 Features

Cluster 113 NoChemo Patients: Use k-Median Algorithm with Initial Centers: Medians of Good1 & Poor1
  Result: 69 NoChemo Good, 44 NoChemo Poor

Cluster 140 Chemo Patients: Use k-Median Algorithm with Initial Centers: Medians of Good1 & Poor1
  Result: 67 Chemo Good, 73 Chemo Poor

Final classes:
  Good: 69 NoChemo Good
  Intermediate: 44 NoChemo Poor & 67 Chemo Good
  Poor: 73 Chemo Poor
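The merge into three final classes follows the rule stated in the outline: Good is the no-chemo good cluster, Poor is the chemo poor cluster, and Intermediate is everyone else. The sketch below applies that rule to the four cluster sizes reported on this slide; the function name `final_class` is introduced here for illustration.

```python
def final_class(chemo, cluster):
    """Map (received chemo?, k-Median cluster) to one of the three final classes."""
    if not chemo and cluster == "good":
        return "Good"
    if chemo and cluster == "poor":
        return "Poor"
    return "Intermediate"


# Cluster sizes as reported on the slide.
cluster_sizes = {
    (False, "good"): 69,   # NoChemo Good
    (False, "poor"): 44,   # NoChemo Poor
    (True, "good"): 67,    # Chemo Good
    (True, "poor"): 73,    # Chemo Poor
}

totals = {}
for (chemo, cluster), n in cluster_sizes.items():
    name = final_class(chemo, cluster)
    totals[name] = totals.get(name, 0) + n

print(totals)
print(sum(totals.values()))
```

The class sizes add back up to the 253 patients, split as 113 NoChemo and 140 Chemo.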

Page 20:

Survival Curves for Good, Intermediate & Poor Groups

(Classified by Nonlinear SSVM)

Page 21:

Survival Curves for Intermediate Group: Split by Chemo & NoChemo

Page 22:

Survival Curves for Overall Patients: With & Without Chemotherapy

Page 23:

Survival Curves for Intermediate Group: Split by Lymph Node & Chemotherapy

Page 24:

Survival Curves for Overall Patients: Split by Lymph Node Positive & Negative

Page 25:

Conclusion

Used five cytological features & tumor size to cluster breast cancer patients into 3 groups:
  Good: No chemotherapy recommended
  Intermediate: Chemotherapy likely to prolong survival
  Poor: Chemotherapy may or may not enhance survival

3 groups have very distinct survival curves

First categorization of a breast cancer group for which chemotherapy enhances longevity

SVM-based procedure assigns new patients into one of above three survival groups

Page 26:

Talk & Paper Available on Web

www.cs.wisc.edu/~olvi

Y.-J. Lee, O. L. Mangasarian & W. H. Wolberg: “Computational Optimization and Applications”, Volume 25, 2003, pages 151-166.