Richard Jensen and Qiang Shen
Prof Qiang Shen
Aberystwyth University, UK
qqs@aber.ac.uk
Dr. Richard Jensen
Aberystwyth University, UK
rkj@aber.ac.uk
Interval-valued Fuzzy-Rough Feature Selection in Datasets with Missing Values
FUZZ-IEEE 2009
Outline
• The importance of feature selection
• Rough set theory
• Fuzzy-rough feature selection (FRFS)
• Interval-valued FRFS
• Experimentation
• Conclusion
• Why dimensionality reduction/feature selection?
• Growth of information - need to manage this effectively
• Curse of dimensionality - a problem for machine learning
• Data visualisation - graphing data
[Figure: High-dimensional data → Dimensionality reduction (feature selection) → Low-dimensional data → Processing system; passing the high-dimensional data directly to the processing system is intractable]
Feature selection
• Feature selection (FS) is a dimensionality reduction (DR) technique that preserves data semantics (the meaning of the data)
• Subset generation: forwards, backwards, random…
• Evaluation function: determines the ‘goodness’ of subsets
• Stopping criterion: decides when to stop the subset search
[Figure: Feature selection process. Feature set → Generation → Subset → Evaluation → Subset suitability → Stopping criterion; "Continue" loops back to Generation, "Stop" passes the subset on to Validation]
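The generation, evaluation and stopping steps above fit together as a simple loop. The sketch below makes two assumptions: generation is forward only (one of the listed options), and the evaluation function is supplied by the user. The names `greedy_forward_selection`, `evaluate` and `target`, and the toy `coverage` measure, are hypothetical and do not come from the slides. The final validation step is omitted.

```python
from typing import Callable, FrozenSet, Set


def greedy_forward_selection(
    features: Set[str],
    evaluate: Callable[[FrozenSet[str]], float],
    target: float,
) -> FrozenSet[str]:
    """Forward greedy search over feature subsets.

    Generation: try every one-feature extension of the current subset.
    Evaluation: score each candidate with `evaluate`.
    Stopping criterion: stop when the score reaches `target` or when no
    extension improves on the current subset.
    """
    subset: FrozenSet[str] = frozenset()
    best_score = evaluate(subset)
    while best_score < target:
        candidates = sorted(features - subset)
        if not candidates:
            break
        # max() keeps the first maximal candidate, so ties go to the
        # alphabetically first feature.
        score, feature = max(
            ((evaluate(subset | {f}), f) for f in candidates),
            key=lambda pair: pair[0],
        )
        if score <= best_score:
            break  # no improvement: stop the search
        subset, best_score = subset | {feature}, score
    return subset


# Toy evaluation function (hypothetical): the fraction of two "useful"
# features that the subset contains.
useful = {"a", "c"}


def coverage(subset: FrozenSet[str]) -> float:
    return len(subset & useful) / len(useful)


print(sorted(greedy_forward_selection({"a", "b", "c", "d"}, coverage, 1.0)))  # ['a', 'c']
```

Swapping in a different `evaluate` changes what "goodness" means without touching the search itself. The rough set and fuzzy-rough measures on the following slides are examples of such evaluation functions.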
Rough set theory
Rx is the set of all points that are indiscernible from point x in terms of feature subset B
[Figure: Set A, its lower approximation and upper approximation, and an equivalence class Rx]
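As a concrete illustration, the sketch below computes the equivalence classes Rx for a feature subset B, then derives the lower and upper approximations of a set A. Two representation choices are assumptions: objects are dictionaries of discrete feature values, and A is a set of object indices. The function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]


def equivalence_classes(data: Sequence[Row], B: Sequence[str]) -> List[FrozenSet[int]]:
    """Group object indices that share the same values on every feature in B."""
    groups: Dict[Tuple[Hashable, ...], Set[int]] = defaultdict(set)
    for i, row in enumerate(data):
        groups[tuple(row[a] for a in B)].add(i)
    return [frozenset(g) for g in groups.values()]


def approximations(
    data: Sequence[Row], B: Sequence[str], A: Set[int]
) -> Tuple[Set[int], Set[int]]:
    """Return the (lower, upper) approximations of the object set A w.r.t. B."""
    lower: Set[int] = set()
    upper: Set[int] = set()
    for cls in equivalence_classes(data, B):
        if cls <= A:  # whole class inside A: its objects definitely belong to A
            lower |= cls
        if cls & A:  # class overlaps A: its objects possibly belong to A
            upper |= cls
    return lower, upper


data = [
    {"colour": "red", "size": "small"},
    {"colour": "red", "size": "small"},
    {"colour": "blue", "size": "large"},
    {"colour": "blue", "size": "large"},
    {"colour": "green", "size": "small"},
]
lower, upper = approximations(data, ["colour", "size"], {0, 1, 2})
print(sorted(lower), sorted(upper))  # [0, 1] [0, 1, 2, 3]
```

Objects 2 and 3 are indiscernible on B, but only object 2 belongs to A. Their class therefore falls in the upper approximation only. The objects that are in the upper but not the lower approximation form the boundary region.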
Rough set feature selection
• Attempts to remove unnecessary or redundant features
• Evaluation: function based on the rough set concept of lower approximation
• Generation: greedy hill-climbing algorithm employed
• Stopping criterion: when the maximum evaluation value is reached
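The slide does not spell out the evaluation function. In rough set attribute reduction it is usually the dependency degree γ_B(D) = |POS_B(D)| / |U|. Here the positive region POS_B(D) is the union of the B-lower approximations of the decision classes. Below is a minimal sketch under that assumption; the function names and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set

Row = Dict[str, Hashable]


def partition(data: Sequence[Row], attrs: Sequence[str]) -> List[Set[int]]:
    """Equivalence classes (as sets of object indices) induced by attrs."""
    groups: Dict[tuple, Set[int]] = defaultdict(set)
    for i, row in enumerate(data):
        groups[tuple(row[a] for a in attrs)].add(i)
    return list(groups.values())


def dependency_degree(data: Sequence[Row], B: Sequence[str], decision: str) -> float:
    """gamma_B(D) = |POS_B(D)| / |U|.

    POS_B(D), the positive region, is the union of the B-lower
    approximations of all decision classes: the objects whose
    B-equivalence class falls entirely inside one decision class.
    """
    decision_classes = partition(data, [decision])
    positive: Set[int] = set()
    for cls in partition(data, B):
        if any(cls <= d for d in decision_classes):
            positive |= cls
    return len(positive) / len(data)


data = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
]
for B in (["a"], ["b"], ["a", "b"]):
    print(B, dependency_degree(data, B, "d"))
```

Objects 2 and 3 agree on both a and b but have different decisions, so even the full set {a, b} only reaches 0.5. The "maximum evaluation value" in the stopping criterion is therefore the dependency of the full feature set, which is not necessarily 1.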
Fuzzy-rough sets
Fuzzy-rough set
Fuzzy similarity
Fuzzy-rough sets
• Fuzzy-rough feature selection
• Evaluation: function based on fuzzy-rough lower approximation
• Generation: greedy hill-climbing
• Stopping criterion: when maximal ‘goodness’ is reached (or to degree α)
• Problem #1: how to choose fuzzy similarity?
• Problem #2: how to handle missing values?
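The fuzzy-rough evaluation function can be sketched in the same spirit. The choices below are assumptions rather than the authors' settings, which is exactly Problem #1:

- a range-normalised per-feature similarity, combined across features with the min t-norm;
- the Łukasiewicz implicator I(a, b) = min(1, 1 - a + b);
- crisp decision classes.

Each object's membership in the fuzzy positive region is its highest lower-approximation membership over the decision classes. The dependency is the average of these memberships. Names and data are hypothetical.

```python
from typing import Sequence


def lukasiewicz_implicator(a: float, b: float) -> float:
    return min(1.0, 1.0 - a + b)


def feature_similarity(u: float, v: float, value_range: float) -> float:
    # Hypothetical per-feature similarity: 1 minus the range-normalised distance.
    if value_range == 0:
        return 1.0
    return max(0.0, 1.0 - abs(u - v) / value_range)


def fuzzy_dependency(X: Sequence[Sequence[float]], y: Sequence[int], B: Sequence[int]) -> float:
    """Fuzzy-rough dependency of crisp decisions y on the features B."""
    n = len(X)
    ranges = {a: max(row[a] for row in X) - min(row[a] for row in X) for a in B}

    def sim(i: int, j: int) -> float:
        # Combine per-feature similarities with the min t-norm.
        return min((feature_similarity(X[i][a], X[j][a], ranges[a]) for a in B), default=1.0)

    total = 0.0
    for i in range(n):
        # Positive-region membership: best lower-approximation membership
        # of object i over all decision classes.
        best = 0.0
        for label in set(y):
            lower = min(
                lukasiewicz_implicator(sim(i, j), 1.0 if y[j] == label else 0.0)
                for j in range(n)
            )
            best = max(best, lower)
        total += best
    return total / n


X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 0, 1, 1]
print(fuzzy_dependency(X, y, [0]), fuzzy_dependency(X, y, [1]))  # 1.0 0.0
```

Feature 0 separates the two classes perfectly, so its dependency is 1.0. Feature 1 gives every object a fully similar neighbour in the other class, so its dependency is 0.0. A different similarity function would generally give different intermediate values.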
Interval-valued FRFS
IV fuzzy-rough set
IV fuzzy similarity
• Answer #1: Model uncertainty in fuzzy similarity by interval-valued similarity
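The slide does not say how the interval is formed. One hypothetical way to capture uncertainty about the choice of similarity is to let the interval span the values of several candidate measures. Both endpoints are then carried through the lower approximation. Because the Łukasiewicz implicator (an assumption here) decreases in its first argument, the upper similarity gives the lower membership bound, and the lower similarity gives the upper bound. All names, measures and data are illustrative.

```python
from typing import Callable, Sequence, Tuple

Interval = Tuple[float, float]
Measure = Callable[[float, float], float]


def lukasiewicz_implicator(a: float, b: float) -> float:
    return min(1.0, 1.0 - a + b)


def interval_similarity(u: float, v: float, measures: Sequence[Measure]) -> Interval:
    """Hypothetical IV similarity: [min, max] over several candidate measures."""
    values = [m(u, v) for m in measures]
    return (min(values), max(values))


def iv_lower_approximation(
    values: Sequence[float], members: Sequence[bool], i: int, measures: Sequence[Measure]
) -> Interval:
    """Interval membership of object i in the lower approximation of the crisp concept `members`."""
    lo, hi = 1.0, 1.0
    for j, v in enumerate(values):
        s_lo, s_hi = interval_similarity(values[i], v, measures)
        d = 1.0 if members[j] else 0.0
        # The implicator is decreasing in its first argument, so the upper
        # similarity bound yields the lower membership bound and vice versa.
        lo = min(lo, lukasiewicz_implicator(s_hi, d))
        hi = min(hi, lukasiewicz_implicator(s_lo, d))
    return (lo, hi)


# Two hypothetical similarity measures on a single feature in [0, 1].
measures = [
    lambda u, v: max(0.0, 1.0 - abs(u - v)),
    lambda u, v: max(0.0, 1.0 - 2.0 * abs(u - v)),
]
values = [0.0, 0.25, 1.0]
members = [True, True, False]  # the concept contains objects 0 and 1
print([iv_lower_approximation(values, members, i, measures) for i in range(3)])
# [(1.0, 1.0), (0.75, 1.0), (0.0, 0.0)]
```

Object 1 gets the interval [0.75, 1.0]. How certainly it belongs to the lower approximation depends on which similarity measure is believed. The interval records that uncertainty instead of committing to one answer.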
Interval-valued FRFS
Missing values
• When comparing the values of two objects for a given attribute, what should be done if at least one is missing?
• Answer #2: Model missing values via the unit interval
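Below is a minimal sketch of Answer #2, assuming a single similarity measure for known values. When either value is missing, the per-feature similarity is the whole unit interval [0, 1]. The reason is that the two objects could be anything from completely different to identical on that feature. Per-feature intervals are then combined endpoint by endpoint with the min t-norm (an assumption). Function names and data are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[float, float]
UNKNOWN: Interval = (0.0, 1.0)  # a missing value leaves the similarity completely open


def feature_similarity(u: Optional[float], v: Optional[float], value_range: float) -> Interval:
    if u is None or v is None:
        return UNKNOWN
    # Hypothetical similarity for known values: 1 minus normalised distance,
    # returned as a degenerate interval [s, s].
    s = 1.0 if value_range == 0 else max(0.0, 1.0 - abs(u - v) / value_range)
    return (s, s)


def object_similarity(
    x: Sequence[Optional[float]], y: Sequence[Optional[float]], ranges: Sequence[float]
) -> Interval:
    """Combine per-feature intervals with the min t-norm, endpoint by endpoint."""
    parts = [feature_similarity(u, v, r) for u, v, r in zip(x, y, ranges)]
    return (min(p[0] for p in parts), min(p[1] for p in parts))


ranges = [1.0, 1.0]
print(object_similarity([0.25, 0.5], [0.75, 0.5], ranges))   # (0.5, 0.5)
print(object_similarity([0.25, None], [0.75, 0.5], ranges))  # (0.0, 0.5)
```

The missing value pulls the lower bound down to 0. The upper bound stays capped at 0.5 by the known feature, so the evidence that is available is still used.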
Other measures
• Boundary region
• Discernibility function
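The slide only names these measures. For reference, in crisp rough set theory the discernibility function of a decision system is a conjunction of clauses:

- there is one clause per pair of objects with different decisions;
- each clause is the disjunction of the features that tell that pair apart;
- a feature subset preserves discernibility if it contains at least one feature from every clause.

The sketch below shows this crisp version only; the interval-valued fuzzy variant used in this work is not reproduced. Its names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set

Row = Dict[str, Hashable]


def discernibility_clauses(
    data: Sequence[Row], conditions: Sequence[str], decision: str
) -> List[FrozenSet[str]]:
    """One clause per pair of objects with different decisions: the set of
    condition features on which the two objects differ."""
    clauses = []
    for x, y in combinations(data, 2):
        if x[decision] != y[decision]:
            clause = frozenset(a for a in conditions if x[a] != y[a])
            if clause:  # an empty clause means the pair cannot be discerned at all
                clauses.append(clause)
    return clauses


def satisfies(subset: Set[str], clauses: Sequence[FrozenSet[str]]) -> bool:
    """The discernibility function is the conjunction of the clauses (each a
    disjunction of features); a subset satisfies it if it hits every clause."""
    return all(clause & subset for clause in clauses)


data = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 1, "d": "no"},
]
clauses = discernibility_clauses(data, ["a", "b"], "d")
print(satisfies({"a"}, clauses), satisfies({"a", "b"}, clauses))  # False True
```

Neither feature alone separates every pair with different decisions, so only {a, b} satisfies the function here.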
Experimentation
• Datasets corrupted with noise
• 10-fold cross-validation with JRip
Results: discernibility
Conclusion
• New approaches to fuzzy-rough feature selection based on interval-valued fuzzy sets (IVFS)
• Can handle missing values effectively
• Allows greater flexibility w.r.t. similarity relations
• Future work
• Further investigations
• Development and extension of other fuzzy-rough methods to handle missing values – classifiers, clusterers etc.
• WEKA implementations of all fuzzy-rough feature selectors and classifiers can be downloaded from:
RSAR approximations
• Approximating a concept X using knowledge in P
• Lower approximation: contains objects that definitely belong to X
• Upper approximation: contains objects that possibly belong to X
$\underline{P}X = \{x \in U : [x]_P \subseteq X\}$
$\overline{P}X = \{x \in U : [x]_P \cap X \neq \emptyset\}$