A contribution to ranking and description of classifications. PhD examination. Germán Sánchez-Hernández, Dept. of Enginyeria de Sistemes, Automàtica i Informàtica Industrial (ESAII), Universitat Politècnica de Catalunya – BarcelonaTech (UPC), Av. Diagonal 647, 08034 Barcelona; ESADE Business School, Universitat Ramon Llull (URL), Avda. Torreblanca 59, 08172 Sant Cugat del Vallès. [email protected]. Co-Advisors: Juan Carlos Aguado Chao & Núria Agell Jané. September 13th, 2013.
A contribution to ranking and description of classifications

Apr 12, 2017

Page 1: A contribution to ranking and description of classifications

A contribution to ranking and description of classifications

PhD examination

Germán Sánchez-Hernández

Dept. of Enginyeria de Sistemes, Automàtica i Informàtica Industrial (ESAII)
Universitat Politècnica de Catalunya – BarcelonaTech (UPC)

Av. Diagonal 647, 08034 Barcelona

ESADE Business School, Universitat Ramon Llull (URL)
Avda. Torreblanca 59, 08172 Sant Cugat del Vallès

[email protected]

Co-Advisors: Juan Carlos Aguado Chao & Núria Agell Jané

September 13th, 2013

Page 2: A contribution to ranking and description of classifications

Introduction · Literature review · Fuzzy criteria for selecting classifications · NL-based automatic qualitative description of clusters · Application to market segmentation · Conclusions

Outline

1 Introduction

2 Literature review

3 Fuzzy criteria for selecting classifications

4 NL-based automatic qualitative description of clusters

5 Application to market segmentation

6 Conclusions

2 / 51 German Sanchez-Hernandez Ranking and description of classifications

Page 3: A contribution to ranking and description of classifications

1 Introduction

Motivation and framework · Objectives · Theoretical background

Page 4: A contribution to ranking and description of classifications


Motivation and framework

Motivation in Marketing for developing a novel and complete fuzzy MCDM¹ methodology:

1. Generation of segmentations
   - Unsupervised learning.

2. Ranking and selection of segmentations
   - Criteria to assess segmentations.
   - Aggregation of the assessments.

3. Description of the best segmentation
   - Most important features.
   - Natural language.

¹ Multi-Criteria Decision Making

Page 5: A contribution to ranking and description of classifications


Objectives

1 Generation of classifications (Theoretical background).

2 Evaluation by fuzzy validation criteria (Chapter 3).

3 Rank by aggregating the assessments (Chapter 2).

4 NLG² system (Chapter 4).

5 Application (Chapter 5).

² Natural Language Generation

Page 6: A contribution to ranking and description of classifications


Overview

Page 7: A contribution to ranking and description of classifications

2 Literature review

Criteria for selecting classifications · Aggregation functions · Data-to-text systems

Page 8: A contribution to ranking and description of classifications


Criteria for selecting classifications

Alternatives for selecting classifications:

- [Kukar, 2003; Osei-Bryson, 2010]: classical sequential approach.
- [Halkidi et al., 2002]: translates the problem into a search problem.
- [Broder et al., 2008]: new classification by voting.

Types of validation criteria [Jain et al., 1999; Theodoridis and Koutroumbas, 2008]:

- Internal criteria: analyse the internal structure. Compactness, separability, prediction strength... [Liu et al., 2010].
- External criteria: compare with an external structure. Rand, Jaccard, Fowlkes-Mallows, Minkowski indexes... [Wu et al., 2009].
- Relative criteria: manual pairwise comparisons (avoided) [Jain et al., 1999].

Page 9: A contribution to ranking and description of classifications

Summary Validation Criteria

Paper | Comments | Internal: Compact. | Internal: Separab. | Internal: Accuracy | Internal: Feat. | Internal: Goals | Extern. criteria | Relat. criteria
Ramze Rezaee et al. (1998) | One index for fuzzy c-Means | Yes: compact. | Yes: separat. | No | No | No | No | No
Cheng et al. (1999) | Subspace clustering | Yes: high density | No | No | No | Yes: cover. & correl. of dim's | No | No
Kukar (2003) | Reliability on diagn. | No | No | Yes: reliability | No | No | No | No
Halkidi et al. (2001) | Review | Yes: several | (as Compact.) | No | No | No | Yes: several | Yes: several
Choi et al. (2005) | Association rules | No | No | No | No | Yes: R-F-MV | No | No
Tibshirani & Walther (2005) | Valid. by prediction strength | Yes: variance | Yes: bias | Yes: prediction strength | No | No | No | No
Yatskiv & Gusarova (2005) | Review | No | No | No | No | No | Yes: several | Yes: several
Method presented | Review & application | Yes: I_C (coherence) | (as Compact.) | Yes: I_A (accuracy) | No | Yes: I_U & I_B (useful. & balanced) | Yes: I_D (dep.) | No

Page 10: A contribution to ranking and description of classifications

Summary Validation Criteria

Paper | Comments | Internal: Compact. | Internal: Separab. | Internal: Accuracy | Internal: Feat. | Internal: Goals | Extern. criteria | Relat. criteria
Bittmann & Gelbard (2009) | Visualisation of hierarch. clustering | Yes: min heterog. | No | No | No | Yes: visualis. | No | No
Wang et al. (2009) | Clinical application | Yes: Davies-Bouldin & rel.-free | (as Compact.) | No | No | No | No | No
Wu et al. (2009) | External criteria for k-Means | No | No | No | No | No | Yes: several | No
Xiong et al. (2009) | k-Means | Yes: Sum of Sq. Errors | Yes: entropy and CV | Yes: F-measure | No | No | No | No
Liu et al. (2010) | Internal criteria review | Yes: several | Yes: several | No | No | No | No | No
Osei-Bryson (2010) | Review | No | No | Yes: accuracy | Yes: # of imp. vars | Yes: outliers, Max/Min | No | Yes: several
Method presented | Review & application | Yes: I_C (coherence) | (as Compact.) | Yes: I_A (accuracy) | No | Yes: I_U & I_B (useful. & balanc.) | Yes: I_D (dep.) | No

Page 11: A contribution to ranking and description of classifications

Aggregation functions

Two steps in MCDM [Fodor and Roubens, 1994]:

1 Aggregation of the single evaluations.

2 Exploitation by generating a ranking of the alternatives.

Many different families of aggregation functions [Chiclana et al., 2004, 2007; Dubois and Prade, 1985; Fodor and Roubens, 1994; Herrera et al., 2003; Klir and Folger, 1988; Torra, 1997; Torra and Narukawa, 2007; Xu and Da, 2003; Yager, 1988; Zhou et al., 2008].

OWA operator [Yager, 1988]:

\[ \phi_W(a_1, \dots, a_n) = \sum_{i=1}^{n} w_i \, a_{\sigma(i)} \]

where $a_{\sigma(i)}$ is the $i$-th largest of the values $a_1, \dots, a_n$.

Weights $w_i$ via a linguistic quantifier [Zadeh, 1983]:

\[ w_i = Q\left(\frac{i}{n}\right) - Q\left(\frac{i-1}{n}\right), \]

where $Q$ is a RIM quantifier: $Q(r) = r^{\alpha}$.
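As a rough illustration of the OWA definition above, the following Python sketch derives the weights from a RIM quantifier Q(r) = r^α and aggregates a vector of assessments. The function names (`rim_weights`, `owa`) and the sample scores are illustrative assumptions and do not come from the thesis.

```python
def rim_weights(n, alpha):
    """Weights w_i = Q(i/n) - Q((i-1)/n) for the RIM quantifier Q(r) = r**alpha."""
    def q(r):
        return r ** alpha
    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]


def owa(values, weights):
    """OWA: the weights are applied to the values sorted in decreasing order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values, reverse=True)
    return sum(w * a for w, a in zip(weights, ordered))


# Hypothetical assessments of one alternative under four criteria.
scores = [0.2, 0.9, 0.5, 0.7]

# alpha = 1 gives uniform weights, so OWA reduces to the arithmetic mean.
mean_like = owa(scores, rim_weights(len(scores), alpha=1.0))

# alpha < 1 puts more weight on the largest values, alpha > 1 on the smallest.
optimistic = owa(scores, rim_weights(len(scores), alpha=0.5))
pessimistic = owa(scores, rim_weights(len(scores), alpha=2.0))
```

Because the weights telescope to Q(1) - Q(0) = 1, they always sum to one, whatever the value of α.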

Page 12: A contribution to ranking and description of classifications

Data-to-text Systems

Paper | System | Application area | Input data | Users | Rules
Goldberg et al. (1994) | FoG | Weather forecasting | Time series | Forecasters | Yes
Reiter et al. (2005) | Forecasting texts | Weather forecasting | Time series | Forecasters | No
Sripada et al. (2003) | SumTime-Mousam | Weather forecasting | Time series | Forecasters | Yes
Cawsey et al. (2000) | Piglit | Medicine | Events | Patients | Yes
Hallett and Scott (2005) | Summaries of events | Medicine | List of events | Medical staff & patients | Yes
Harris (2008) | Narrative Engine: text summaries | Medicine | Events | Medical staff | No
Hüske-Kraus (2003b) | Review of applications | Medicine | Raw data | Medical staff | No
Hüske-Kraus (2003a) | Suregen-2: routine reports | Medicine | | Medical staff | No
Kahn et al. (1991) | Topaz | Medicine | | | Yes
Portet et al. (2009) | BT-45: neonatal summaries | Medicine | Raw data from sensors | Medical staff & patients | Yes
Reiter et al. (2003) | Stop: personalised reports | Medicine | Manual input | Patients | Yes
Method presented (2013) | Description of groups | Generic | Tabular data | Generic | Yes

Page 13: A contribution to ranking and description of classifications

Summary Data-to-text systems

Paper | System | Application area | Input data | Users | Rules
Ferres et al. (2006) | iGraph | Accessibility | Graphical data | Visually-imp. | Yes
Thomas & Sripada (2008) | Atlas.txt | Accessibility | Geo-referenced data | Visually-imp. | No
Kukich (1983) | Ana: textual stock market | Financial | Time series | Stock marketers | Yes
Hammond and Davis (2005) | Ladder | Image | Sketches | | Yes
Herzog and Wazinski (1994) | Vitru | Image | Visual scenes | | No
Roy (2002) | Describer | Image | Visual scenes | | Yes
Sripada and Gao (2007) | ScubaText | Sports | | Scuba divers | No
Iordanskaja et al. (1992) | Summaries | Generic | Statistical data | | Yes
McKeown et al. (1994) | PLANDoc | Generic | List of events | | No
Method presented (2013) | Description of groups | Generic | Tabular data | Generic | Yes

Page 14: A contribution to ranking and description of classifications

3 Fuzzy criteria for selecting classifications

Motivation · First criterion: useful number of classes · Second criterion: balanced classes · Third criterion: coherent classification · Fourth criterion: dependency on external variables · Fifth criterion: accuracy of the predictive model

Page 15: A contribution to ranking and description of classifications


Motivation

Page 16: A contribution to ranking and description of classifications

First criterion: useful number of classes

Objective: a sufficient but small enough number of clusters.

Definition

Given a classification C, the index of usefulness is characterised by the following membership function:

\[ I_{U,K_1,K_2}(C) = \begin{cases} f_1(M), & \text{if } 1 \le M < K_1; \\ 1, & \text{if } K_1 \le M \le K_2; \\ f_2(M), & \text{if } K_2 < M \le N, \end{cases} \tag{1} \]

where $M \in \mathbb{N}$ is the number of classes of C; $K_1, K_2 \in \mathbb{N}$ such that $K_1 < K_2$ are two prefixed parameters; and $f_1$ is a strictly increasing function and $f_2$ is a strictly decreasing function verifying $f_1(1) = f_2(N) = 0$ and $f_1(K_1) = f_2(K_2) = 1$.
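The membership function in (1) can be sketched in Python as follows. The definition leaves f1 and f2 open; this sketch assumes a linear f1 and an exponential f2, the "linear-exponential" shapes that the case study later uses with K1 = 3, K2 = 5 and N = 260. The function name `usefulness_index`, its parametrisation in terms of general `k1`, `k2` and `n`, and the reading of N as the number of individuals are assumptions made for illustration.

```python
import math


def usefulness_index(m, n, k1, k2):
    """I_U for a classification with m classes over n individuals.

    Assumed shapes: f1 linear on [1, k1], f2 exponential on [k2, n].
    """
    if not (1 < k1 < k2 < n):
        raise ValueError("expected 1 < k1 < k2 < n")
    if not (1 <= m <= n):
        raise ValueError("expected 1 <= m <= n")
    if m < k1:
        # f1: equals 0 at m = 1 and 1 at m = k1.
        return (m - 1) / (k1 - 1)
    if m <= k2:
        return 1.0
    # f2: equals 1 at m = k2 and 0 at m = n; expm1(x) computes e**x - 1.
    return math.expm1(n - m) / math.expm1(n - k2)


# Index values for a few numbers of classes with K1 = 3, K2 = 5, N = 260.
profile = {m: usefulness_index(m, 260, 3, 5) for m in (1, 2, 3, 5, 6, 260)}
```

With this exponential choice the index drops quickly once M exceeds K2 (roughly by a factor of e per extra class when N is large). For very large values of n - k2 the exponential overflows a float, so a different f2 would be needed there.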

Page 17: A contribution to ranking and description of classifications

First criterion: examples

Examples of functions for K1 = 4 and K2 = 7:

Page 18: A contribution to ranking and description of classifications

Second criterion: balanced classes

Objective: to avoid (or boost) unbalanced classifications.

Definition

Given a classification C, the index of balanced classes of C is defined as:

\[ I_B(C) = \frac{\max_Y(CV_Y) - CV_C}{\max_Y(CV_Y) - \min_Y(CV_Y)}, \tag{2} \]

where $CV_C$ is the coefficient of variation associated with C and $\min_Y(CV_Y)$ and $\max_Y(CV_Y)$ are given as per Propositions 3.1 and 3.3, respectively.

If unbalanced classes are required, the complement $1 - I_B$ is used instead.

Page 19: A contribution to ranking and description of classifications

Second criterion: example

Example of $I_B$ for two classifications:

Let C1 and C2 be the following two different classifications of the same data set consisting of N = 260 individuals:

Y1 = {90, 80, 90}; Y2 = {110, 30, 20, 90}

$CV_{Y_1} = 0.51$; $CV_{Y_2} = 4.43$.

$\min(CV_Y) = 0$; $\max(CV_Y) = 6.18$.

$I_B(C_1) = 0.92$; $I_B(C_2) = 0.28$.
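The two index values in this example follow directly from the normalisation in (2), as the short Python check below shows. It takes the coefficients of variation and their bounds as given on the slide (how they are derived is covered by the thesis propositions and is not reproduced here); the function name `balance_index` is an illustrative assumption.

```python
def balance_index(cv, cv_min, cv_max):
    """Normalise a coefficient of variation into [0, 1]; 1 means most balanced."""
    if cv_max <= cv_min:
        raise ValueError("cv_max must exceed cv_min")
    return (cv_max - cv) / (cv_max - cv_min)


# Values quoted in the example above.
CV_MIN, CV_MAX = 0.0, 6.18
i_b_c1 = balance_index(0.51, CV_MIN, CV_MAX)   # about 0.92
i_b_c2 = balance_index(4.43, CV_MIN, CV_MAX)   # about 0.28

# If unbalanced classes were preferred, the complement would be used instead.
i_b_c1_unbalanced = 1.0 - i_b_c1
```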

Page 20: A contribution to ranking and description of classifications

Third criterion: coherent classification

Objective: to ensure that the Global Adequacy Degrees (GAD) are obtained from similar values of the Marginal Adequacy Degrees (MAD).

Definition

The index of coherence of a classification is given as follows:

\[ I_C(C) = 1 - \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} \left[ \max_k(\mu_{ijk}) - \min_k(\mu_{ijk}) \right]}{M \cdot N}, \tag{3} \]

where $N \in \mathbb{N}$ is the number of individuals, $M \in \mathbb{N}$ is the number of classes of classification C, and $\mu_{ijk}$ is the MAD of individual j to class i according to descriptor k.

Page 21: A contribution to ranking and description of classifications

Third criterion: example

Examples of $I_C$ for two classifications

Let us consider two classifications C1 and C2 consisting of two (A and B) and three classes (C, D and E), respectively. The MADs are shown next:

Class A | d1 | d2 | d3
i1 | 0.3 | 0.4 | 0.6
i2 | 0.2 | 0.3 | 0.2
i3 | 0.5 | 0.5 | 0.3

Class B | d1 | d2 | d3
i1 | 0.7 | 0.5 | 0.5
i2 | 0.4 | 0.8 | 0.5
i3 | 0.9 | 0.6 | 0.7

Class C | d1 | d2 | d3
i1 | 0.1 | 0.6 | 0.4
i2 | 0.7 | 0.2 | 0.3
i3 | 0.4 | 0.8 | 0.3

Class D | d1 | d2 | d3
i1 | 0.1 | 0.7 | 0.3
i2 | 0.5 | 0.2 | 0.7
i3 | 0.9 | 0.3 | 0.4

Class E | d1 | d2 | d3
i1 | 0.9 | 0.2 | 0.7
i2 | 0.6 | 0.9 | 0.6
i3 | 0.7 | 0.6 | 0.1

MADs of C1 are more homogeneous → $I_C(C_1) = 0.75 > I_C(C_2) = 0.47$
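The two coherence values can be reproduced from the MAD tables with a few lines of Python. This is only a sketch of formula (3), reading the max and min as taken over the descriptors of each (class, individual) pair; the data layout and the name `coherence_index` are illustrative assumptions.

```python
def coherence_index(mads):
    """mads maps each class to a list of rows, one row of MADs per individual."""
    n_classes = len(mads)
    n_individuals = len(next(iter(mads.values())))
    # Spread of the MADs over the descriptors, summed over classes and individuals.
    spread = sum(max(row) - min(row) for rows in mads.values() for row in rows)
    return 1.0 - spread / (n_classes * n_individuals)


C1 = {
    "A": [[0.3, 0.4, 0.6], [0.2, 0.3, 0.2], [0.5, 0.5, 0.3]],
    "B": [[0.7, 0.5, 0.5], [0.4, 0.8, 0.5], [0.9, 0.6, 0.7]],
}
C2 = {
    "C": [[0.1, 0.6, 0.4], [0.7, 0.2, 0.3], [0.4, 0.8, 0.3]],
    "D": [[0.1, 0.7, 0.3], [0.5, 0.2, 0.7], [0.9, 0.3, 0.4]],
    "E": [[0.9, 0.2, 0.7], [0.6, 0.9, 0.6], [0.7, 0.6, 0.1]],
}

i_c_1 = coherence_index(C1)   # 1 - 1.5 / 6
i_c_2 = coherence_index(C2)   # 1 - 4.8 / 9
```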

Page 22: A contribution to ranking and description of classifications

Fourth criterion: dependency on external variables

Objective: high relation with an external variable.

Definition

Given a classification C, its index of dependency on a control variable is defined as:

\[ I_D(C) = \frac{\chi^2}{N \cdot \sqrt{M-1} \cdot \sqrt{S-1}}, \tag{4} \]

where $\chi^2$ is the chi-square statistic of the contingency table between the classes and the control variable, N is the number of individuals, M is the number of classes of C and S is the number of unique values of the control variable, if it is qualitative, or the number of considered intervals in the discretisation if it is quantitative.
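A minimal Python sketch of (4), assuming the statistic is the usual Pearson chi-square of the classes-by-values contingency table. The function names and the two toy tables are hypothetical and only serve to show the extreme cases of the index.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def dependency_index(table):
    """I_D: rows are the M classes, columns the S values of the control variable."""
    m, s = len(table), len(table[0])
    if m < 2 or s < 2:
        raise ValueError("need at least two classes and two control values")
    n = sum(sum(row) for row in table)
    return chi_square(table) / (n * math.sqrt(m - 1) * math.sqrt(s - 1))


# Hypothetical tables: classes fully determined by the control variable,
# and classes unrelated to it.
fully_dependent = dependency_index([[10, 0], [0, 10]])
independent = dependency_index([[5, 5], [5, 5]])
```

For a square table the index reaches 1 under perfect association and 0 under independence, which is what makes it usable as a membership degree.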

Page 23: A contribution to ranking and description of classifications

Fifth criterion: accuracy of the predictive model

Objective: high predictability.

Definition

Given a classification C, its index of accuracy is defined as:

\[ I_A(C) = \frac{2 \cdot precision(C) \cdot recall(C)}{precision(C) + recall(C)}, \tag{5} \]

where precision(C) and recall(C) are the weighted averages of precision and recall of the classes of C, respectively.
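Formula (5) is the F-measure of the averaged precision and recall. The sketch below computes it from a confusion matrix, assuming that the per-class values are weighted by class size (the slide does not state the weights, so this is an assumption); the function name and the toy matrix are hypothetical.

```python
def accuracy_index(confusion):
    """confusion[i][j] = number of individuals of class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    support = [sum(row) for row in confusion]            # true class sizes
    predicted = [sum(col) for col in zip(*confusion)]    # predicted class sizes
    precision = recall = 0.0
    for c in range(len(confusion)):
        hits = confusion[c][c]
        weight = support[c] / total                       # assumed weighting
        precision += weight * (hits / predicted[c] if predicted[c] else 0.0)
        recall += weight * (hits / support[c] if support[c] else 0.0)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical two-class predictive model evaluated on 20 individuals.
i_a = accuracy_index([[8, 2], [1, 9]])
```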

Page 24: A contribution to ranking and description of classifications

4 NL-based automatic qualitative description of clusters

Motivation · Architecture · Signal Analysis · Data Interpretation · Document Planning · Microplanning and Realisation

Page 25: A contribution to ranking and description of classifications


Motivation

Interpretation through a natural description.

NLG system based on a four-stage architecture [Reiter, 2007]:

1 Signal analysis: basic patterns.

2 Data interpretation: messages and relations.

3 Document planning: messages selection.

4 Microplanning and realisation: text generation.

Page 26: A contribution to ranking and description of classifications

Architecture

Page 27: A contribution to ranking and description of classifications

1. Signal Analysis

Objectives: to select variables and to identify the important values.

Page 28: A contribution to ranking and description of classifications

2. Data Interpretation

Objective: to avoid redundant information (type-A rules).

1 Discarding negative messages.
2 Discarding messages obtained from VoIs.
3 Discarding modalities.
4 Discarding variables (same sign).
5 Discarding variables (different sign).

Rule Class Variable Modality Type Sign
A.1 Yes Yes Yes
A.2 Yes Yes Yes Yes
A.3 Yes Yes
A.4 Yes Yes Yes Yes
A.5 Yes Yes Yes

Page 29: A contribution to ranking and description of classifications

3. Document Planning

Objectives: to identify relations and modifications.

Type-B rules:

1 Merging modalities of an ordinal variable.
2 Merging modalities.
3 Merging variables with the same modalities.
4 Adding single modalities.
5 Merging single messages.

Type-C rules:

1 Use of the semantics of modality “no”.
2 Use of the semantics of modality “yes”.
3 Use of the semantics of variables.
4 Modalities as adjectives.
5 Use of linguistic quantifiers.

Page 30: A contribution to ranking and description of classifications

4. Microplanning and Realisation

Objectives: final structure and transcription.

1 Microplanning: sorts groups of messages by importance.

2 Realisation: transcription.

Page 31: A contribution to ranking and description of classifications

5 Application to market segmentation

Case Presentation · Dataset · Obtaining segmentations · Ranking and selecting segmentations · Qualitative description

Page 32: A contribution to ranking and description of classifications


Case Presentation

Objective: to apply the methodology reviewed and developed to a real marketing problem in a B2B environment.

Challenge: to understand fluctuations in the orders made by the shops (limited resources).

Objective: to segment the set of retailers.

Actions:

1 Unsupervised learning process.

2 Selection of segmentations.

3 Natural language description.

Page 33: A contribution to ranking and description of classifications

Dataset

Grifone:

- Outdoor sporting equipment firm.
- Firm: Textil Seu, SA.
- La Seu d’Urgell (north of Lleida).
- More than 25 years.

260 points of sale.

16 variables:

Antiquity

Assistants

Evaluation

DisplayGrifone

Location

Internet

ThermalExhibitor

Specialists

Aesthetics

Communication

Competition

DisplaySize

GrifoneWeight

Maintenance

PromosSensit

Size

Page 34: A contribution to ranking and description of classifications

Obtaining segmentations

Unsupervised learning technique: LAMDA (Learning Algorithm for Multivariate Data Analysis) [Aguado, 1998; Aguado et al., 1999; Aguilar and López de Mántaras, 1982].

- Tolerance.
- Hybrid connectives (MinMax, Probabilistic Product, Lukasiewicz, Frank n-norms).
- Density functions.

PromosSensit as external variable.

→ 566 segmentations (between 1 and 244 classes).

Page 35: A contribution to ranking and description of classifications

Criteria

1 Usefulness:

- $K_1 = 3$ and $K_2 = 5$.
- Linear-exponential:

\[ f_1(M) = \frac{M-1}{3-1}; \qquad f_2(M) = \frac{e^{260-M} - 1}{e^{260-5} - 1} \]

2 Balanced:

- Avoid unbalanced classifications:

\[ I_B(C) = \frac{\max_Y(CV_Y) - CV_C}{\max_Y(CV_Y) - \min_Y(CV_Y)}. \]

Page 36: A contribution to ranking and description of classifications

Criteria

3 Coherence: density functions of quantitative variables:

- Antiquity: Waissman.
- Assistants: Classical.
- Evaluation: Gaussian.

4 Dependency: chosen variable: PromosSensit.

5 Accuracy: SVM, 10-fold cross-validation (30 times).

Page 37: A contribution to ranking and description of classifications

Aggregation of the indexes

OWA operator.

Linguistic quantifier: most of.

RIM function: $Q(r) = r^{1/2}$.

Computed weights: (0.447, 0.185, 0.142, 0.120, 0.106).

Page 38: A contribution to ranking and description of classifications

Results

Rank | ID | Conn. | Toler. | Classes M | I_U | I_B | I_C | I_D | I_A | OWA
1 | #259 | Minmax | 0.439 | 3 | 1 | 0.928 | 0.251 | 0.528 | 0.936 | 0.8423
2 | #260 | Minmax | 0.469 | 3 | 1 | 0.929 | 0.254 | 0.363 | 0.922 | 0.8207
3 | #258 | Minmax | 0.422 | 4 | 1 | 0.885 | 0.226 | 0.425 | 0.875 | 0.8103
4 | #243 | Minmax | 0.290 | 3 | 1 | 0.909 | 0.256 | 0.008 | 0.965 | 0.7868
5 | #244 | Minmax | 0.304 | 3 | 1 | 0.920 | 0.255 | 0.012 | 0.948 | 0.7855
6 | #257 | Minmax | 0.411 | 3 | 1 | 0.933 | 0.279 | 0.032 | 0.856 | 0.7786
7 | #256 | Minmax | 0.400 | 4 | 1 | 0.872 | 0.246 | 0.064 | 0.906 | 0.7751
8 | #253 | Minmax | 0.359 | 4 | 1 | 0.883 | 0.231 | 0.021 | 0.915 | 0.7722
9 | #252 | Minmax | 0.352 | 4 | 1 | 0.884 | 0.237 | 0.025 | 0.904 | 0.7712
10 | #255 | Minmax | 0.393 | 4 | 1 | 0.939 | 0.263 | 0.063 | 0.768 | 0.7686

Columns I_U to I_A are the criterion assessments.

Segmentation #259:

1 Class 1: 35 shops
2 Class 2: 98 shops
3 Class 3: 127 shops
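The OWA column can be checked against the criterion assessments with a short Python sketch. It assumes the weights are derived from the RIM quantifier Q(r) = r^(1/2) with n = 5, as stated for the aggregation step; the helper names and the choice of the three top rows are illustrative.

```python
def rim_weights(n, alpha):
    """Weights w_i = Q(i/n) - Q((i-1)/n) with Q(r) = r**alpha."""
    return [(i / n) ** alpha - ((i - 1) / n) ** alpha for i in range(1, n + 1)]


def owa(values, weights):
    """Weighted sum of the values sorted in decreasing order."""
    return sum(w * a for w, a in zip(weights, sorted(values, reverse=True)))


# (I_U, I_B, I_C, I_D, I_A) as listed in the results table.
assessments = {
    "#259": (1, 0.928, 0.251, 0.528, 0.936),
    "#260": (1, 0.929, 0.254, 0.363, 0.922),
    "#258": (1, 0.885, 0.226, 0.425, 0.875),
}

weights = rim_weights(5, alpha=0.5)
scores = {seg: owa(vals, weights) for seg, vals in assessments.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Rounded to three decimals the weights match the ones reported, and the scores agree with the OWA column to about four decimals (the assessments in the table are themselves rounded).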

Page 39: A contribution to ranking and description of classifications

1. Signal Analysis

Objectives: to select variables and to identify important values.

Variables Antiquity, Assistants and Evaluation discretised (CAIM [Kurgan and Cios, 2004]).

                    Intervals
Variable            P1             P2     P3
Antiquity (years)   less than two  three  more than three
Assistants          few            many   a lot of
Evaluation          bad            good   excellent

Variables Antiquity, Specialists and Internet discarded.


Page 40: A contribution to ranking and description of classifications


1. Signal Analysis

Detection of relevant VoIs and EFs.

Example: tables for variable Competition

Contingency table:

Competition  No     Weak   Strong
Class 1      0      13     21
Class 2      26     38     34
Class 3      24     42     60

Expected frequencies:

Competition  No     Weak   Strong
Class 1      6.6    12.3   15.2
Class 2      19.0   35.3   43.7
Class 3      24.4   45.4   56.2

Values of importance:

Competition  No     Weak   Strong
Class 1      -6.66  0.05   2.25
Class 2      2.59   0.20   -2.15
Class 3      -0.01  -0.26  0.26

Conditional frequencies:

Competition  No     Weak   Strong
Class 1      0.00   0.38   0.62
Class 2      0.27   0.39   0.35
Class 3      0.19   0.33   0.48
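The four tables can be related to each other with a short sketch. It assumes that expected frequencies are computed under independence (row total times column total over the grand total) and that a value of importance is the signed chi-square contribution of a cell; the slide does not state either formula, so both are assumptions here. Under them the sketch reproduces the slide's figures except for the (Class 1, No) cell, which comes out near -6.59 rather than -6.66, so the original computation may differ in some detail.

```python
# Observed counts for variable Competition (rows: Class 1..3; columns: No, Weak, Strong).
observed = [
    [0, 13, 21],
    [26, 38, 34],
    [24, 42, 60],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected frequency under independence: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]


def signed_contribution(obs, exp):
    """ASSUMPTION: value of importance = chi-square cell contribution, signed by obs - exp."""
    sign = 1 if obs >= exp else -1
    return sign * (obs - exp) ** 2 / exp


importance = [
    [signed_contribution(o, e) for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# Conditional frequency of each modality within a class: count / class total.
conditional = [[o / r for o in row] for row, r in zip(observed, row_totals)]

for name, table in [("expected", expected), ("importance", importance), ("conditional", conditional)]:
    print(name)
    for row in table:
        print("  ", [round(v, 2) for v in row])
```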


Page 41: A contribution to ranking and description of classifications


1. Signal Analysis

Landmarks.

Example: initial messages for variable Competition

ID   Class  Variable     Modality  Type  Sign  Relev.  Value
#1   1      Competition  no        VoI   neg.  high    6.6
#30  1      Competition  strong    VoI   pos.  normal  2.3
#47  2      Competition  no        VoI   pos.  normal  2.6
#48  2      Competition  strong    VoI   neg.  normal  2.1
#68  1      Competition  no        EF    neg.  -       0.0

#48: The prop. of shops with mod. “strong” in var. “Competition” is low in Class 2.

#68: None of shops have modality “no” in variable “Competition” in Class 1.

→ 67 relevant VoIs (38 highly rel.) and 13 EFs (3 pos., 10 neg.).
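How a message record turns into its standard sentence can be sketched with simple templates. The `Message` class, the `render` function and the phrase table are hypothetical names; the wordings "low", "very low", "high" and "very high" are taken from sentences shown in these slides, but their mapping to sign and relevance is an assumption, and the wording of positive EF messages is not shown, so the sketch refuses to render them.

```python
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: int
    cls: int
    variable: str
    modality: str
    kind: str       # "VoI" or "EF"
    sign: str       # "pos" or "neg"
    relevance: str  # "high", "normal", or "-" for EF messages


# ASSUMPTION: wording of VoI messages keyed by (sign, relevance).
VOI_PHRASES = {
    ("neg", "high"): "very low",
    ("neg", "normal"): "low",
    ("pos", "normal"): "high",
    ("pos", "high"): "very high",
}


def render(msg):
    """Return the standard (non-naturalised) sentence for a message."""
    if msg.kind == "VoI":
        level = VOI_PHRASES[(msg.sign, msg.relevance)]
        return (f'The prop. of shops with mod. "{msg.modality}" in var. '
                f'"{msg.variable}" is {level} in Class {msg.cls}.')
    if msg.kind == "EF" and msg.sign == "neg":
        return (f'None of shops have modality "{msg.modality}" in variable '
                f'"{msg.variable}" in Class {msg.cls}.')
    raise ValueError("wording for this message type is not shown in the slides")


messages = [
    Message(1, 1, "Competition", "no", "VoI", "neg", "high"),
    Message(48, 2, "Competition", "strong", "VoI", "neg", "normal"),
    Message(68, 1, "Competition", "no", "EF", "neg", "-"),
]
for m in messages:
    print(f"#{m.msg_id}: {render(m)}")
```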


Page 42: A contribution to ranking and description of classifications


2. Data Interpretation

Objective: to discard redundant messages (type-A rules).

Example: rule A.2 detecting redundancy between messages #68 and #1 in variable Competition

ID   Class  Variable     Modality  Type  Sign  Relev.  Value
#1   1      Competition  no        VoI   neg.  high    6.6
#68  1      Competition  no        EF    neg.  -       0.0

#1: The prop. of shops with mod. “no” in var. “Competition” is very low in Class 1.

#68: None of shops have modality “no” in variable “Competition” in Class 1.

→ discarding message #1 and prioritising #68.

→ 28 redundant messages were discarded.
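The redundancy check in this example can be sketched as a filter over message records. The exact conditions of rule A.2 are not given on the slide; the sketch assumes that a VoI message is redundant when an EF message exists with the same class, variable, modality and sign, and `discard_redundant` is a hypothetical name.

```python
def discard_redundant(messages):
    """Drop every VoI message that an EF message already covers.

    ASSUMPTION: two messages are redundant when they share class, variable,
    modality and sign; the real rule A.2 may use other conditions.
    """
    ef_keys = {
        (m["class"], m["variable"], m["modality"], m["sign"])
        for m in messages
        if m["type"] == "EF"
    }
    return [
        m for m in messages
        if m["type"] != "VoI"
        or (m["class"], m["variable"], m["modality"], m["sign"]) not in ef_keys
    ]


messages = [
    {"id": 1, "class": 1, "variable": "Competition", "modality": "no", "type": "VoI", "sign": "neg"},
    {"id": 30, "class": 1, "variable": "Competition", "modality": "strong", "type": "VoI", "sign": "pos"},
    {"id": 68, "class": 1, "variable": "Competition", "modality": "no", "type": "EF", "sign": "neg"},
]

kept = discard_redundant(messages)
print([m["id"] for m in kept])
```

On the three Class 1 messages for Competition, message #1 is dropped while #30 and #68 are kept, which matches the outcome described in the example.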


Page 43: A contribution to ranking and description of classifications


3. Document Planning

Objectives: to merge related messages (type-B rules) and to “naturalise” specific messages (type-C rules).

Example: rule B.2 detecting relations between messages #30 and #68 in variable Competition

ID   Class  Variable     Modality  Type  Sign  Relev.  Value
#30  1      Competition  strong    VoI   pos.  normal  2.3
#68  1      Competition  no        EF    neg.  -       0.0

#30: The prop. of shops with mod. “strong” in var. “Competition” is high in Class 1.

#68: None of shops have modality “no” in variable “Competition” in Class 1.

→ the merge of messages #30 and #68 will be done in stage 4.

→ 45 out of 52 messages activated type-B rules.


Page 44: A contribution to ranking and description of classifications


3. Document Planning

Objectives: to merge related messages (type-B rules) and to “naturalise” specific messages (type-C rules).

Example: rule C.3 affecting messages of variable Competition

ID   Class  Variable     Modality  Type  Sign  Relev.  Value
#30  1      Competition  strong    VoI   pos.  normal  2.3
#68  1      Competition  no        EF    neg.  -       0.0

#30: The prop. of shops with mod. “strong” in var. “Competition” is high in Class 1.

→ #30-naturalised: The prop. of shops with a strong competition is high in Class 1.

#68: None of shops have Competition “no” in Class 1.

→ #68-naturalised: All shops have competition in Class 1.


Page 45: A contribution to ranking and description of classifications


4. Microplanning and Realisation

Objectives: final structure and transcription.

→ 29 groups of messages (sentences), between 1 and 3 messages.

Example: construction of a sentence for variable Maintenance

Group #4 consisting of three messages affected by rules B.2, B.4 and C.4.

Standard:
#37: The prop. of shops with modality “good” in variable “maintenance” is high.
#70: None of shops has mod. “deficient” in var. “maintenance”.
#36: The prop. of shops with modality “regular” in variable “maintenance” is low.

Rule C.4:
#37: The proportion of shops with a good maintenance is high.
#70: None of shops has a deficient maintenance.
#36: The proportion of shops with a regular maintenance is low.

Rule B.2:
#70 & #36: None of shops have a deficient maintenance and the proportion of them with a regular maintenance is low.

Rule B.4:
#37 & (#70 & #36): The proportion of shops with a good maintenance is high. None of PoSs has a deficient maintenance and the proportion of them with a regular maintenance is low.
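The construction of this group can be sketched as three small steps: naturalise each message, merge the EF with the VoI that follows it, then place the remaining message in front. The function names, the `level` field and the way each rule is reduced to a string operation are all assumptions made for illustration; the actual rules B.2, B.4 and C.4 are only shown here through their effect on one example.

```python
def naturalised_clause(msg, subject):
    """Return a lower-case clause in the naturalised wording, e.g. 'a good maintenance'."""
    phrase = f"a {msg['modality']} {msg['variable']}"
    if msg["type"] == "EF":  # negative extreme frequency, as in message #70
        return f"none of {subject} has {phrase}"
    return f"the proportion of {subject} with {phrase} is {msg['level']}"


def sentence(text):
    """Capitalise the first letter and close the sentence with a full stop."""
    return text[0].upper() + text[1:] + "."


# Messages of group #4 (hypothetical record layout).
m37 = {"id": 37, "type": "VoI", "variable": "maintenance", "modality": "good", "level": "high"}
m70 = {"id": 70, "type": "EF", "variable": "maintenance", "modality": "deficient", "level": None}
m36 = {"id": 36, "type": "VoI", "variable": "maintenance", "modality": "regular", "level": "low"}

# Merge step (B.2 in the example): join #70 and #36, referring back with "them".
merged = sentence(naturalised_clause(m70, "shops") + " and " + naturalised_clause(m36, "them"))

# Ordering step (B.4 in the example): put #37 in front of the merged pair.
paragraph = sentence(naturalised_clause(m37, "shops")) + " " + merged
print(paragraph)
```

The sketch always uses "shops" and "has", whereas the generated text alternates between "shops", "PoSs" and "stores"; choosing among synonyms is left out.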


Page 46: A contribution to ranking and description of classifications


Results

Qualitative description for Class 1

Class 1
=======

Almost all shops have a medium-sized display and a medium sensitivity to promotions.

The proportion of PoSs with thermal product display is very high.

All PoSs have competition and the proportion of them with a strong competition is high.

The proportion of shops with a good maintenance is high. None of PoSs has a deficient maintenance and the proportion of them with a regular maintenance is low.

None of PoSs has a deficient communication and the proportion of them with a good communication is very high.

None of PoSs has a deficient aesthetics and the proportion of them with a good aesthetics is very high.

The proportions of shops with a number of assistants greater than or equal to many are very high.

The proportions of stores with a size greater than or equal to medium are high.

The proportions of shops located in inner cities and no mountain towns are high.

The proportion of PoSs with a secondary Grifone weight is high while the proportion of them with a minimal Grifone weight is low.

Summary: strong competition, good qualities, medium/big stores, non-mountain, secondary but existing Grifone weight.



Page 48: A contribution to ranking and description of classifications


Conclusions

Complete MCDM system presented.

Main contributions:

1 Fuzzy criteria (Chapter 3).

2 Aggregation vs. sequential approach (Chapters 2 and 5).

3 NLG system (Chapter 4).

4 Application (Chapter 5).


Page 49: A contribution to ranking and description of classifications


Output of this Thesis

1 Journal papers:

German Sanchez-Hernandez, Francisco Chiclana, Nuria Agell, Juan Carlos Aguado (2013). Ranking and selection of unsupervised learning marketing segmentation. Knowledge-Based Systems, 44:20–33.

2 Conference proceedings:

German Sanchez, Monica Casabayo, Albert Sama and Nuria Agell (2008). Forecasting Customer’s Loyalty by Means of an Unsupervised Fuzzy Learning Method. Electronic proceedings of the 28th International Symposium on Forecasting, 43. Nice, 22-25 June 2008.

German Sanchez, Nuria Agell, Juan Carlos Aguado, Monica Sanchez and Francesc Prats (2007). Selection Criteria for Fuzzy Unsupervised Learning: Applied to Market Segmentation. In Foundations of Fuzzy Logic and Soft Computing (IFSA). Lecture Notes in Computer Science, 4529:307–310.

Cati Olmo, German Sanchez, Nuria Agell, Monica Sanchez and Francesc Prats (2007). Using Orders of Magnitude and Nominal Variables to Construct Fuzzy Partitions. Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 1–6. London, 23-26 July 2007.


Page 50: A contribution to ranking and description of classifications


Output of this Thesis

3 National and international workshops:

Francisco J. Ruiz, Albert Sama, German Sanchez, Jose Antonio Sanabria and Nuria Agell (2011). An interval technical indicator for financial time series forecasting. Proceedings of the 25th International Workshop on Qualitative Reasoning (QR).

German Sanchez, Albert Sama, Francisco J. Ruiz and Nuria Agell (2010). Moving intervals for nonlinear time series forecasting. Proceedings of the 13th International Conference of the Catalan Association for Artificial Intelligence (CCIA).

Jose Antonio Sanabria, German Sanchez, Nuria Agell and Josep Sayeras (2010). An application of SVMs to predict financial exchange rate by using sentiment indicators. Proceedings of the V Simposio de Teoría y Aplicaciones de Minería de Datos (TAMIDA).

German Sanchez, Juan Carlos Aguado, Nuria Agell, Monica Sanchez (2009). Automatic Comparison and Selection of Classifications in Unsupervised Learning Processes. XI Jornadas de ARCA Sistemas Cualitativos, Diagnosis, Robotica, Sistemas Domoticos y Computacion Ubicua (JARCA). Almunecar (Granada), 24-26 June 2009.

German Sanchez, Juan Carlos Aguado and Nuria Agell (2007). Forecasting New Customers’ Behaviour by Means of a Fuzzy Unsupervised Method. Artificial Intelligence Research and Development, Frontiers in Artificial Intelligence and Applications. Proceedings of the 10th CCIA, 163:368–375. Andorra, 25-26 October 2007. ISBN: 978-1-58603-798.


Page 51: A contribution to ranking and description of classifications


Future Research

Criteria: improvements.

Aggregation: study of other OWA operators and linguistic quantifiers.

Qualitative description: ontology, grammar.

Application: to assess unsupervised techniques, ensemble of techniques.
