Guide to using Correspondence & Cluster Analysis

CHOICES 3 - C&C Manual

Apr 03, 2015


arindam_das
Page 1: CHOICES 3 - C&C Manual

Guide to using Correspondence & Cluster Analysis


Correspondence & Cluster Analysis

Table of Contents

Welcome
    Aims of this Guide
    Queries and Support

Correspondence Analysis
    What is Correspondence Analysis?
    Correspondence Analysis step by step
    Setting up the crosstab in Choices3
    Editing the Correspondence Map
    Interpreting the map
    The Statistics View
    Headings in the General Statistics Views for rows and columns
    Axis Statistics
    Eigenvalues Table View
    Looking at the map in different ways
    Formatting the map display
    Printing
    Overlaying Data
    Incorporating 3-D Statistics into your map
    3D Correspondence Mapping

Cluster Analysis
    What is Cluster Analysis
    Cluster Analysis step by step
    How to set up a Cluster Analysis
    Interpreting the results
    Summary Statistics
    Cluster Report Window
    Cluster Solution Window
    Cluster Groups Window
    How many cluster groups should I choose?
    Taking your clusters back into Choices3
    Overlaying your cluster solution onto the original map

An example TGI Cluster Analysis: The Shoe Market
    Crosstab
    Selection of Lifestyle Statements
    Run Cluster Analysis
    Interpreting the results
    Importing the clusters back into Choices3

The Statistics Explained

Welcome

Thank you for licensing this product from KMR-SPC Software.

Aims of this Guide

The aim of this guide is to help you to run a Correspondence Analysis and, if appropriate, a Cluster Analysis from the results of the Correspondence Map. Correspondence Analysis is an integrated part of the Choices3 software and results are shown in the Choices Viewer. Cluster Analysis is a module of the Choices3 software and functions as part of it.

Advancing technology and the ongoing development by the team at KMR Software have allowed these advanced statistical techniques to be available on your PC desktop. The processing takes a remarkably short amount of time. However, the essence of these techniques is the time and thought you put in to get results out of the analyses that back up the strategy or story you want to present. Please set aside enough time to make considered decisions about the results of the analysis. The software is powerful enough to allow you to do this without waiting long periods of time for the results.

Recent enhancements to Correspondence Analysis include a new “clean-up” tool that allows quick visual interpretation of the map. Colour coded reports provide another unique perspective on relationships between variables on the map. Training in these techniques is available from KMR Software as part of your licence agreement.

Queries and Support

Please call the helpdesk with any queries on +44 (0)20 7831 5455 or email on [email protected] asking for the Choices3 team.

CORRESPONDENCE ANALYSIS

What is Correspondence Analysis?

Correspondence Analysis is a market segmentation technique that graphically represents the relationship between brands or products and other variables such as attitudes, media titles etc. It is also used as a preliminary step to Cluster Analysis, determining the most discriminatory Lifestyle statements for the chosen market. Correspondence Analysis runs from a crosstab. Usually, the brands or products are the columns and the attitudinal statements (or other variables) are the rows.

An Example of a Basic Correspondence Map

Types of Major Shoe Retailers Vs Attitudes

The following is a basic map plotting attitudinal statements against the shoe shops that people have stated they use.

Correspondence Analysis Step by Step

Setting up the crosstab in Choices3

• Enter the target market (usually your brands) in the columns, checking sample sizes are greater than 200
• Enter either any agree or definitely agree lifestyle statements as rows
• Enter ‘all users’ of the market as your filter (if you intend to run Cluster Analysis the sample must be greater than 2000)
• Edit your headings so they are concise (in ‘Edit Table’ area)
• [If you want to ‘overlay’ info., enter this into columns (e.g. demographics/media)]
• Save and then Run the crosstab
• Select correspondence analysis using the icon or going to the “Analysis” options

The Correspondence map will be generated within the Choices Viewer along with the related statistics.

Editing the Correspondence Map

At this stage, before editing the map, you will want to select the statements that best describe your map and eliminate the rest. There are two methods available:

Manual clean-up method
• In the Choices Viewer, select the Statistics view and expand General Statistics
• Click on “Rows”
• Click on the “Dist” column (this sorts the rows by ‘Chi-distance’)
• Right-click on the rows and choose “Select top n…” and then choose the number of statements you wish to include in the map (usually about 15-30)
• Right-click and choose “Invert selection”
• Right-click and select “Change status” and then “to passive”
• Select the map from the analysis tree
• Right-click on the map and choose “Select” and “All passive rows”
• Right-click again on the map and select “Hide”
• Edit the map by moving the labels and changing text where necessary
• To rename the map, from the toolbar select “Edit” and “Title”
• To insert labels for the x and y axis, from the toolbar select “Insert” and “New label”
• If you are going on to do a cluster analysis, print the statements used in the map: ensure you are in the “Statistics” view and then choose “File” and “Print”

Clean-up method
The clean-up method simply requires the user to specify the number of rows to select in order to tidy up the map.
• From the “Select” menu, choose “Clean-up Map”
• When prompted to “Select top Chi Distance values for rows”, enter the number of rows required for the map. It is possible to set this number as the default using the tick box in this dialogue box.
• The map will now show just the top number of rows selected.
• Alternatively, auto clean-up will automatically tidy up the map, taking the default number of rows set in the clean-up map option. From the “Select” menu, choose “Auto clean-up map” or use the icon.
• If you are going on to do a cluster analysis, print the statements used in the map: ensure you are in the Rows view of General Statistics and then choose “File” and “Print”.
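The selection logic behind both clean-up methods (rank the rows by Dist, keep the top n, set the rest aside as passive) can be sketched in a few lines of Python. This is only an illustration of the idea, not Choices3's own code; the function name, statement labels and Dist values are made up for the example.

```python
def split_by_chi_distance(dist_by_statement, top_n):
    """Return (active, passive) statement labels, ranked by Chi-distance."""
    # Sort labels by their 'Dist' value, largest (most discriminating) first.
    ranked = sorted(dist_by_statement, key=dist_by_statement.get, reverse=True)
    # The first top_n stay active; everything else would be made passive.
    return ranked[:top_n], ranked[top_n:]


# Hypothetical 'Dist' values, as they might be read from the Rows view.
dist = {
    "I like to stand out in a crowd": 0.42,
    "I buy shoes for comfort, not style": 0.31,
    "I look for the lowest prices": 0.05,
    "I keep up with fashion": 0.27,
}
active, passive = split_by_chi_distance(dist, top_n=3)
print("Active:", active)
print("Passive:", passive)
```

In a real analysis `top_n` would be the 15-30 statements suggested above, and the passive list corresponds to the rows you would then hide.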


Interpreting the Map

The Correspondence Analysis program will search for correlation within the data and will produce a map based on the two ‘themes’ which were strongest within the data. The most important theme will form the basis of the x-axis and the second most important, the y-axis.

For instance, in the earlier example (see ‘An Example of a Basic Correspondence Map’), one end of the x-axis might reflect ‘Real Men’ who believe real ale is the only beer worth drinking and that skincare products are for women, and the other end ‘Image Conscious’ people who are more concerned with fast cars and designer clothes.

In this case, as with all correspondence maps, the vertical or ‘y’-axis is much less important than the horizontal (it has a relatively low ‘contribution level’, discussed later in the manual). However, you might differentiate between those whose attitudes lean towards being financially aware versus those who are more traditional.

Now we will run through a number of key questions you might ask about the Correspondence Analysis Programme. Remember, if the data doesn't seem to present any distinct patterns, you may need to study the combination of variables that you are using and/or re-run your analysis.

Interpreting the Correspondence Map

(i) Firstly, you should ensure the variance of the map is sufficiently high: the combined variance for axes 1 and 2 needs to be over 60%.

(ii) Assessing the relationship between two brands is done by measuring the angle between the lines that are drawn from the two brands to the centre (origin) of the map. An angle closer to 0°/180° means a higher positive or negative relationship respectively between the brands. Otherwise, right angles between brands, or thereabouts (i.e. angles of 90° or 270°), indicate little or no relationship.

(iii) Assessing the relationship of brands to a statement is done by taking a line from a statement, going through the ‘origin’ to the other side of the map. The distance of brands to the statement, along this line, determines the strength of relationship. Again this is a positive relationship if the brand is on the same side of the map as the statement and negative if it is on the opposite side. The closer the brand is to the origin along the statement line, the weaker the relationship. The further out towards the edges of the map, the stronger the relationship.

The importance of the ‘Variance Explained’ figure


The variance explained figure is a measure of how well the map is explaining the variables in it. Ideally, on a survey such as TGI, at least 60% of the variation within the market should be explained by the first 2 axes. However, in reality this may not happen, especially if very few of the brands or variables overlap (e.g. the statements “My diet is mainly vegetarian” and “I am a vegetarian”). If you are in the map view itself, this information is given in the bottom left-hand corner of the map.

If the figure is low (we recommend a minimum acceptable level of 60% for a correspondence map), it indicates that these axes do not give a sufficient explanation of the data. Thus the calculations are probably not significant enough to create a whole map, and the map will not sufficiently explain the differences between the brands. Note that statistically any set of data will contain some variance but not all are sufficiently strong. Also, users of some products might be very similar attitudinally and might be better differentiated against other variables such as demographics.

What is being expressed along each axis?

Each axis should reflect a dimension within the data, which can be summed up or described by the user using appropriately descriptive labels. Examples of dimensions might be introverted / extroverted or traditional / innovative. The correspondence map can plot any 2 dimensions and will plot the two strongest ones by default. However, you should also look at the other axes to see how other polarities express themselves within the data. This is explained in ‘Looking at the Map in Different Ways’.

Which brands are the most important?

The brands around the centre of the map will be those that are 'average', or not as strongly differentiated as the brands around the outside of the map. Brands near the edge of the map are those which have more extreme variation or differences from other brands and attitudes. In practice these might be the smaller brands which may attract a more specialist or distinctive consumer.

How do I measure relationships between variables on the map?

There are two main measurements, shown below, that you can make with a ruler and/or a protractor. You should remember that the x-axis will have been stretched or shrunk to fit on your screen, so it will not be shown true to scale.

1) Making comparisons between brands:

To find out the correlation between two brands, simply draw a line from each one to the origin, and measure the angle between them. An angle of 0º represents 100% correlation, 180º shows 100% negative correlation, and 90º (or 270º) shows no correlation. Brands B and C are diametrically opposite; i.e. there is a strong negative correlation. It is important to know that these brands are opposites in the market. This is as opposed to A and B, which are positioned in a similar area of the map and consequently have similar market positions.
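As a rough alternative to a protractor, the same angle can be worked out from the points' co-ordinates on axes 1 and 2, which also sidesteps the on-screen stretching mentioned above. The sketch below is an illustration only: the function name and the brand co-ordinates are hypothetical, and in practice the values would be read from the Co-ordinates view.

```python
import math


def angle_between(p, q):
    """Angle in degrees (0 to 180) between the lines from the origin to p and q."""
    dot = p[0] * q[0] + p[1] * q[1]
    norm = math.hypot(p[0], p[1]) * math.hypot(q[0], q[1])
    # Clamp to [-1, 1] to guard against tiny floating-point overshoots.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# Hypothetical (axis 1, axis 2) co-ordinates for three brands.
brand_a = (0.30, 0.10)
brand_b = (0.45, 0.20)
brand_c = (-0.40, -0.15)

print("A vs B:", round(angle_between(brand_a, brand_b), 1))  # small angle: similar positions
print("B vs C:", round(angle_between(brand_b, brand_c), 1))  # near 180: opposites
```

The function reports the smaller of the two possible angles, so the 270º case described above comes out as 90º.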

Column vs Column Analysis View

Alternatively the Column vs Column Analysis can be used to compare the relationships between brands. Each brand is taken in turn (shown at the top of the table, in the example below the brand is Clarks) and the analysis presented in the form of a colour-coded table. The brands shown in red have a close correlation with Clarks whereas those shown in white have no correlation. Those brands shown in blue have a strong negative correlation with Clarks, for example Dolcis and Clarks are opposites in the shoe market.

[Figure: colour-coded Column vs Column table for Clarks, with closely correlated brands in red and negatively correlated brands in blue]

Use the next target buttons to scroll through the different brands and view the relevant analysis.

2) Making comparisons between statements and a brand

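[Figure: Brand A's origin line with statements X, Y and Z dropped onto it at right angles. The relationship becomes more strongly positive towards the brand and more strongly negative on the opposite side of the origin.]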

You can see how different statements relate to a brand. Draw a line from the brand through the origin, and then draw perpendicular lines from each statement to the line (i.e. at 90º). The relationship between Brand A and the lifestyle statements X, Y and Z is shown by the point where each statement's perpendicular line meets the brand's origin line. Positive relationships lie on the same side of the origin as the brand. Negative relationships lie on the other side of the origin to the brand.

In the example shown above, consumers of Brand A have a strong agreement with Statement Z. Consumers of Brand A disagree more strongly with Statement X than with Statement Y. The closer the brand is to the origin along the statement line, the weaker the relationship. The further out towards the edges of the map the brand is, the stronger the relationship.
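The perpendicular construction described here is a scalar projection, so it can be checked numerically. The following is a minimal sketch under the assumption that you have the two map co-ordinates of the brand and of each statement; the function name and all co-ordinates are hypothetical.

```python
import math


def position_on_brand_line(brand, statement):
    """Signed distance from the origin to the foot of the statement's perpendicular.

    Positive values fall on the brand's side of the origin (positive relationship),
    negative values on the opposite side (negative relationship).
    """
    length = math.hypot(brand[0], brand[1])
    return (statement[0] * brand[0] + statement[1] * brand[1]) / length


# Hypothetical co-ordinates for Brand A and statements X, Y and Z.
brand_a = (0.40, 0.30)
statements = {"X": (-0.50, -0.20), "Y": (-0.10, -0.15), "Z": (0.35, 0.45)}

for label, point in statements.items():
    print(label, round(position_on_brand_line(brand_a, point), 3))
```

With these made-up values Z projects furthest on the brand's side, while X projects further onto the opposite side than Y, which mirrors the pattern described for the example figure.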

Column vs Row Analysis

Alternatively the Column vs Row Analysis can be used to compare the relationships between brands and lifestyle statements. Each brand is taken in turn (shown at the top of the table; in the example below the brand is Ravel) and again the analysis is presented in the form of a colour-coded table. Ravel shoppers have a strong agreement with the lifestyle statements shown in red, whereas they have a strong disagreement with those statements shown in blue.

Example: Look at the top 12 statements in the list (in red, the closest to Ravel) and try to find a common theme. In this example, these statements could be part of the “Image Conscious” theme.


Use the next target buttons to scroll through the different brands and view the relevant analysis.

[Figure: colour-coded Column vs Row table for Ravel, with statements of strong agreement in red and strong disagreement in blue]

The Statistics View

In this example we will use a very straightforward map showing a selection of shoe shops people might use against attitudinal statements, to explain the various statistics. As before, the map itself will look something like this:

The statistics view contains information which will allow you to describe your correspondence map in more detail. An example of the Column statistics view is given below along with explanations of each of its components and how they might be used. Please note that clicking on a column heading (e.g. ‘Mass’, ‘Inertia’) enables you to sort by that statistic in descending order.


The statistics of a Correspondence Map are based on the Chi-squared statistic, which measures deviations from expected values. The inertia is the chi-squared statistic divided by the grand total of all cell entries in the table. This total inertia is what the correspondence map will explain. The total of the eigenvalues across all dimensions is the total inertia. A process similar to factor analysis re-allocates this inertia between a series of dimensions, which will be the axes of the map.
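To make the relationship between chi-squared and inertia concrete, here is a minimal Python sketch of the textbook calculation. It is not taken from Choices3; the function name and the small crosstab are invented for illustration, and the expected value of each cell is the usual row total × column total ÷ grand total.

```python
def chi_squared_and_inertia(table):
    """Return (chi_squared, total_inertia) for a crosstab of non-negative counts."""
    grand_total = float(sum(sum(row) for row in table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi_squared = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_squared += (observed - expected) ** 2 / expected
    # Total inertia is chi-squared divided by the grand total of the table.
    return chi_squared, chi_squared / grand_total


# Hypothetical crosstab: rows are statements, columns are brands.
crosstab = [
    [30, 10, 20],
    [10, 40, 10],
    [20, 10, 50],
]
chi2, inertia = chi_squared_and_inertia(crosstab)
print("Chi-squared:", round(chi2, 3))
print("Total inertia:", round(inertia, 4))
```

A table with no relationship between rows and columns (every row having the same proportions) gives a chi-squared of zero, and therefore nothing for the map to explain.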

Headings in the General Statistics Views for Rows and Columns:

#

These numbers represent the original numeric order of the variables that were assigned immediately after the creation of the correspondence map. Subsequently you may use this row / column to re-order your variables to their original order should you so wish.

Key

The key represents a code reference for your variable. Note: A default code is given to each variable if no code can be found.

Mass

The Mass figure represents the percent of data in the crosstab that is in that row or column. This is most useful if your map is based upon ‘projected’ figures (i.e. the 000s figure in your crosstab), rather than the ‘Vertical Percent’, since then the mass would represent the size of the brand. NB Choices will automatically use Vertical Percent as your map basis. This means that your brands are measured in terms of the percentage of those using it. Please contact the KMR-SPC Helpdesk if you would like advice on using different statistics as your map basis.

Distance (‘Dist’) / Chi² Distance

‘Distance’ refers to ‘Chi² Distance’ on the map. This figure is important for measuring the distance of variables from the centre of the map, or the ‘origin’. The Distance is the squared distance of the row/column point from the origin of the map, i.e. the inertia of the row/column divided by its mass. Chi² Distances are statistical values used to make the correspondence map. The higher the value, the more discriminating the attribute. They are most useful for assessing the discrimination power of your attributes in a conventional correspondence map.

Chi² Distances

The chi² distance measures how well theoretical data 'fits' observed data. It is calculated by measuring an 'expected' value for each cell and comparing this with the actual observed data. The 'expected' value is that which would occur if there were no relationship between the row and column. Brands with large differences between observed and expected values will have a high distinctiveness, while those with an average performance will have low distinctiveness. In the map, distinctiveness corresponds to the distance from the origin, but measured over all the dimensions not just the two shown on the map. Often a small brand has the most distinctive image.

Inertia

This figure shows how strongly each variable contributes to determining the overall shape of the map, and is a breakdown of the ‘variance explained’ figure. You will find it most useful for discriminating between brands (usually your columns). This figure is calculated by multiplying the mass by the distance. The total of the eigenvalues across all dimensions is the total inertia.
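The three headings above fit together as inertia = mass × distance. The sketch below works them out for each row of a small crosstab using the standard correspondence analysis definitions; it is an illustration under those assumptions, not Choices3's code, and the function name and counts are hypothetical.

```python
def row_statistics(table):
    """Return a list of dicts with 'mass', 'dist' and 'inertia' for each row."""
    grand_total = float(sum(sum(row) for row in table))
    # Column masses double as the average row profile (the origin of the map).
    col_mass = [sum(col) / grand_total for col in zip(*table)]
    stats = []
    for row in table:
        row_total = float(sum(row))
        mass = row_total / grand_total
        # Chi² distance: squared distance of the row's profile from the average
        # profile, with each column weighted by the inverse of its mass.
        dist = sum((cell / row_total - c) ** 2 / c for cell, c in zip(row, col_mass))
        stats.append({"mass": mass, "dist": dist, "inertia": mass * dist})
    return stats


# Hypothetical crosstab: rows are statements, columns are brands.
crosstab = [
    [30, 10, 20],
    [10, 40, 10],
    [20, 10, 50],
]
for number, s in enumerate(row_statistics(crosstab), start=1):
    print(number, round(s["mass"], 3), round(s["dist"], 4), round(s["inertia"], 4))
```

Summing the inertia column gives the total inertia of the table, and the same function applied to the transposed table gives the column statistics.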

Axis Statistics

Co-ordinates

The Co-ordinates view shows the position of each row/column point on each axis. The overall distance of each point from the origin has already been fixed above. The position on each axis will depend on how much of the inertia of that row/column is explained by that dimension. On each axis the Sum of the squared co-ordinates (i.e. squared distance) times the mass of each point gives the inertia of that dimension. Axes 1 and 2 represent the actual co-ordinates used to construct the default map. Negative numbers mean that the point is on the opposite side of the origin to positive numbers on the same axis.


By looking at the table above you can see that on the x-axis (axis 1) the points on the right of the map (positive values) are: Ravel, Dolcis, Next, House of Fraser, Bally, Barratts, Debenhams, Saxone and John Lewis. Moreover, Ravel has the biggest value of these; i.e. in this case it would be furthest to the right. (Please note however, that it is possible to ‘flip’ your axes on the map; consequently the above would relate to the left of the map and not the right.) You can also sort each column by clicking on the tab label at the top of each column. This may reveal other axes (other than axis 1 and 2), which could be better at explaining some of your key variables.

Absolute & Relative Contributions

For both of these views each row of data represents one variable. In the Relative Contributions view each of the rows sums to 100%, reflecting the importance of each axis in explaining the variable; in the Absolute Contributions view it is each axis column that sums to 100%. These views reveal that there is more than one axis that you can use for your analysis. Although the initial correspondence map is based upon axis 1 and 2 (the best axes to explain your variables overall), you may choose other axes which are stronger in explaining variables which you deem as key to the analysis.

Absolute Contributions – add to 100% down all rows or columns for a single axis (i.e. vertical percents). Shows the percent of all inertia on that axis which is due to that row or column.


Relative Contributions – add to 100% across all axes for a single row or column (i.e. horizontal percents). Shows the percent of all inertia in that row or column which is explained by that axis. It is better to look at the Absolute Contribution view to assess your attributes and at the Relative Contribution view to check your brands.
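The two kinds of contribution can be sketched from the masses and co-ordinates alone. The example below uses the usual definitions (absolute: mass × squared co-ordinate as a share of the axis inertia; relative: squared co-ordinate as a share of the point's squared distance) and assumes the co-ordinates cover every axis of the solution. The function name, masses and co-ordinates are hypothetical.

```python
def contributions(masses, coords):
    """Return (absolute, relative) contribution tables, both in percent.

    masses: one mass per point; coords: one tuple of axis co-ordinates per point.
    """
    n_axes = len(coords[0])
    # Inertia of each axis: sum over points of mass x squared co-ordinate.
    axis_inertia = [
        sum(m * point[k] ** 2 for m, point in zip(masses, coords))
        for k in range(n_axes)
    ]
    absolute = [
        [100.0 * m * point[k] ** 2 / axis_inertia[k] for k in range(n_axes)]
        for m, point in zip(masses, coords)
    ]
    relative = []
    for point in coords:
        squared_distance = sum(x ** 2 for x in point)
        relative.append([100.0 * x ** 2 / squared_distance for x in point])
    return absolute, relative


# Hypothetical masses and (axis 1, axis 2) co-ordinates for three brands.
masses = [0.5, 0.3, 0.2]
coords = [(0.20, 0.03), (-0.10, -0.11), (-0.35, 0.09)]
absolute, relative = contributions(masses, coords)
for a, r in zip(absolute, relative):
    print([round(x, 1) for x in a], [round(x, 1) for x in r])
```

Reading down a column of `absolute` gives 100% for that axis, and reading across a row of `relative` gives 100% for that point.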

Eigenvalues View

The Eigenvalue for each dimension gives the amount of variation explained by that dimension. These values are used to calculate the correspondence map.

Eigenvalues table view

The sum of the active Eigenvalues is the total of all of the Chi² deviations for every cell in the table. The larger the number the more a table will deviate from expected values. The dimensions of the Correspondence Map are trying to explain this sum and the output shows various statistics for the dimensions that usually explain most of the variation in the data. Dividing the Eigenvalue of each dimension by the sum of the Eigenvalues gives the % of variation explained by that dimension. The dimensions are always listed in descending order of importance.

%

The % column gives the percentage of variation explained by each dimension.

%+

The %+ column gives the percentage of variation explained by all dimensions up to and including the current one.

Pie Chart

The pie chart gives a graphical representation of the percentage of variation explained by dimension.
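The % and %+ columns follow directly from the eigenvalues, as this short sketch shows. The function name and eigenvalues are invented for the example; the 60% check simply applies the guideline given earlier for the first two axes.

```python
def explained_variation(eigenvalues):
    """Return rows of (dimension, eigenvalue, percent, cumulative_percent)."""
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    # Dimensions are listed in descending order of importance.
    for dimension, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        percent = 100.0 * value / total
        cumulative += percent
        rows.append((dimension, value, percent, cumulative))
    return rows


# Hypothetical eigenvalues for a three-dimensional solution.
rows = explained_variation([0.06, 0.03, 0.01])
for dimension, value, percent, cumulative in rows:
    print(dimension, value, round(percent, 1), round(cumulative, 1))

# Guideline from this guide: axes 1 and 2 together should explain at least 60%.
print("Map acceptable:", rows[1][3] >= 60.0)
```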

Looking at the Map in Different Ways

Selecting Axes

As mentioned previously, a correspondence map will find many axes, of which only axes 1 and 2 are used in your initial map and which reflect the greatest variation in your market. Thus axes 1 and 2 will become the X-axis and the Y-axis respectively.

As you become more ambitious you may want to use one of the alternative axes; perhaps one of the axes other than 1 or 2 shows better discrimination for the brands you are looking at. To do this, in the map view go to the View menu and select Add Map. Alternatively you can use the add map icon on the toolbar. Enter a title and select the axes you wish to show on the new map. The new map will be displayed in the Choices Viewer.

Active, Passive

Points on the map can be made:

Passive (Vs Active) - Points/variables start off as ‘active’ (i.e. they contribute to the map calculations), but when made passive they no longer contribute to the shape of the map. Assuming they have not been hidden (see below), these passive variables are plotted in green so you can see where they would lie on the map.

Passive points on a map have no mass, so do not affect the shape of the map. They are excluded from the table used to calculate the map, and then their positions are superimposed on the map afterwards. The position of a passive row is fixed by its pattern of answers across the active columns. So a passive row goes close to the columns it is strongest on, as with an active row. In the same way, a passive column is fixed by answers across active rows. So changing points to passive means the map is redrawn excluding those points, and then passive points are positioned afterwards.

Overlaying demographics and media means adding these as passive points. So the map is unchanged: it is still drawn based on active rows and columns only.

NB: By using this feature the whole map is re-drawn, meaning that any editing will be lost. This is because the variables upon which the map is based are re-calculated.
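One standard way to superimpose a passive row is the correspondence analysis transition formula: on each axis, take the average of the active columns' co-ordinates weighted by the row's profile, then divide by the square root of that axis's eigenvalue. The guide does not state which formula Choices3 uses, so treat the sketch below as an assumption; the function name, masses and co-ordinates are hypothetical.

```python
import math


def place_passive_row(counts, col_coords, col_masses):
    """Position a passive row from its counts across the active columns."""
    total = float(sum(counts))
    # The row's profile: its pattern of answers across the active columns.
    profile = [c / total for c in counts]
    position = []
    for k in range(len(col_coords[0])):
        # Axis inertia (eigenvalue): sum of mass x squared co-ordinate.
        eigenvalue = sum(m * g[k] ** 2 for m, g in zip(col_masses, col_coords))
        weighted = sum(p * g[k] for p, g in zip(profile, col_coords))
        position.append(weighted / math.sqrt(eigenvalue))
    return position


# Hypothetical masses and (axis 1, axis 2) co-ordinates of three active columns.
col_masses = [0.5, 0.3, 0.2]
col_coords = [(0.20, 0.03), (-0.10, -0.11), (-0.35, 0.09)]

# A passive row (e.g. a demographic group) that is strongest on the first column.
print(place_passive_row([70, 20, 10], col_coords, col_masses))
```

A passive row whose answers simply follow the column masses lands at the origin, while one concentrated on a single column is pulled in that column's direction, which matches the behaviour described above.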

You may choose to make points passive for the following reasons:

1) Low sample sizes: If you have included brands with low sample sizes (less than 200) in your map, they should be made passive since they can be statistically unreliable.

2) Low chi-distance: Although you can put as many lifestyle statements as you want on the map, the map can become very cluttered. Consequently we recommend that you use around 15 to 20 statements.

3) Additional variables: Correspondence maps show the relationship between two sets of variables. Nevertheless, it can be interesting to overlay other unrelated variables onto the map (e.g. media overlays). These new variables should be made passive, or they may influence and alter the shape of the map.

Hiding Points

Hide (Vs Unhide) - Removed from view on the map; passive points will be removed completely, and active points will still contribute to the shape of the map but won't be shown. (You can hide items by using the mouse and right clicking and selecting Hide. To unhide items use the View menu and select Hidden Objects then simply select those items you wish to view).

You might want to make points hidden for the following reasons:

1) Too much data on the map: If you have too many brands or statements and you just want to focus on a few variables, you can 'hide' certain variables so they are not plotted on the map. NB: They will still have an influence on the shape of the map.

2) Passive Brands: If variables have been made passive because their chi-distance was too low to be of interest, they can be hidden to stop them cluttering up the map.


Formatting the Map Display

To format and improve the appearance of the points on your map, first select them by clicking on them and then use the right mouse button and select properties. Alternatively select the points you wish to format and use the Point Properties icon on the toolbar. You will then be provided with the ‘Point Properties’ box shown below. Depending upon what you have selected (see ‘select’ option for doing multiple and/or specific selections) the editing menu will give you various options under the tab names. Please note however, that the best way to learn the editing options and appreciate how they can improve the appearance of your map is to go in and have a go!

Font Options

Provides typical Windows style editing options.

Symbol Options

Here you have a number of options to change map symbol sizes and shape. For example, you may wish to distinguish Row from Column points through a different shaped symbol. It can also be used to undo the effects of the 3-D statistics option (see ‘Incorporating 3-D Statistics into your Maps’).

Label Options

Here you can choose from a variety of options with which to highlight your labels, such as using sunken, raised or shortened labels, or changing their colour.

Label Text Options

Using this section you can edit the text of your selected point. You may find these options particularly useful because they allow you to change the text of your selection to upper or lower case. Note: You will only get this option when one point is selected.

Printing

Maps can be printed to your default printer. The following is the group of options that you access in Print in the usual Windows manner.

Print Setup

Here you can change the default printer and/or the paper size and orientation.

Print Preview

Use this option to preview your output (this is not accessible in the Eigenvalues Pie Chart view).

Adding a Map Title

Before you print, ensure you have checked/edited the map title. Using the mouse, click on the map and select Title from the Edit menu.

Any of the views can be printed in the normal Windows manner. We recommend that you use the report for Rows General Statistics to take the most discriminating lifestyle statements into Cluster analysis.


Overlaying Data

Once you have generated your map you can overlay any other survey data onto the map to see where it would be placed. Because a crosstabulation must be re-run to overlay data, any previous editing that you have done will be lost.

Common examples of the sort of information you might want to overlay are:
1) Media consumption
2) Frequency information
3) Cluster groups onto the original correspondence map
4) Non-users of the brand(s) you are interested in
5) Demographic groups such as age or social grade

To overlay data you should follow these steps:
♦ Work from the original spec file that you used to generate the correspondence map. Add the extra information (e.g. TV programmes) as columns.
♦ Re-run the correspondence map from the crosstab.
♦ Make the points you are overlaying passive first, so they don't influence the shape of the map.
♦ Remove the less discriminating lifestyle statements.
♦ Tidy up the map.

The points that have been overlaid will now be superimposed onto your map. These are coloured green by default.

NB: If you know beforehand that you wish to overlay other information, you can include these from the outset. In such cases it is important to remember to make these variables passive.

Incorporating 3-D Statistics into your Maps

One of the newer features of Correspondence Analysis is the ability to represent key statistics on the map itself. This 3-D effect is achieved through varying the size of points, so that larger points reflect larger values. Consequently you can have information about the relative importance of variables incorporated into the map. To use the 3-D display of variables, select the Analysis Wizard icon from the toolbar or select Analysis Wizard from the Analysis menu. Select the option “Vary symbol size by statistic”. The statistics are split between General and Detailed statistics (the screen you will see is shown below). Choose the option which best suits your requirements, and follow the appropriate instructions.

23

Page 25: CHOICES 3 - C&C Manual

Correspondence & Cluster Analysis 3D Correspondence Mapping 3D mapping allows you to see variables plotted on the main 3 axes of the Correspondence map in 3D. From the correspondence map in the Choices Viewer, click on the 3D icon on the top right of the tool bar: Once in the 3D mapping view, use the following icons on the toolbar to format the 3D view:

The “Toggle Fog” icon allows you to change the clarity of the 3D view.

The “View Labels” icon allows you to see the labels for all of the variables plotted.

The “Small Symbols” icon reduces the size of the symbol denoting the position of each variable on the map.

The “Large Symbols” icon enlarges the size of the symbol denoting the position of each variable on the map.

The “Wire Frame” icon changes the texture of the 3 axes to a wire frame look.

The “3D Glasses” icon allows you to see the map in a fully 3-dimensional view. Use actual 3D glasses for the full effect.

The “Play” icon allows the 3D view to be automatically rotated.

The “Stop” icon ends the automatic rotation of the 3D view.

The “Move In” icon allows you to enlarge the 3D view and zoom in.

The “Move Out” icon allows you to reduce the 3D view and zoom out.

To manually rotate your 3D map, click on your left mouse button, hold down and move in the required direction. To choose any of these settings, select the icons you require to enable them. All of these options can also be enabled from the “Options” and “View” menus on the toolbar.

Menu Commands

The FILE Menu

The file menu contains basic Windows commands for opening, closing, saving and printing maps. Page Setup and Print Preview allow you to change the orientation and margins, and see how the final print will look. Using Export, you can export the map as an enhanced metafile which can then be inserted into Word and PowerPoint documents etc. The map statistics can be copied and pasted into Excel if required. You are also able to reopen any of the last seven files that you worked on.

The EDIT menu

Of particular note in the edit menu is the facility to give the map a title using the ‘Title…’ option (see below).

Here, as previously mentioned, you can also change the status of variables to and from active or passive status (i.e. changing whether particular variables contribute or do not contribute to the map construction – initially most, if not all, of your variables will be active).

The VIEW menu

Working much like the standard Windows View menu, these options not only show exactly what your hidden points are (and allow you to unhide them), but also give you the option to flip the axes around on the map display.

The SELECT menu

You can use the select menu to select/highlight points by your specifications. For example you can select all passive / active / row / column variables etc.

Similarly, the selection wizard lets you make more complex selections that depend on the statistical values of points. You are also asked how many of the top-scoring variables you wish to select.

Clean-up map and auto-clean-up map can also be accessed through the select menu. Clean-up map will prompt you to “Select top Chi Distance values for rows”. Enter the number of rows required for the map. You can set this number as the default using the tick box in this dialogue box. The map will then show just the selected number of top rows. Alternatively, auto clean-up will automatically tidy up the map using the default number of rows set in the clean-up map option.
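The effect of Clean-up map can be pictured as a simple "keep the top n by Chi Distance" selection. The sketch below is only an illustration of that idea: the function name and the Chi Distance values are invented, and it does not claim to reproduce how the program ranks rows internally.

```python
def top_rows_by_chi_distance(chi_distance, n):
    """Return the n row labels with the largest Chi Distance, largest first."""
    ranked = sorted(chi_distance, key=chi_distance.get, reverse=True)
    return ranked[:n]


# Invented values, for illustration only.
rows = {
    "Statement A": 0.42,
    "Statement B": 0.10,
    "Statement C": 0.77,
    "Statement D": 0.31,
}
kept = top_rows_by_chi_distance(rows, 2)
print(kept)
```

Rows that are not in the returned list correspond to the points that the clean-up would leave off the map.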

CLUSTER ANALYSIS

What is Cluster Analysis?

Cluster Analysis is a powerful segmentation tool allowing users to segment a given population into discrete groups of similar individuals. Cluster Analysis can be applied to any set of comparable variables and is commonly used to segment people based on their responses to a series of attitudinal statements. For example, Cluster Analysis can be used to create attitudinal groups of respondents, whereby the respondents within each group have responded similarly to a battery of attitudinal statements. These groups can then prove to be very powerful discriminators within a given market.

This Cluster Analysis program provides an easy means of selecting the target population and input variables. The analysis can be run to any given level and the results can be viewed interactively on-screen. The program provides links with the Choices3 analysis package. This allows the definition of the target market from within Choices and the export of selected solutions back into Choices for further analysis.

Cluster Analysis Step by Step

Follows on from the Correspondence Analysis Step by Step on Page 5

Preparing to run a Cluster Analysis

• In Choices3, using the original input file, from the toolbar select "Tools" and "Save Cluster Filter File". This uses your filter as part of the cluster program and forms the universe to be segmented.
• You should ensure that the sample size for your filter is greater than 2000.
• You will be asked if you wish to run the cluster analysis. Select "Yes".

Running the Cluster Analysis

• Select "Start a new cluster project".
• Select the database to be used (i.e. the survey you used for the Correspondence Map).
• Choose a filename (max 6 letters) and a title for your work.
• Select ‘Change Filter’ and then choose the base/filter you were using in Choices3.
• Select lifestyle statements by clicking on the ones you wish to use (as listed on your correspondence printout).
• Now select the icon "Run Cluster", selecting a solution number (i.e. the maximum number of cluster groups you think you might want, e.g. 9).
• When the analysis has completed, select the cluster report and then go to the section below on interpreting the cluster analysis.

Interpreting Cluster Analysis

The interpretation below consists of three stages. The first two establish if there is a minimum and a maximum number of cluster groups that you should use, based upon some basic statistics. The last stage is more creative and involves selecting the best solution (e.g. Solution 6, which will have 6 groups in it) for describing your market:

(i) Go to ‘Cluster Report’ and establish if there is a minimum number of groups that you can use. When using TGI data a Variance Explained of >12 should be used.
(ii) (Also in Cluster Report) check the maximum number of groups you can use by ensuring the smallest group figure is >200.
(iii) You will now need to decide which Cluster Solution is most appropriate:

To do this, start by looking at all the groups in Cluster Solution 3 and summarise the characteristics of each group within it in terms of their overall attitudes (give each an appropriate name to summarise). Next, repeat the process with the next higher Solutions, e.g. 4 and then 5 and so on. You should find a point where using further cluster solutions adds no information or indeed loses some group definition. At this stage you have found the optimum cluster solution for dividing the market.
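Stages (i) and (ii) are mechanical and can be sketched as a simple screen over the figures read from the Cluster Report. The report rows below are invented for illustration, and the field names are assumptions rather than anything exported by the program; the thresholds are the ones quoted above for TGI data.

```python
def candidate_solutions(report, min_variance=12.0, min_group=200):
    """Return the solution numbers that pass both screening rules."""
    return [
        row["groups"]
        for row in report
        if row["variance_explained"] > min_variance
        and row["smallest_group"] > min_group
    ]


# Invented Cluster Report figures, for illustration only.
cluster_report = [
    {"groups": 2, "variance_explained": 8.1, "smallest_group": 1400},
    {"groups": 3, "variance_explained": 12.6, "smallest_group": 820},
    {"groups": 4, "variance_explained": 15.9, "smallest_group": 510},
    {"groups": 5, "variance_explained": 18.4, "smallest_group": 260},
    {"groups": 6, "variance_explained": 20.2, "smallest_group": 140},
]
shortlist = candidate_solutions(cluster_report)
print(shortlist)
```

The screen only narrows the range of solutions; stage (iii), choosing among the remaining solutions, still relies on judgement.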

Opening the cluster back in Choices3

• In Choices go to the top toolbar and select "Tools" and "Import Cluster Solution" - your cluster solution will appear at the bottom of your dictionary.
• These can either be used to run further crosstabs or put into the original crosstab under columns and then run as a correspondence analysis. The solutions should be made passive so as not to affect the map but to show where they appear in relation to your market and lifestyle statements.

How to set up a Cluster Analysis

This description has more detail than the step by step guide but is the same process.

Use Correspondence to find Discriminating Lifestyle Statements

A correspondence map should be carried out first in order to find the most discriminating lifestyle statements. After identifying the top 15-20 statements in order of Chi Distance, print out the list of statements.

We encourage clients to do a correspondence map first because it shows clearly the statements that discriminate most strongly in that market. Usually you make all except the top n statements (on distance) passive and hidden to make the map clearer. The remaining n active statements will be good ones for the cluster analysis. Alternatively, you could leave the top 40 statements on the map and take only the top 20 (the number we suggest) for the cluster analysis. Leaving out good statements will weaken the power of the cluster analysis. The cluster program will let you pick any statements you want, so you do not have to do a correspondence analysis first, but in nearly all situations it is best to do the map first.

Save a filter in the Choices3 coding window

To run a cluster analysis from Choices3 you must first create a filter, or base of respondents, which the cluster analysis will use. Typically this filter is the target market (e.g. Bottled Lager users, Heavy Shampoo users, Everyone who has bought shoes etc). The filter should contain at least 2000 respondents. Defining clusters from a smaller filter may result in an unreliable size for the smaller cluster groups in your favoured solution.

♦ Within Choices3 add your target market to the filter. This would usually be the same filter that you used for the Correspondence Analysis. You may find it useful to look up the sample size before running the program.
♦ Select ‘Tools/Save Cluster Filter File’.
♦ Answer yes when prompted to save the filter file.
♦ Answer yes when asked if you want to run Cluster Analysis.

You do not have to run Cluster Analysis at this point. You may start the program later to run the analysis with the filter you have saved.

Start a new project

You are faced with the same opening dialogue box whether you have launched the Cluster software through Choices3 or from the Shortcut.

♦ Select 'Start a new cluster project' and click OK. If the program is already running, select 'File/New Project' from the main menu instead.
♦ From the New Cluster Project dialogue select a Database to use for the project. The cluster database must correspond to the survey you are using in Choices3.
♦ Enter a title for the analysis, and a project name. You may find it helpful to use the same naming convention that you used for the Correspondence Analysis. Click OK.

Selecting Filter and Variables

The Cluster Definition window defines the filter and variables to use in the analysis.

Choose the Filter

♦ To select a filter, click the 'Change Filter' button. The currently selected filter is shown next to the Change Filter button (including the sample size).
♦ Choose a filter and click the OK button.

Choosing the variables

Add statements to analysis:

Remove statements from Analysis:

♦ Select the statements to use. Using the print out of the most discriminating lifestyle statements, select them from the database listed on the left-hand side. The variables you have selected are displayed in the right hand list. To select a variable, highlight it and either double-click or click the appropriate button.
♦ To remove a variable, highlight it in the Selected list and click on the appropriate button.
♦ The order of the variables is fixed in database order. Variables will be removed from one list and added to the other and therefore will never appear more than once.

To highlight more than one variable at a time, click and drag with the mouse to highlight a range or hold down the Control (CTRL) key and click with the left-hand mouse button to highlight non-adjacent variables. It is recommended that you choose a maximum of 25 statements; usually 15-20 are selected.

Run the Analysis

To run the analysis, click the run button on the speed bar. This button is only activated if the currently active window is the Cluster Definition Window.

Run analysis: Enter the maximum number of cluster groups to create, and click the OK button to start the analysis process. (If you selected '6' groups, Choices would create not only the 6-cluster solution, but also the 5, 4, 3 and 2-cluster solutions.)

Analyses on a sample of up to 10,000 typically do not take longer than 10 minutes using a contemporary PC. However, it is not possible to predict exactly how long an analysis will take, as it depends on several factors:

i) The size of the filter (base)
ii) The number of cluster solutions chosen
iii) The speed of your computer
iv) The actual market you are looking at

Where possible, the program will give an indication of progress for each part of the process. An analysis can be cancelled at the end of each data pass. To cancel a running analysis click the Cancel button and wait for the system to respond at the end of the data pass.

Interpreting the Results

Once the Cluster Analysis has finished running there are various statistics that you can print out or view on screen:

♦ Summary statistics window
♦ Cluster report window
♦ Cluster solution window
♦ Cluster groups window

All reports are accessible from the View menu. It is best to work through the results in the following order:

1) View summary statistics
2) Look at an overview of all the cluster solutions produced
3) Examine an overview of a particular cluster solution
4) Examine a particular cluster solution one group at a time

Summary Statistics

Select 'View / Summary Statistics' from the menu.

This window shows overall mean (average) and standard deviation (measures degree of spread in answers) for total sample (target population). This information can be useful for identifying statements with highly-skewed distributions. All variables are normalised to a mean of 0 and a standard deviation of 1 before the cluster analysis runs - this ensures that each variable is given equal importance in the analysis.
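The normalisation described here is the usual z-score transformation. The following is a minimal sketch for a single variable; the function name is hypothetical, and the use of the population standard deviation (dividing by n) is an assumption, since the manual does not say which form the program uses.

```python
from statistics import mean, pstdev


def standardise(scores):
    """Rescale one variable to a mean of 0 and a standard deviation of 1."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population form of the standard deviation
    if sigma == 0:
        raise ValueError("a constant variable cannot be standardised")
    return [(s - mu) / sigma for s in scores]


answers = [1, 2, 3, 4, 5]  # e.g. one respondent at each point of a 5-point scale
z = standardise(answers)
print(z)
```

After this step a statement on which answers barely vary and a statement with widely spread answers both contribute on the same scale, which is what gives each variable equal importance.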

Cluster Report Window

This window shows Variance Explained % - the proportion of the total variation explained by that cluster solution. Cluster analysis tries to find groups with low variation within groups, and big differences between groups. The figure shown is the percentage of variation that is between groups rather than within them. So the higher this figure is, the better. The sizes of the smallest and largest groups are also shown for each solution. The size of all groups is shown in the Cluster Solution Window.
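One common way to compute a "between groups" percentage of this kind is to compare the total sum of squared deviations with the sum that remains within the groups. The sketch below assumes that definition; the manual does not give the program's exact formula, and the function name and data are illustrative.

```python
def variance_explained(groups):
    """Percentage of the total variation that lies between groups.

    `groups` is a list of clusters, each cluster is a list of respondents,
    and each respondent is a list of (standardised) scores.
    """
    everyone = [r for g in groups for r in g]
    n_vars = len(everyone[0])

    def centroid(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_vars)]

    def sum_sq(rows, centre):
        return sum((r[j] - centre[j]) ** 2 for r in rows for j in range(n_vars))

    total = sum_sq(everyone, centroid(everyone))
    within = sum(sum_sq(g, centroid(g)) for g in groups)
    return 100.0 * (total - within) / total


# Two tight, well separated groups on a single variable.
example = [[[0.0], [1.0]], [[4.0], [5.0]]]
pct = variance_explained(example)
print(round(pct, 1))
```

With a single group nothing is explained (0%), and the figure rises as the groups become tighter and further apart.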

Cluster Solution Window

NB: Both the Cluster Solution Window and the Cluster Group Window will appear if you click on “View” “Cluster Solution”.

This window shows information for all cluster groups within a given solution. The report provides an overview of a given solution: the same statistic is displayed for every group, side by side. The main objective of this report and the Cluster Group Report is to examine the detailed biases within each cluster group, and build up a summary description or 'picture' for each group.

Cluster Groups Window

The cluster groups have been formed by grouping together individuals with similar responses to the variables. Not everyone in a given group has responded in exactly the same way, but there will be overall biases displayed by one group in contrast with another. You should interpret these biases and try to understand why these individuals have been grouped in this way. The report displays the following figures:

i) Standard Deviations from the mean: (Mean for Group - Mean for Sample) / Standard Deviation for Sample
ii) Absolute Deviations from the mean: Mean for Group - Mean for Sample
iii) Absolute Mean: Mean for this Group on this variable
iv) Variance breakdown: A breakdown of the remaining variance by group and by variable and a breakdown of the variance explained by variable.

In all cases large (positive) numbers show high agreement, low (negative) numbers mean high disagreement.
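Figures (i) to (iii) can be reproduced for one variable with a few lines of arithmetic. This is a sketch only: the function and key names are hypothetical, the scores are invented, and the population form of the standard deviation is an assumption.

```python
from statistics import mean, pstdev


def group_report(group_scores, sample_scores):
    """Compute the three deviation figures for one group on one variable."""
    sample_mean = mean(sample_scores)
    sample_sd = pstdev(sample_scores)  # assumption: population form
    group_mean = mean(group_scores)
    absolute_deviation = group_mean - sample_mean
    return {
        "absolute_mean": group_mean,
        "absolute_deviation": absolute_deviation,
        "standard_deviations": absolute_deviation / sample_sd,
    }


# Invented answers on a 1-5 agreement scale.
sample = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
group = [4, 5, 4, 5]
report = group_report(group, sample)
print(report)
```

Here the group mean of 4.5 sits 1.5 scale points above the sample mean of 3, which is a little over one standard deviation, so the group shows strong agreement with the statement.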

It is recommended that you use Standard Deviations from the mean to interpret the cluster groups. The numbers given by this statistic are the biases for this cluster group (compared with the overall sample), standardised into units of standard deviations. It is important to use this statistic since the analysis used standardised data when forming the groups, and because it is comparable between variables (the same deviation is just as meaningful on one variable as it is on another).

The Absolute Mean is on the original answer scale, where "Definitely Disagree" scores 1 through to "Definitely Agree" scoring 5, so the average score will be 3. This gives you an indication of which statements have a more positive (or negative) response. The Absolute Deviation shows the average variation from the overall mean in respondents’ answers.

Colour Coding

The cells in the report are colour-coded to help interpretation. Red numbers represent a positive deviation whereas blue numbers represent a negative deviation. In both cases a light colour represents a large deviation, whereas a darker colour represents a smaller, but still important, deviation.

Light Red    Positive deviation greater than 1 standard deviation from the mean
Dark Red     Positive deviation between 0.6 and 1 standard deviations from the mean
Dark Blue    Negative deviation between 0.6 and 1 standard deviations from the mean
Light Blue   Negative deviation greater than 1 standard deviation from the mean
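The colour bands amount to a small lookup on the standardised deviation. In the sketch below the function name is hypothetical, and two details are assumptions because the manual does not state them: values of exactly 0.6 or 1 are placed in the darker band, and deviations smaller than 0.6 are treated as not highlighted.

```python
def colour_for(deviation):
    """Map a deviation (in standard deviations) to the report's colour band."""
    if deviation > 1.0:
        return "light red"
    if deviation >= 0.6:
        return "dark red"
    if deviation < -1.0:
        return "light blue"
    if deviation <= -0.6:
        return "dark blue"
    return None  # assumption: smaller deviations are not highlighted


for value in (1.2, 0.8, 0.1, -0.7, -1.5):
    print(value, colour_for(value))
```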

How many Cluster Groups should I choose?

Choosing the most appropriate cluster solution to use is an art, and the decision should be based on a number of things.

♦ Look at each of the cluster groups in detail, taking them back into Choices to work out the demographic characteristics of each group. Now see which groups make sense and which are interesting. Compare this with other cluster solutions to see which interesting groups you have gained or lost. Does the analysis meet the original aims, or should you consider using a different target or a different set of variables?

♦ In going from say 5 groups to 6 groups, what tends to happen is that you maintain the 5 groups and gain a new group, i.e. the new solution will not change dramatically. You might also lose one group and gain 2 new ones. Try to identify which groups are new and which are lost. Ask yourself if the new groups are useful, or whether you have lost a very interesting group.

♦ The sample size of the smallest group is important since this will restrict the detail of further analysis that can be sensibly performed on that group in Choices. If you were interested in examining brands with small penetrations, you should use fewer groups but with larger sample sizes in each group.

Select 'View/Cluster report' to see information generated about the analysis, i.e.

* The number of groups, from 2 up to the maximum specified for the run
* The variance explained as a percentage, i.e. the % variance in the total sample explained by splitting the sample into the given number of groups
* The sample size of the smallest group in the solution
* The sample size of the largest group in the solution

The % of variance explained indicates how much of the original variance in the data is explained by splitting the respondents into 2, 3, 4, 5 groups etc. The level of variance achieved will vary depending on the nature of the data being analysed. For example, a larger sample size and a larger number of variables will tend to give smaller levels of explanation. Guidelines to work with are that 15% is the minimum acceptable and it is rare to ever get above 30%.

The sizes of the smallest and largest groups are shown as a summary. You don’t want a very small group, or a very large one. Other considerations when selecting a cluster solution to look at further include practical reasons, such as what the cluster groups will be used for. For example, if the analysis is to be presented to a large audience, how many groups can the audience cope with? 12 is probably too many.

The more cluster groups you have, the more variance is explained (which is good). However, the variance explained by additional cluster levels will rise by a diminishing amount each time. Large numbers of cluster groups can become unmanageable, and may yield low sample sizes.
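The diminishing gain can be made visible by differencing the Variance Explained figures of successive solutions. The figures below are invented for illustration and the function name is hypothetical.

```python
def marginal_gains(variance_by_solution):
    """Extra variance explained by each solution over the one before it."""
    sizes = sorted(variance_by_solution)
    return {
        k: variance_by_solution[k] - variance_by_solution[prev]
        for prev, k in zip(sizes, sizes[1:])
    }


# Invented Variance Explained % by number of groups.
explained = {2: 8.0, 3: 12.5, 4: 15.5, 5: 17.0}
gains = marginal_gains(explained)
print(gains)
```

A solution whose gain is small relative to the previous step is a natural place to ask whether the extra group is worth the loss of sample size per group.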

Taking your Clusters back into Choices3

Once an analysis has been run the next step is usually to take the resultant groups back into Choices for further interpretation. They can be crosstabulated against anything else in the survey. In order to do this you need to tell the system which groups to make available for analysis in Choices. This is not done automatically since generally only one solution is needed, and most analyses will generate several possible solutions.

Choices3

From the Choices 3 coding window select Tools/Import Cluster solution.

Choose the cluster solution(s) you wish to import into the Choices3 dictionary. NB The files for the cluster solutions are saved with the data, so once imported everyone who accesses Choices3 on your system is able to view the solutions you import.

Once imported, the groups in the solution may be selected like any other variable in the survey. (See previous screen)

Clusters can now be crosstabulated against anything else in the survey. They can also be set up as targets in media analysis.

What if my cluster isn't listed?

Ensure that you are in the same survey that the cluster was generated in. Choices displays the name of the survey at the top of the screen.

How do I change the headings of my clusters?

You may wish to give each cluster a name that embodies the characteristics of its members. Once you have added the groups to your spec you may rename the cluster groups using the edit table icon on the Choices toolbar. For further analysis you may find it helpful to save a definition file with the group names edited.

Overlaying your cluster solution onto the original map

It is helpful to see where the groups lie in relation to each other on the original Correspondence Map. This may also help in deciding which of the cluster solutions you are going to use for targeting purposes. You may find that one group is a more refined version of another group and hence will respond to the same targeting strategy.

The cluster groups generated from the example of the Shoe Market have been overlaid onto the original correspondence map. This is done by re-opening the original crosstab with brands vs attitudinal statements, and adding the cluster groups as extra columns. Run the Correspondence Map as before. It is very important at this stage to make your cluster groups passive. This is done before going through the process of ‘tidying’ up the Correspondence Map as before.

An example TGI Cluster Analysis: The Shoe Market

1) Crosstab

Set a filter of “Bought shoes in last 12 months”. This can be done by coding all the shoe retailers together. With the brands of shoe shop in the answer panel, right click and select sample and weighted from the context menu. Then click on the word sample to sort by sample size to identify any brands with a low sample size. These should not be used in the analysis, and can either be deleted or combined with other brands if appropriate.

Add the shoe shop brands to the columns and add ‘Any Agree’ Lifestyle statements to the rows. There is a definition file set up which you can use or alternatively you can select the lifestyle statements from the dictionary yourself. Save the crosstab as a spec file. You may find it beneficial to use the same file naming convention for the spec file, correspondence map and cluster project.

Run the crosstab. Analyse the correspondence map to see whether it gives a good representation of the market. When you are satisfied with the map, print it off using "File/Print". Print the lifestyle statements needed for your cluster project from the row statistics view.

2) Selection of Lifestyle Statements

Within Choices, set up a filter of "Bought shoes in last 12 months". Select 'Tools/Save Cluster Filter file' (ctrl + D) to set it up as a cluster filter. After saving the filter in the Coding Window you may proceed to the Cluster Analysis module. Alternatively the Cluster Analysis module may be opened by means of a separate shortcut under Start/Programs/ . Choose the correct survey database. This is the same as the survey you are using in the coding window. Give the project a title, e.g. ‘Shoe Market Place’, and a name, e.g. ‘Shoes’.

From the list of lifestyle statements on the left-hand side of the screen, select from your printout the top 15-20 statements that you have previously identified from the row statistics. Now check that all the statements chosen (i.e. in the right-hand column) are relevant to the market chosen. If you think they are not relevant, or are too similar to another statement, click on the left-arrow button to remove them from the list. You may decide to include statements that were not so important in terms of Chi Distance on the Correspondence Map.

3) Run Cluster Analysis

Click on the Run button, and select how many cluster solutions to generate. Since the software will generate every cluster solution up to and including the one you select, it is better to choose too many rather than too few. Typically, any number between 4 and 8 might be chosen, but this will vary depending on the market you are looking at.

4) Interpreting the Results

Once the cluster analysis has run, there are some statistics you should look at within the cluster software which will help you to understand the cluster groups. These are explained within the 'Interpreting the results' section on p36. However, the "View Cluster Solution" option will be of particular interest. By looking at the top 5 or so statements in each group of the solution, you can begin to get a feel for how the cluster groups vary attitudinally from each other. A negative sign against a statement means that the respondents don't agree with it.

You may find it helpful to print the list of lifestyle statements for each of the cluster groups for each solution generated. The lifestyle statements are ranked in descending order for every group. It can be helpful to give the cluster groups names. We are looking to identify the cluster groups as people we may see in the street. Within the Cluster software examine each of the groups in each cluster solution.

CLUSTER 1: Friends more important than family; If looking for bargains, look in local paper first; Interested in financial services advertising; Like to take holidays in Britain rather than abroad; Prefer herbal medicine products

CLUSTER 2: Spend a lot on clothes; Like to keep up with latest fashion; Wear designer clothes; Like to stand out in a crowd; Can’t resist expensive perfume/aftershave

CLUSTER 3: If looking for bargains, look in local paper first; Like to take holidays in Britain rather than abroad; When household shopping budget for every penny; Skincare products are for women not men; Only beer worth drinking is real ale

CLUSTER 4: Prefer herbal medicine products; Prefer alternative medicine (e.g. acupuncture); Read financial pages of newspaper; Have classic dress style; Only beer worth drinking is real ale

5) Import the Clusters back in to Choices3

One of the most powerful ways of interpreting the cluster groups is to import them back into Choices for further analysis. First, import your chosen cluster group(s) into Choices by selecting Tools/Import cluster solution on the Choices3 Coding Window menu bar.

What you choose to look at will vary by the market you are looking at, but typically you might want to know the following:

i) Demographic Profile

CLUSTER 1: Social Grade DE, Age 65+ or <24, Not working
CLUSTER 2: Age 15-34, Social Grade ABC1, Working F/T
CLUSTER 3: Social Grade DE, Age 55+, Not working
CLUSTER 4: Social Grade AB, Age 35-64, Work P/Time

ii) Press Profile

CLUSTER 1: Weekly News; Angler’s Mail; Match; Angling Times; Woman’s Own; Inside Soap
CLUSTER 2: More!; Mizz; Now; Hair; Smash Hits; Kerrang
CLUSTER 3: Heritage Today; That’s Life; My Weekly; TV Choice; Take A Break; Chat
CLUSTER 4: Times Educational Supplement; The Independent; Birds Magazine; The Guardian; Marks & Spencer Magazine; AA Magazine

iii) Brand consumption

CLUSTER 1: Littlewoods; Trueform; Freeman Hardy Willis; K Shoes; Stead & Simpson; Shoe Express
CLUSTER 2: Ravel; Dolcis; Next; Russell & Bromley; Bally; House of Fraser
CLUSTER 3: Trueform; Shoe Express; Timpson/Oliver; Freeman Hardy Willis; Clarks; Stead & Simpson
CLUSTER 4: Marks & Spencer; House of Fraser; Clarks; John Lewis; BHS; K Shoes

iv) TV Programmes

CLUSTER 1: Family Affairs; Doctors; Wheel of Fortune; That’s Esther; City Hospital; Trisha
CLUSTER 2: Streetmate; The Priory; CD UK; She’s Gotta Have It; Dawson’s Creek; Hollyoaks
CLUSTER 3: Family Guy; Stargate; Driven; Top Gear; Robot Wars; F1 Grand Prix
CLUSTER 4: Gardener’s World; Horizon; Timewatch; Dispatches; Panorama; Watchdog

The Statistics Explained

Cluster Analysis

If the population distribution in a sample is not homogeneous, respondents often clump together in clusters. Gaps may also indicate that there is a mixture of several displaced distributions. Since clusters are highly dependent on the sampling variation, small perturbations in the data might lead to very different clusters. The choice of the number of clusters cannot follow from the algorithm, but has to be made subjectively. For these reasons, cluster analysis is not a rigorous and sharp statistical tool, and should be applied after careful consideration and scrutiny of all the information available.

Cluster Methodology

The clustering process begins with comparing the distance of each observation from the mean vectors ('Centroids') of each of the proposed clusters in the sample of n observations. The observation is assigned to the cluster with the nearest mean vector. The distances are recomputed and reassignments are made as necessary. The process continues until all observations are in clusters with minimum distances to their mean vectors.
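The assign-and-recompute cycle described here can be sketched in a few lines. This is a generic nearest-centroid loop written for illustration, not the program's own code; the function names, the starting centroids and the data points are all assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Mean vector of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign_until_stable(points, start_centroids, max_passes=100):
    """Assign each point to its nearest centroid, recompute, and repeat."""
    centroids = [list(c) for c in start_centroids]
    labels = None
    for _ in range(max_passes):
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # no observation changed cluster, so the solution is stable
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = centroid(members)
    return labels, centroids


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels, centres = assign_until_stable(points, [[0.0, 0.0], [6.0, 5.0]])
print(labels, centres)
```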

K-Means Algorithm

The cluster analysis program uses a K-Means partitioning algorithm. A partitioning algorithm moves from a smaller number of groups to a larger number of groups, as opposed to a joining algorithm, which does the reverse.

The program works up from an initial starting point of one group (the target population), then builds a 2-cluster solution, a 3-cluster solution and so on up to a maximum number of clusters specified by the user. This is known as a 'Multi-K-Means' analysis, where 'K' refers to the number of clusters chosen by the user. The starting partition at each instance is derived by splitting an existing group into 2. The program selects and splits the group with the greatest variance when moving from level to level.

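To seed a split, the squared distance from the group centroid is calculated for every individual in the group. The individual with the highest score is taken as the first seed, and the individual most dissimilar to that one is taken as the second.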

i) Standardise Variables

Variables are standardised (normalised) to a mean of 0 and a variance of 1, which ensures that each variable is given an equal weighting in the analysis.
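
As a small illustration of the standardisation step, the Python sketch below rescales one variable to a mean of 0 and a variance of 1. The function name is hypothetical, and the variance is assumed to be computed with a divisor of n (the guide does not state which divisor the program uses).

```python
import math


def standardise(column):
    """Rescale one variable to mean 0 and variance 1 (divisor n is an assumption)."""
    n = len(column)
    mean = sum(column) / n
    variance = sum((x - mean) ** 2 for x in column) / n
    sd = math.sqrt(variance)
    if sd == 0:
        # A constant variable carries no information; return zeros.
        return [0.0] * n
    return [(x - mean) / sd for x in column]


scores = [1, 2, 3, 4, 5]
z = standardise(scores)
print(z)
```

After this step a variable measured on a 1 to 5 scale and a variable measured on a 0 to 100 scale contribute on the same footing to the distance calculations.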

ii) Split target population into two groups

Two seed points are selected. Point A is chosen as the point furthest away from the centroid of the group to be split, then Point B is chosen as the point furthest away from Point A. The remaining points are then split between these two seeds.
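
The seed selection can be sketched as follows. This is an illustrative Python example with hypothetical function names. It assumes squared Euclidean distance, and it assumes that each remaining point joins whichever seed is nearer, which the text above does not spell out.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Mean vector of a non-empty list of points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def split_group(points):
    """Split one group in two around seed points A and B."""
    centre = centroid(points)
    # Point A: the point furthest from the centroid of the group.
    seed_a = max(points, key=lambda p: squared_distance(p, centre))
    # Point B: the point furthest from Point A.
    seed_b = max(points, key=lambda p: squared_distance(p, seed_a))
    group_a, group_b = [], []
    for p in points:
        # Assumption: each point joins the nearer of the two seeds.
        if squared_distance(p, seed_a) <= squared_distance(p, seed_b):
            group_a.append(p)
        else:
            group_b.append(p)
    return group_a, group_b


group_a, group_b = split_group([[0, 0], [1, 0], [2, 0], [10, 0]])
print(group_a)  # [[10, 0]]
print(group_b)  # [[0, 0], [1, 0], [2, 0]]
```

In this example the outlying point at 10 becomes Point A, the point at 0 becomes Point B, and the two points in between fall to Point B because it is nearer.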

iii) Refine cluster solution and perform K-Means

For each respondent, and for each cluster other than the respondent's current cluster, the program calculates the increase in error that would result from transferring the respondent to that cluster. If the minimum increase in error is negative (that is, the transfer would reduce the error), the respondent is transferred to the cluster that gives this minimum. The cluster centres of the losing and gaining clusters are then readjusted, and any increases in error are recorded. Data passes are repeated until no further data cases can be moved.
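
The refinement pass can be illustrated with the Python sketch below. It is not the program's own code: the function names are hypothetical, the 'error' is assumed to be the total within-cluster sum of squared distances to the centroid, and the change in error is found by simply recomputing that total rather than by any shortcut formula. It also assumes that a respondent is never moved if that would leave a cluster empty.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def total_error(points, labels, k):
    """Assumed error measure: within-cluster sum of squared distances."""
    total = 0.0
    for g in range(k):
        members = [p for p, label in zip(points, labels) if label == g]
        if members:
            centre = centroid(members)
            total += sum(squared_distance(p, centre) for p in members)
    return total


def refine(points, labels, k, max_passes=100, tolerance=1e-12):
    """Move respondents between clusters while a move reduces the error."""
    labels = list(labels)
    for _ in range(max_passes):
        moved = 0
        for i in range(len(points)):
            current = labels[i]
            if labels.count(current) == 1:
                continue  # assumption: never empty a cluster
            base = total_error(points, labels, k)
            best_change, best_cluster = 0.0, current
            for g in range(k):
                if g == current:
                    continue
                trial = labels[:i] + [g] + labels[i + 1:]
                change = total_error(points, trial, k) - base
                if change < best_change - tolerance:
                    best_change, best_cluster = change, g
            if best_cluster != current:
                labels[i] = best_cluster  # transfer to the minimal cluster
                moved += 1
        if moved == 0:
            break  # a full pass with no transfers: solution is stable
    return labels


print(refine([[0], [1], [2], [10], [11]], [0, 0, 1, 1, 1], k=2))  # [0, 0, 0, 1, 1]
```

In the example, the respondent at 2 starts in the wrong cluster; moving it reduces the error, so it is transferred, and the second pass finds nothing further to move.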

iv) Number of groups is smaller than number required

If the number of groups is still smaller than the number required, the group with the largest variance is split, and phases (ii) and (iii) are repeated.
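
The outer 'Multi-K-Means' loop, which grows the solution one group at a time, can be sketched as below. This Python example is illustrative only and uses hypothetical function names. The variance of a group is assumed to be the mean squared distance of its members from the group centroid, remaining points are assumed to join the nearer seed, and the refinement phase (iii) is left out to keep the sketch short.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def group_variance(points):
    """Assumed measure: mean squared distance from the group centroid."""
    centre = centroid(points)
    return sum(squared_distance(p, centre) for p in points) / len(points)


def split_group(points):
    """Phase (ii): split around the two seed points A and B."""
    centre = centroid(points)
    seed_a = max(points, key=lambda p: squared_distance(p, centre))
    seed_b = max(points, key=lambda p: squared_distance(p, seed_a))
    group_a = [p for p in points
               if squared_distance(p, seed_a) <= squared_distance(p, seed_b)]
    group_b = [p for p in points
               if squared_distance(p, seed_a) > squared_distance(p, seed_b)]
    return group_a, group_b


def build_solutions(points, max_clusters):
    """Return a dict mapping each number of clusters to its list of groups."""
    groups = [list(points)]
    solutions = {1: [list(g) for g in groups]}
    while len(groups) < max_clusters:
        # Select the group with the greatest variance for splitting.
        widest = max(range(len(groups)), key=lambda i: group_variance(groups[i]))
        if group_variance(groups[widest]) == 0:
            break  # all remaining groups hold identical points
        group_a, group_b = split_group(groups.pop(widest))
        groups.extend([group_a, group_b])
        # Phase (iii), the refinement pass, would run here; omitted in this sketch.
        solutions[len(groups)] = [list(g) for g in groups]
    return solutions


solutions = build_solutions([[0], [1], [2], [10], [11], [30]], max_clusters=3)
for k, groups in solutions.items():
    print(k, groups)
```

Keeping every intermediate solution, rather than only the last, mirrors the way the program reports a 2-cluster solution, a 3-cluster solution and so on up to the chosen maximum.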

Mean Absolute Deviation

The absolute deviation of each observation from the mean is calculated. These absolute deviations are then summed, and their mean is taken.

Variance

The variance is the average of the squared deviations from the mean. The smaller the variance in the population, the more accurate a sample taken from that population will be.

Standard Deviation

Once the deviations have been squared and averaged to give the variance, the square root of the variance is taken as a measure of dispersion.
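
The three measures of dispersion can be worked through on a small set of scores. The Python sketch below is illustrative: the function names are hypothetical, and the averages are assumed to use a divisor of n, following the wording 'average' in the definitions above.

```python
import math


def mean(values):
    return sum(values) / len(values)


def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


def variance(values):
    """Average of the squared deviations from the mean (divisor n assumed)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_absolute_deviation(scores))  # 1.5
print(variance(scores))                 # 4.0
print(standard_deviation(scores))       # 2.0
```

The mean of these scores is 5, so the absolute deviations sum to 12 and the squared deviations sum to 32; dividing each by the 8 observations gives 1.5 and 4 respectively.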

Further reading:

Hartigan, J.A. (1975) Clustering Algorithms (Wiley, New York)
Everitt, B. (1974) Cluster Analysis (Heinemann, London)
MacQueen, J. (1967) "Some methods for classification and analysis of multivariate observations", Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (University of California Press, Berkeley)

FILE

New Project: Creates a new cluster project and makes it the active window. You will be prompted to save the document when you close the application.
Open Project: Displays the Open File dialog box, so you can select a file to load into a new document window. You can also create a new document by naming a file that does not currently exist.
Close File: Closes the currently active window.
Close Project: Takes you back to the initial screen where you can start a new cluster project or work on an existing cluster project.
Save Project: Saves the document in the active window. If the document is unnamed, the Save As dialog box is displayed so you can name the file, and choose where it is to be saved.
Save As: Allows you to save a document under a new name, or in a new location on disk. The command displays the Save File As dialog box. You can enter the new file name, including the drive and the directory. All windows containing this file are updated with the new name. If you choose an existing file name, you are asked if you want to overwrite the existing file.
Print: This prints the contents of the active window. Use File/Print Preview to see how the document will be laid out on printer pages. Use File/Print Setup to select a printer, and to set printer options.
Print Preview: This opens a special window that shows how the active document will appear when printed. The preview window shows one or two pages of the active document as they would be laid out on printed pages. Controls on the window allow you to page through the pages of the document.
Print Setup: This displays the Printer Setup dialog box, which allows you to select and configure the printer to be used.
Import filter: Here you can upload a filter that was saved in Choices 1.
Exit: This takes you out of the Cluster program. Make sure that you have saved your file first.

EDIT

Copy: Enables user to copy any highlighted data and paste into documents such as Word, Excel etc.

VIEW

Project: Displays the active project window.
Summary statistics: For each variable (statement) shows mean and standard deviation.
Main Report: For each solution, shows the amount of variance and the size of the smallest and largest groups in each solution. Allows you to view any individual cluster solution.
Cluster Information: Displays general information on the cluster analysis.
Cluster Report: For each solution, shows the amount of variance and the size of the smallest and largest groups in each solution.
Cluster Solution: Displays, for a given solution, each group; shows the standard deviation from the mean, absolute deviation from the mean and absolute mean for each statement.
Cluster Group: Enables user to view the different groups within a cluster solution.
Cluster Log: Provides a record of how each cluster group was arrived at, with the number of passes and points moved.

ANALYSIS

Run Cluster: Displays the cluster runtime options.
Create membership data: Choices 1 - creates a file that you can load into Choices 1.
Create membership data: Choices 2 - creates a file that you can load into Choices 2.
Add Variable: Enables user to add an additional variable to analysis.
Remove Variable: Enables user to remove an unwanted variable from analysis.

OPTIONS

System Options: This shows the directories used by Choices, and may be useful if you want to know where certain files are stored.
Project options: Lets you change the title of the project and the number of clusters.
Save Project: Lets you save the above options.

WINDOW

Cascade: Displays all the available windows in overlapped form, so that the title bar of each is visible.
Tile: Displays all windows on the same screen in a non-overlapping arrangement.
Arrange Icons: Arranges all iconized windows into rows along the bottom of the application's main window.
Close All: Lets you close all windows, report windows, cluster group windows or cluster solution windows.

HELP

Help: Provides help on running a cluster analysis.