
Unit 3 Further Maths

IMPORTANT NOTES

Our policy at TSFX is to provide students with the most detailed and comprehensive set of notes that will maximise student performance and reduce study time. These materials therefore include a wide range of questions and applications, not all of which can be addressed within the available lecture time. Because of time constraints, some of the materials included in this booklet may not be addressed during the course of these lectures.

Where applicable, fully worked solutions to the questions in this booklet will be handed to students on the last day of each subject lecture. Although great care is taken to ensure that these materials are mistake free, an error may appear from time to time. If you believe that there is an error in these notes or solutions, please let us know asap ([email protected]). Errors, as well as clarifications and important updates, will be posted at www.tsfx.com.au/vce-updates.

The views and opinions expressed in this booklet and corresponding lecture are those of the authors/lecturers and do not necessarily reflect the official policy or position of TSFX.

COPYRIGHT NOTICE

These materials are the copyright property of The School For Excellence and have been produced for the exclusive use of students attending this program. Reproduction of the whole or part of this document constitutes an infringement in copyright and can result in legal action. No part of this publication can be reproduced, copied, scanned, stored in a retrieval system, communicated, transmitted or disseminated, in any form or by any means, without the prior written consent of The School For Excellence (TSFX).

The use of recording devices is STRICTLY PROHIBITED. Recording devices interfere with the microphones and send loud, high-pitched sounds throughout the theatre. Furthermore, recording without the lecturer’s permission is ILLEGAL. Students caught recording will be asked to leave the theatre, and will have all lecture materials confiscated.


succeeding in the vce, 2017


© The School For Excellence 2017 Succeeding in the VCE – Unit 3 Further Maths Page 1

UNITS 3 AND 4: FURTHER MATHEMATICS

COURSE OUTLINE

There are two areas of study:

1. Core material – Data Analysis, and Recursion and Financial Modelling.

2. Module material – Two modules selected from the four available: Matrices; Networks and Decision Mathematics; Geometry and Measurement; Graphs and Relations.

ASSESSMENT

1. School-Assessed Coursework:

This is worth 34 percent of your total assessment.

Unit      Assessment Task                                                         Mark Allocations
Unit 3    Data Application task                                                   40
Unit 3    Modelling/Problem Solving Task 1 – Recursion and Financial Modelling    20
Unit 4    Modelling/Problem Solving Task 2 (First module)                         20
Unit 4    Modelling/Problem Solving Task 3 (Second module)                        20
TOTAL                                                                             100

2. Two End-of-Year Examinations:

These are worth 66 percent of your assessment; each exam contributes 33 percent.

Examination 1: Facts and Skills – Multiple Choice

Examination 2: Analysis Task


Statistics is a mathematical science. It involves collection, organisation, presentation and interpretation of data.

TYPES OF DATA

Univariate data: Investigates one variable in isolation.

Bivariate data: Investigates two variables and any possible relationship between them.

TYPES OF VARIABLES

NUMERICAL VARIABLES

Numerical variables represent quantities. They have numerical values that are measured or counted. Numerical variables can be either continuous or discrete.

A continuous variable can take any value within a range, although the range of values may be restricted. Continuous variables are usually measured. Examples: Height, weight.

A discrete variable can take only certain distinct values in a given range. Discrete variables are usually counted, i.e. 0, 1, 2, 3, etc. Examples: Number of children in a family (you cannot have 1.43 sisters!), number of goals in netball.

[Diagram: TYPE OF DATA branches into UNIVARIATE (ONE VARIABLE) and BIVARIATE (TWO VARIABLES).]


CATEGORICAL VARIABLES

Categorical variables represent qualities. Their outcomes are not measured or counted. The answer to the statistical question is usually a word rather than a number. Examples: Blood groups (A, B, AB, and O), gender (female or male).

Categorical variables can be further separated into two types, nominal and ordinal.

Nominal categorical variables are variables that have names. The options for the variable have no natural order. An example is eye colour (blue, brown, green, etc.).

Ordinal categorical variables have a natural order, such as salary (high, medium or low).

Sometimes numerical values or scores can be assigned to categorical data, for example ‘tidies up his/her room’ (1 = regularly, 2 = sometimes, 3 = rarely, 4 = never) or “Further Maths is the best subject ever” (1 = strongly disagree, 2 = disagree, 3 = undecided, 4 = agree, 5 = strongly agree). These numbers can be used to calculate an average or mode. For example, the average for “Further Maths is the best subject ever” could be 4.2. This number does not in itself mean anything, but it tells the researchers that many people agree or strongly agree with the statement. Numerical values allocated in this way are artificial, and therefore such variables are still considered categorical.

SUMMARY OF TYPES OF DATA

The types of data collected can be summarised as follows:

TYPE OF DATA or SCORE, x

Type           Subtype       Description                                    Example
CATEGORICAL    NOMINAL       LABEL that has no order                        E.g. Eye colour
CATEGORICAL    ORDINAL       LABEL that can be placed in a logical order    E.g. Salary (H, M, L)
NUMERICAL      DISCRETE      USUALLY WHOLE NUMBERS obtained by COUNTING     E.g. Number of siblings
NUMERICAL      CONTINUOUS    ANY FORM of NUMBERS obtained from MEASURING    E.g. Height


QUESTION 1

Classify each of the following variables as numerical or categorical. If the variable is numerical, further classify the variable as discrete or continuous, and if it is categorical, further classify the variable as nominal or ordinal.

(a) Length of a movie in minutes.
(b) Length of a movie (short, medium, long).
(c) The grade given to a Further Mathematics student for a SAC (A, B, C, D, etc.).
(d) Hair colour (black, brown, blond, grey and red).
(e) Days in the week.
(f) Age in years.
(g) Australian state of residence.
(h) Polishes his/her school shoes (1 = regularly, 2 = sometimes, 3 = rarely, 4 = never).
(i) Number of members in a family.
(j) Employment status (full-time employed, part-time employed, unemployed).
(k) Height in cm.
(l) The number of trees in a park.


QUESTION 2

Which one of the following is an example of continuous numerical data?

A The score on a multiple choice test of 20 questions.
B The height of trees at a nursery.
C The number of suitcases lost by an airline company.
D The number of blue M&Ms in a bag.
E Voters classified as low income, middle income and high income.

QUESTION 3

The number of students late for a Further Maths class recorded over a month was as follows:

0, 2, 4, 2, 3, 1, 0, 0, 4, 3, 2, 2, 3, 1, 7, 2

The type of data is:

A Numerical discrete.
B Numerical continuous.
C Categorical ordinal.
D Categorical nominal.
E Categorical discrete.

QUESTION 4

The following table shows the examination marks and gender for 25 students for a statistics test.

Table: Examination marks and gender of 25 children.

Female marks: 81 72 47 69 21 98 67 60 78 99 34 89 85

Male marks: 59 80 34 27 88 76 56 98 45 95 73 70

The variables, examination marks and gender of children, are:

A Both numerical variables.
B Both categorical variables.
C Neither numerical nor categorical variables.
D Numerical and categorical variables respectively.
E Categorical and numerical variables respectively.


QUESTION 5

The following table shows the preferred subject and preferred hand for 120 individuals.

Table: Preferred subject and preferred hand for 120 individuals.

                     Preferred Hand
Preferred Subject    Left    Right
English              7       49
Maths                13      51

(a) The variables, preferred subject and preferred hand, are:

A Both numerical variables.
B Both categorical variables.
C Neither numerical nor categorical variables.
D Numerical and categorical variables respectively.
E Categorical and numerical variables respectively.

(b) Of the individuals surveyed, the percentage who were left handed and preferred maths is closest to:

A 9% B 11% C 13% D 43% E 53%

(c) Of the individuals surveyed, the percentage who preferred maths is closest to:

A 9% B 11% C 13% D 43% E 53%


HISTOGRAMS

A histogram is a graphical representation of a numerical data distribution.

Useful for displaying large data sets (50+ observations).

Vertical axis displays frequency (count or percent).

Horizontal axis displays values of the variable.

The height of the bar gives the frequency (count or percentage).

Discrete data: The width of the bar corresponds to the discrete data values, and the data value is central to each bar to reflect the discrete nature of the data. Often the bars are separated, but they can be touching each other.

Continuous data: The width of each bar corresponds to the data class interval. There are no gaps between bars, i.e. the end of one rectangle must be the beginning of the next rectangle, unless there is a class interval with a frequency of zero.

Note: If the frequency of a discrete value (or continuous data interval) is 0, then the bar has zero height.

EXAMPLE

The following is a survey of 30 students of the number of people in their family living at home.

2 4 5 6 4 3 4 3 2 5 6 6 5 4 5 3 3 4 5 8 7 4 3 4 2 5 4 6 4 5

Display the information as a histogram.

Solution

1. Identify that it is numerical discrete data with a range of less than 15. Label the x-axis as “Number of members in a family”, leave a space at the beginning and scale the x-axis from lowest to highest score. Label the score in the middle of each column.

2. Label the y-axis as “frequency” (or “number of families”).

3. Draw the first column centred on its score on the x-axis, with a height equal to its frequency read from the y-axis.

4. Repeat for all other scores.

[Figure: Histogram of Members in Families. Horizontal axis: Number of members in a family (1 to 9). Vertical axis: Number of families (0 to 10).]
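The frequency table behind this histogram can also be tallied by machine. The following is a minimal Python sketch, not part of the course's calculator method; the names `family_sizes`, `frequency_table` and `text_histogram` are made up for illustration, and the text bars are only a rough stand-in for a drawn histogram.

```python
from collections import Counter

# Family sizes reported by the 30 students in the example.
family_sizes = [2, 4, 5, 6, 4, 3, 4, 3, 2, 5,
                6, 6, 5, 4, 5, 3, 3, 4, 5, 8,
                7, 4, 3, 4, 2, 5, 4, 6, 4, 5]


def frequency_table(values):
    """Return {score: frequency} for every whole-number score from min to max."""
    counts = Counter(values)
    return {score: counts.get(score, 0)
            for score in range(min(values), max(values) + 1)}


def text_histogram(table):
    """Return one line per score, with a bar of '#' as long as the frequency."""
    return [f"{score:>2} | {'#' * freq} ({freq})" for score, freq in table.items()]


family_table = frequency_table(family_sizes)
for line in text_histogram(family_table):
    print(line)
```

Each printed row corresponds to one column of the histogram, so the tallest bar identifies the most common family size.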


USING A TI-Nspire CALCULATOR

Add a Lists and Data Spreadsheet to your document. Use Control → I → 5 to add a Data and Statistics page. Enter the values into List 1 and name the data in the title box as shown.

On the Data and Statistics page, click on add variable to add the variable “family”. The default display is a dotplot as shown.

Menu → 1. Plot type → 3. Histogram will produce the display.

Menu → 4. Analyze → A. Graph trace allows you to trace the peaks for frequencies.


USING A CLASSPAD CALCULATOR

Use the Stat menu and enter data into list 1. SetGraph → Setting → set as shown → Set. If the data was in a frequency table, the setting would be XList → List 1, Freq → List 2.

Press the draw graph symbol and then set the interval as shown. This setting starts the histogram at 0 and goes up in ones. Press OK. The graph is now shown.

Use Analysis → Trace to display the frequency of each peak.


QUESTION 6

Construct a histogram from the frequency table you obtained from the following data.

1 5 2 5 5 4 3 7 6 5 4 6 4 0 1 2 5 6 5 3 5 8 7 4 5

Solution


QUESTION 7 Construct a histogram from the frequency table you obtained from the following data.

820.72 821.35 821.42 822.57 820.70 821.24 823.16 822.31

824.61 821.92 821.74 820.23 821.99 822.31 823.14 820.62

822.71 822.91 822.76 821.08 820.75 822.76 819.99 820.31

821.62 823.27 820.37 820.62 825.72 822.32 821.27 822.07

820.94 822.06 821.82 823.48 820.47 822.83 819.76 821.57

Solution


QUESTION 8 The distribution of speeds of vehicles on a freeway is displayed in the frequency histogram. The speed limit was 100 km/h.

Assume a motorist is speeding once they exceed 100 km/h. According to the graph, the percentage of vehicles speeding was closest to:

A 25% B 33% C 53% D 67% E 78%


HISTOGRAMS USING LOGARITHMIC SCALES

Many real world values are recorded using a logarithmic scale. Data displays, such as histograms, using a logarithmic scale are a requirement of the Further Maths course. This course will use logarithms to base 10, written as log. A logarithm expresses a value as a power of 10, so it can be used to compare very small and very large values in the one data distribution.

Examples of real world logarithmic scales are the Richter scale used to measure the strength of earthquakes, decibels used to measure sound intensity and the pH scale used to measure acidity of substances.

A logarithm of a number is simply the power that the number 10 would be raised to, in order to get that number. The table below shows the relationship between simple numbers, the same number written as a power of 10 and the equivalent logarithm.

Number    Power of 10    Logarithm
0.001     $10^{-3}$      -3
0.01      $10^{-2}$      -2
0.1       $10^{-1}$      -1
1         $10^{0}$       0
10        $10^{1}$       1
100       $10^{2}$       2
1000      $10^{3}$       3
10 000    $10^{4}$       4
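The relationship in this table can be checked numerically. The short Python sketch below is offered only as an illustration alongside the calculator methods; `numbers` and `log_values` are names chosen here, and `math.log10` is Python's built-in base-10 logarithm.

```python
import math

# The "Number" column of the table above.
numbers = [0.001, 0.01, 0.1, 1, 10, 100, 1000, 10_000]

# round() guards against tiny floating-point error in the computed logarithm.
log_values = [round(math.log10(n)) for n in numbers]

for n, log_n in zip(numbers, log_values):
    print(f"log({n}) = {log_n}")
```

Each logarithm is simply the power of 10 that gives back the original number.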

EXAMPLE The data below shows the approximate population of 30 different countries.

China 1 374 390 000 India 1 283 230 000

US 322 230 000 Indonesia 258 705 000

Pakistan 192 540 000 Russia 146 505 000

Japan 126 880 000 Egypt 90 251 000

France 67 286 000 Italy 60 685 000

South Korea 51 529 000 Australia 23 974 000

Sri Lanka 20 966 000 Netherlands 16 984 000

Sweden 9 845 000 Austria 8 663 000

Switzerland 8 306 000 Denmark 5 699 000

Singapore 5 535 000 Slovakia 5 424 000

New Zealand 4 653 000 Jamaica 2 723 000

East Timor 1 167 000 Luxembourg 563 000

Malta 445 000 Bermuda 62 000

Monaco 37 800 Norfolk Island 2300

Vatican City 839 Pitcairn Islands 56


A histogram of these values would not be very informative because the range of populations is very wide. The histogram using the original populations is shown below:

An alternative to using the original populations would be to use the logarithm of the original populations. This histogram is shown below:

This histogram is much more informative. Remember when interpreting this histogram that:

Values with a log between 1 and 2 are populations between 10 and 100

Values with a log between 2 and 3 are populations between 100 and 1000

Values with a log between 3 and 4 are populations between 1000 and 10 000

Values with a log between 4 and 5 are populations between 10 000 and 100 000

Values with a log between 5 and 6 are populations between 100 000 and 1 000 000

Values with a log between 6 and 7 are populations between 1 000 000 and 10 000 000

Values with a log between 7 and 8 are populations between 10 000 000 and 100 000 000

Values with a log between 8 and 9 are populations between 100 000 000 and 1 000 000 000

Values with a log between 9 and 10 are populations between 1 000 000 000 and 10 000 000 000
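The counts behind a log-scale histogram like this one can be reproduced by taking the logarithm of each population and noting which whole-number interval it falls into. This is a minimal Python sketch under that assumption; the names `populations`, `log_bin` and `bin_counts` are illustrative, and the bars on the printed histogram may be labelled differently.

```python
import math
from collections import Counter

# Approximate populations from the example above.
populations = {
    "China": 1_374_390_000, "India": 1_283_230_000, "US": 322_230_000,
    "Indonesia": 258_705_000, "Pakistan": 192_540_000, "Russia": 146_505_000,
    "Japan": 126_880_000, "Egypt": 90_251_000, "France": 67_286_000,
    "Italy": 60_685_000, "South Korea": 51_529_000, "Australia": 23_974_000,
    "Sri Lanka": 20_966_000, "Netherlands": 16_984_000, "Sweden": 9_845_000,
    "Austria": 8_663_000, "Switzerland": 8_306_000, "Denmark": 5_699_000,
    "Singapore": 5_535_000, "Slovakia": 5_424_000, "New Zealand": 4_653_000,
    "Jamaica": 2_723_000, "East Timor": 1_167_000, "Luxembourg": 563_000,
    "Malta": 445_000, "Bermuda": 62_000, "Monaco": 37_800,
    "Norfolk Island": 2300, "Vatican City": 839, "Pitcairn Islands": 56,
}


def log_bin(value):
    """Lower edge of the whole-number log interval containing value."""
    return math.floor(math.log10(value))


# Number of countries whose log(population) lies in each interval.
bin_counts = Counter(log_bin(p) for p in populations.values())

for lower in sorted(bin_counts):
    print(f"log between {lower} and {lower + 1}: {bin_counts[lower]} countries")
```

For example, Australia's population of 23 974 000 has a logarithm of about 7.4, so it is counted in the interval between 7 and 8.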


USING A CALCULATOR TO OBTAIN LOG VALUES

Casio ClassPad

An individual log value can be found on the keyboard under Math1, shown circled below. When this is selected “log(10,” appears. The required value and the end bracket must be entered.

If you have a list of values in the Statistics menu, you can calculate a list that is the log of all values. The calculation is done in the Cal line at the bottom of the list. Type log(list1) and EXE.

TI-Nspire

An individual log value can be found on the keyboard using ctrl → 10X. When this is selected, a log template appears. The required base and value must be entered.

If you have a list of values in Lists and Spreadsheets, you can calculate a list that is the log of all values. The calculation is done in the calculation line under the list title. Type log10(variable name) and enter.


QUESTION 9 A histogram is shown below of the logarithm of Gross Domestic Product (GDP) in billions of dollars for 193 countries.

The percentage of countries with a GDP greater than $1 000 billion is closest to:

A 1% B 2% C 8% D 32% E 62%

QUESTION 10

Afghanistan has a GDP of 20.8 billion dollars. On this histogram, the value for Afghanistan would be included in:

A The bar between -1 and 0.
B The bar between 0 and 1.
C The bar between 1 and 2.
D The bar between 2 and 3.
E The bar between 3 and 4.


QUESTION 11 The following data shows the approximate population of some capital cities:

Beijing 20 693 000 Tokyo 13 189 000

London 8 630 000 Bangkok 8 249 000

Hanoi 7 088 000 Ankara 5 150 000

Berlin 3 520 000 Paris 2 241 000

Ottawa 898 000 Kathmandu 812 000

Canberra 355 000 Noumea 89 000

Port Vila 38 000 Phillipsburg 1300

Kingston 880 Adamstown 45

Calculate the logarithm of each of the population values and use these to construct and label a histogram using a logarithmic scale.


THE NORMAL DISTRIBUTION

All normal distributions are symmetric and have bell-shaped density curves with a single peak.

The mean is where the peak of the density occurs (the centre) and the standard deviation indicates the spread (the height and width) of the bell curve.

If the standard deviation is large, the curve is short and wide; when the standard deviation is small, the curve is tall and narrow.

In a bell-shaped (normal) distribution, Mean = Median = Mode.

The following facts hold for a normal distribution:

68% of observations lie within 1 standard deviation of the mean; that is, between $\bar{x} - s$ and $\bar{x} + s$.

95% of observations lie within 2 standard deviations of the mean; that is, between $\bar{x} - 2s$ and $\bar{x} + 2s$.

99.7% of observations lie within 3 standard deviations of the mean; that is, between $\bar{x} - 3s$ and $\bar{x} + 3s$.

This is known as the 68 – 95 – 99.7 rule.

Thus, for a normal distribution, almost all data lies within 3 standard deviations either side of the mean.

We can use these properties to predict what percentage of a given set of data lies within 1, 2 or 3 standard deviation units of the mean.
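The percentages in the 68 – 95 – 99.7 rule are rounded values. As a cross-check only (this is not a method required by the course), the sketch below uses `NormalDist` from Python's standard `statistics` module to compute the percentage of a normal distribution lying within k standard deviations of the mean; the helper name `percent_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def percent_within(k):
    """Percentage of a normal distribution within k standard deviations of the mean."""
    return 100 * (standard_normal.cdf(k) - standard_normal.cdf(-k))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {percent_within(k):.1f}%")
```

Because every normal distribution has the same shape once it is measured in standard deviation units, the same three percentages apply whatever the mean and standard deviation are.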


EXAMPLE

The heights of some genetically modified plants, grown in a laboratory, have a normal distribution with a mean of 62 cm and a standard deviation of 3 cm. Approximately 95% of the plants will have what height range?

Solution

The middle 95% is two standard deviations on either side of the mean. The range of heights will be between:

$62 - 2 \times 3 = 56$ cm and $62 + 2 \times 3 = 68$ cm

EXAMPLE

The volume of a particular tomato juice carton is normally distributed with a mean of 250 ml and a standard deviation of 5 ml. In a sample of 400 cartons, how many would be expected to have a volume of:

(a) More than 245 ml?
(b) Less than 240 ml?
(c) Between 240 and 260 ml?

Solution

Draw a normal distribution graph, insert the values for the distribution at the appropriate places, and then evaluate the answer(s).

$\bar{x} + s = 250 + 5 = 255$, $\bar{x} + 2s = 250 + 10 = 260$, $\bar{x} + 3s = 250 + 15 = 265$

$\bar{x} - s = 250 - 5 = 245$, $\bar{x} - 2s = 250 - 10 = 240$, $\bar{x} - 3s = 250 - 15 = 235$

[Figure: normal curve with the horizontal axis marked at 235, 240, 245, 250, 255, 260 and 265.]

(a) 245 ml $= \bar{x} - s$

$X > 245$: $34\% + 34\% + 13.5\% + 2.35\% + 0.15\% = 84\%$

84% of 400 = 336 cartons

(b) 240 ml $= \bar{x} - 2s$

$X < 240$: $2.35\% + 0.15\% = 2.5\%$

2.5% of 400 = 10 cartons

(c) $\bar{x} - 2s = 250 - 10 = 240$ and $\bar{x} + 2s = 250 + 10 = 260$

$240 < X < 260$: $2 \times (13.5\% + 34\%) = 95\%$

95% of 400 = 380 cartons
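The band-by-band addition used in this solution can be written out as a small Python sketch. It assumes the rounded band percentages of the 68 – 95 – 99.7 rule and limits that fall on a whole number of standard deviations from the mean; `BAND_PERCENTS`, `percent_between` and `expected_count` are illustrative names, not calculator functions.

```python
# Percentage of observations in each band of the 68-95-99.7 rule, listed from
# "below mean - 3s" up to "above mean + 3s".
BAND_PERCENTS = [0.15, 2.35, 13.5, 34, 34, 13.5, 2.35, 0.15]


def percent_between(z_low, z_high):
    """Percentage between two whole-number z-values (-3 to 3); None means no limit."""
    start = 0 if z_low is None else z_low + 4
    stop = len(BAND_PERCENTS) if z_high is None else z_high + 4
    return sum(BAND_PERCENTS[start:stop])


def expected_count(percent, sample_size):
    """Expected number of items, rounded to the nearest whole item."""
    return round(percent / 100 * sample_size)


# Tomato juice cartons: mean 250 ml, standard deviation 5 ml, sample of 400.
more_than_245 = expected_count(percent_between(-1, None), 400)   # 245 = mean - 1s
less_than_240 = expected_count(percent_between(None, -2), 400)   # 240 = mean - 2s
between_240_260 = expected_count(percent_between(-2, 2), 400)    # mean +/- 2s

print(more_than_245, less_than_240, between_240_260)
```

Converting each limit to a number of standard deviations from the mean first is what allows the same list of band percentages to be reused for any normal distribution.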


QUESTION 12

The distribution of the weights of ‘free range eggs’ is approximately bell-shaped with a mean of 90 g and a standard deviation of 5 g. Free range eggs weighing more than 100 g are classified as ‘extra large’. In a sample of 5000 eggs, the number of eggs that would classify as ‘extra large’ is closest to:

A 8 B 118 C 125 D 800 E 4877

QUESTION 13

The distribution of Statistics test scores formed a bell-shaped curve with a mean of 65 and a standard deviation of 9. The percentage of students with scores between 56 and 92 is closest to:

A 36% B 50% C 64% D 82% E 84%

QUESTION 14

The length of gestation of a horse is approximately bell-shaped with a mean of 336 days and a standard deviation of 3 days. The percentage of horse pregnancies that last less than 339 days would be closest to:

A 5% B 16% C 68% D 84% E 95%


QUESTION 15

The diastolic blood pressure in hypertensive women is approximately bell-shaped with a mean of 100 and a standard deviation of 14.

(a) What proportion of women have diastolic blood pressures between 72 and 114?
(b) What proportion of women have diastolic blood pressures greater than 86?
(c) What proportion of women are above and below 2 standard deviations from the mean?

QUESTION 16

A factory produces bags of rice. The weights of bags are normally distributed with a mean of 900 g and a standard deviation of 50 g. What is the best approximation for the percentage of bags that weigh more than 1 kg?

A 0% B 1.25% C 2.5% D 5% E 16%


STANDARD Z-SCORES

Known as standard score, standardised score, z-score or normal score.

It is an indication of how many standard deviations an observation is above or below the mean, i.e. how far and in what direction.

If an observation’s z-score is negative it lies below the mean, and if an observation’s z-score is positive it lies above the mean.

To calculate a standard z-score, you use the following rule:

$z = \dfrac{\text{actual data value} - \text{mean}}{\text{standard deviation}}$

EXAMPLE

Amber and Sacha are two basketball players. The following table lists their points scored in the ten games for the season:

Game    Amber    Sacha
1       12       6
2       12       8
3       16       12
4       16       2
5       10       10
6       19       2
7       17       12
8       18       10
9       16       7
10      14       8

In Game 5 both Amber and Sacha scored 10 points. Does this mean their personal performances are equally good?

Solution

To compare the two basketball players you must calculate their z-scores.

Amber has a mean score of 15 with a standard deviation of 2.9. Hence her z-score is:

$z = \dfrac{x - \bar{x}}{s} = \dfrac{10 - 15}{2.9} = -1.72$

Amber’s score for Game 5 is 1.72 standard deviations below her average.

Sacha has a mean score of 8 with a standard deviation of 3.9. Hence her z-score is:

$z = \dfrac{x - \bar{x}}{s} = \dfrac{10 - 8}{3.9} = 0.51$

Sacha’s score for Game 5 is 0.51 standard deviations above her average.

Sacha’s personal performance was better than Amber’s because her z-score is higher.
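The z-score rule is a single line of arithmetic, so it is easy to check by machine. The sketch below is illustrative only: `z_score` is a made-up helper name, and the means and standard deviations are the summary values quoted in the solution rather than values recomputed from the table.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# Game 5: both players scored 10 points.
amber_z = z_score(10, mean=15, sd=2.9)
sacha_z = z_score(10, mean=8, sd=3.9)

print(f"Amber: {amber_z:.2f}")
print(f"Sacha: {sacha_z:.2f}")
print("Better personal performance:", "Sacha" if sacha_z > amber_z else "Amber")
```

The comparison in the final line works because z-scores place both players on the same scale, however different their usual scoring levels are.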


QUESTION 17

In a large supermarket, the prices of different brands of milk and bread are normally distributed. The mean price for a 1 L carton of milk is $1.70 with a standard deviation of 40 cents. The mean price for a loaf of bread is $2.80 with a standard deviation of 65 cents. If I pay $1.83 for a particular carton of milk and $3.02 for a loaf of bread, which one is relatively more expensive?

Solution


QUESTION 18

Scores for children on a particular intelligence test are known to be normally distributed with a mean score of 100 and a standard deviation of 15. Students are accelerated if they receive a score of 130 or above.

(a) What percentage of students would be accelerated?
(b) What is the minimum z-score a child must obtain to be accelerated?


QUESTION 19

A student obtained 65% on both her Physics and her French exam. She was pleased with her Physics result, but disappointed in her French result. The mean score on the Physics exam was 62 and the standard deviation was 10. The mean score on the French exam was 72 and the standard deviation was 6. Discuss why she might be feeling like this. Refer to her z-scores for each subject and give a rating of her performance in each subject, assuming a normal distribution of marks and using the 68 – 95 – 99.7% rule.

Solution

QUESTION 20

A mother is worried that her two-year-old son is not growing adequately. He has a height of 60 cm. Heights of two-year-olds are normally distributed with a mean of 70 cm and a standard deviation of 8 cm.

(a) Calculate the son’s standardised score for height correct to two significant figures.
(b) Interpret the meaning of his standardised score in terms of comparing his height to the general population.


SAMPLES AND POPULATIONS

A population includes every possible data value for the group being studied. A defined population can be quite small (for example, every student in a particular class) or very large, such as the census of Australia.

In statistics, samples are often used, particularly when there is a large population, as it can be easier to obtain a sample. If a sample is used, it should be carefully selected to ensure there is no bias. This will make it most likely that the sample is indicative of the whole population.

Different symbols are used for population parameters and sample statistics. The mean for a sample is $\bar{x}$, but the mean for a population is given the symbol $\mu$. The sample standard deviation is $s$ or $s_x$, but the population standard deviation is calculated differently and has the symbol $\sigma$. Population parameters tend to be relatively fixed, but sample statistics can be variable.

Random numbers are often used to select a sample to use for a survey. Using random selection aims to make sure that the sample is indicative of the population. A significant difference between population and sample statistics may indicate that the sample was biased in some way.

Usually when a random selection is made the steps would be:

1. Determine the total number of available data pieces and then allocate each piece of data a value from 1 onwards.

2. Use a random number generator to randomly select the number of data pieces required. The random number generator could be on your calculator or it could be a random number table as shown below. If one or more data pieces is selected more than once, then keep making selections until the correct number of individual data items is selected.

Random Number Table

89074 45632 12342 09834 67859 19504 13980 45678 45768 19708 10967 23785 19056 15870 26437 19700 67580 12902 56479 89706 27580

EXAMPLE

In a particular year level there are 64 students. Five are to be randomly selected to complete a survey about their learning. Use the random number table to do this.

Solution

The students should be numbered from 1 to 64. Because there are 2 digits in 64, there should be 2 digits in each data value. Therefore Student 1 would be 01, Student 9 would be 09, etc.

Start at the top left hand corner of the table and move across, reading the digits two at a time. Select the first five appropriate values.

The values would be 89, 07, 44, 56, 32, 12, 34, 20, 98, 34 etc. The value 89 is ignored because it is greater than 64, so students 7, 44, 56, 32 and 12 would be selected.
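The selection procedure in this example can be sketched in Python. This is an illustration of the steps only, under the assumption that the digits are read continuously across the table with the gaps between groups ignored; `RANDOM_TABLE` and `select_from_table` are hypothetical names.

```python
# The random number table from the notes, read left to right.
RANDOM_TABLE = (
    "89074 45632 12342 09834 67859 19504 13980 "
    "45678 45768 19708 10967 23785 19056 15870 "
    "26437 19700 67580 12902 56479 89706 27580"
)


def select_from_table(table, population_size, sample_size):
    """Pick sample_size distinct labels from 1..population_size using the table."""
    digits = table.replace(" ", "")
    width = len(str(population_size))  # e.g. 64 has 2 digits, so read 2 at a time
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        value = int(digits[start:start + width])
        # Skip values outside the numbering and values already selected.
        if 1 <= value <= population_size and value not in chosen:
            chosen.append(value)
        if len(chosen) == sample_size:
            break
    return chosen


selected_students = select_from_table(RANDOM_TABLE, population_size=64, sample_size=5)
print(selected_students)
```

For a larger group the same function reads more digits at a time, because `width` is taken from the number of digits in the population size.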


USING A CAS CALCULATOR TO OBTAIN RANDOM NUMBERS

Casio ClassPad

Main menu → keyboard → down arrow to catalog → R → randList(. The syntax is randList(required number, start, finish).

TI-Nspire

Calculator page → Menu → 5. Probability → 4. Random → 2. Integer → randInt() appears on screen. The syntax is randInt(start, finish, required number).

QUESTION 21

Three students are to be selected from a year level to present their assignment to the rest of the students. If there are 122 students in the year level and the random number table shown previously is used, starting at the top left hand corner, which three students would be selected?

Solution


QUESTION 22

A particular VCE subject has the following population parameters: $\mu = 30$ and $\sigma = 6$. At one school the students have the following summary statistics: $\bar{x} = 26$ and $s = 2$. Give possible reasons why the statistical values at this school are different to the population parameters.

QUESTION 23

The average height of 18 year old females in a particular population is 177 cm with a standard deviation of 4.5 cm. A group of 18 year old females completing VCE were surveyed to find their heights. The students surveyed were randomly selected using their student numbers and a random number generator. Give approximate values for $\bar{x}$ and $s$ for this sample and explain your answer.

Solution


SUMMARY

MEASURES OF CENTRE AND SPREAD The appropriate measure of centre, and of spread, depends on the type of variable and the shape of the distribution.

Type of Variable          Shape of Distribution               Measure of Centre    Measure of Spread
Continuous or discrete    Roughly symmetrical and unimodal    Mean                 Standard deviation
Continuous or discrete    Markedly skewed                     Median               IQR
Categorical               Unimodal                            Mode                 Not appropriate


BIVARIATE DATA

DISPLAYING, SUMMARISING AND DESCRIBING RELATIONSHIPS IN BIVARIATE DATA

Bivariate data examines the relationship, if one exists, between two variables.

Bivariate relationships can be seen between:

Two different categorical variables.

One categorical and one numerical variable.

Two different numerical variables.

Prior to any examination of a bivariate relationship we need to determine which variable is the explanatory variable and which variable is the response variable.

If a variable y depends on a variable x, then y is called the response variable and x the explanatory variable. For example, if the exam result depends on the amount of time spent studying, then the exam result is the response variable and the amount of time spent studying is the explanatory variable.

Two different categorical variables

Explanatory variable: Needs to be decided which of the two variables.
Organising data: Two-way frequency table.
Display: Segmented bar chart.
Describing relationships: Most popular groups, e.g. modal group.

One categorical and one numerical variable

Explanatory variable: Explanatory variable is ALWAYS the categorical variable.
Organising data: Frequency table or back to back stemplot.
Display: Parallel boxplots.
Describing relationships: Comparing the shape of distributions, centres, spreads and outliers.

Two numerical variables

Explanatory variable: Needs to be decided which of the two variables.
Organising data: Table.
Display: Scatterplot.
Describing relationships: Describing form, strength and direction.


The location of the explanatory and response variables is important when graphing. For scatterplots, the response variable (e.g. exam result) goes on the vertical axis and the explanatory variable (e.g. time spent studying) goes on the horizontal axis.

To determine which variable is the explanatory and which is the response variable, the following tests can be applied:

1. Can one variable affect the other? If yes, the variable that can be affected is the response variable and the one that cannot be affected is the explanatory variable.

2. Did one variable happen before the other variable? If yes, the variable that occurs first is the explanatory variable.

3. Are you trying to predict one variable from the other? If yes, the variable being predicted is the response variable.

QUESTION 24

For each of the following pairs of variables, identify the explanatory variable:

The height of a plant and the hours of sunlight it receives.

Education level of a person and the income they receive.

The ATAR score of a student and the amount of study a student does on average per week.

Temperature of water and how quickly sugar dissolves.


TWO NUMERICAL VARIABLES

A scatter plot is used to identify and describe relationships between two numerical variables.

When constructing a scatter plot, the response variable is plotted on the vertical or y-axis and the explanatory variable on the horizontal or x-axis.

Bivariate relationships, and particularly relationships between two numerical variables, must be examined in terms of:

Strength (Strong, moderate, weak or no relationship).

Direction (Positive or negative) and outliers.

Form (Linear or non-linear).

Scatter plots can be used to give a visual estimation of the strength, direction and form of the relationship.

[Figure: grid of example scatterplots. Columns: Positive Linear and Negative Linear. Rows (Strength): Strong, Moderate, Weak, No Relationship.]


QUESTION 25 Create a scatterplot from the following coordinate points: (1, 49), (3, 51), (4, 52), (6, 52), (6, 53), (7, 53), (8, 54), (11, 56), (12, 56), (14, 57), (14, 58), (17, 59), (18, 59), (20, 60), (20, 61) Hence state how the relationship could best be described in terms of direction and strength. Solution


QUESTION 26 The following scatter plot shows the relationship between the examination score on a university examination and the time spent studying before the examination, in days.

This relationship is best described as:
A Nonlinear.
B A strong negative linear relationship.
C A strong positive linear relationship.
D A weak positive linear relationship.
E Having no relationship.


QUESTION 27 The following scatter plot shows the price (in thousands) of Toyota Camrys and their age (in years) for 10 cars found in the ‘classifieds’.

This relationship is best described as:
A A strong negative linear relationship.
B A strong positive linear relationship.
C A weak positive relationship.
D A weak negative relationship.
E Having no relationship.


MEASURING THE STRENGTH AND DIRECTION OF A RELATIONSHIP

There are several ways to measure the association between two variables; however, Pearson’s product moment correlation coefficient, which is designated by the letter r, is the most common measure.

r can be calculated using the following formula:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$
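As a minimal sketch of how the formula works (the CAS calculator remains the method expected in the course), the Python function below standardises each x and y value using the sample mean and the sample standard deviation, multiplies the paired standardised scores, adds the products and divides by n − 1. The function name `pearson_r` and the small data set are illustrative assumptions and are not part of the course materials.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson's r from the standardised-score formula.

    Assumes xs and ys are equal-length lists of numbers with n >= 2
    and that neither list is constant (otherwise s_x or s_y is zero).
    """
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Illustrative data only: the points lie exactly on a straight line,
# once with a positive gradient and once with a negative gradient.
xs = [1, 2, 3, 4, 5]
ys_up = [3, 5, 7, 9, 11]
ys_down = [11, 9, 7, 5, 3]

print(round(pearson_r(xs, ys_up), 4))    # 1.0
print(round(pearson_r(xs, ys_down), 4))  # -1.0
```

Points that lie exactly on a line give the extreme values of r, which is a quick way to check that the calculation has been set up correctly.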

The easiest way to find r is to use your CAS calculator and this is the method expected in the Further Maths course. Understanding the meaning of the r value and interpreting the result or analysing any changes to the r value are how this concept is extended.

When calculating r you are assuming:

The variables are numeric.

The relationship is linear.

There are no outliers.

Note: If outliers are present, then your calculation of r may give a misleading indication of the strength of the linear relationship.

r can only take on values in the range −1 ≤ r ≤ +1.

A table can be used to determine the interpretation of the strength and direction of a relationship given the r-value:

r-value Meaning

Between 0.75 and 0.99 Strong positive relationship

Between 0.5 and 0.74 Moderate positive relationship

Between 0.25 and 0.49 Weak positive relationship

Between –0.24 and +0.24 No relationship

Between –0.25 and –0.49 Weak negative relationship

Between –0.5 and –0.74 Moderate negative relationship

Between –0.75 and –0.99 Strong negative relationship
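The table can be read as a simple decision rule. The sketch below is one possible way to express it in Python; the function name `describe_r` is hypothetical, and it assumes the cut-offs are applied to the size of r at 0.25, 0.5 and 0.75, so that values falling between table rows (for example 0.745) and values of exactly ±1 are still classified.

```python
def describe_r(r):
    """Map an r-value to the wording used in the table above.

    Assumption: cut-offs are applied to |r| at 0.25, 0.5 and 0.75, so values
    between table rows and values of exactly +/-1 are still classified.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size < 0.25:
        return "No relationship"
    if size < 0.5:
        strength = "Weak"
    elif size < 0.75:
        strength = "Moderate"
    else:
        strength = "Strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} relationship"


print(describe_r(-0.99))  # Strong negative relationship
print(describe_r(-0.62))  # Moderate negative relationship
print(describe_r(0.1))    # No relationship
```

The description only makes sense when the assumptions listed earlier hold: numeric variables, a linear relationship and no outliers.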


USING A CAS CALCULATOR TO DETERMINE R VALUES

ClassPad:

1. Statistics menu → enter data into two lists.
2. Calc → Regression → Linear Reg.
3. Set location of lists, frequency of 1 → OK.
4. r (and r squared) are displayed.

TI-Nspire:

1. Menu → 4.Add Lists & Spreadsheet → enter data into 2 lists → name lists, here x and y.
2. Menu → 4.Statistics → 1.Stat Calculations → 4.Linear Regression (ax + b).
3. Set location of lists & result, frequency of 1 → OK.
4. r (and r squared) are displayed; you may have to scroll down.



EXAMPLE Find the value of Pearson’s product moment correlation coefficient correct to two significant figures for the following data and interpret the result. Determine any change to the r value if the point (6, 20) is added to the data and comment.

t days 0 2 3 4 5 8 10 12

Number of errors 20 18 16 14 12 4 2 0

Solution

[Figure: scatter plot of number of errors against t days.]

The scatter plot shows a strong negative relationship.

Using your graphics calculator: r = −0.99

This is interpreted using the table as a strong negative linear relationship.

If the point (6, 20) is added to the data set, using your graphics calculator: r = −0.89

This is still a strong negative linear relationship, but not as strong as before, because the point (6, 20) is an outlier (plot it on the scatter plot to see where it is!).
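The two r values in this example can be cross-checked outside the calculator. The sketch below applies the formula for r given earlier to the data in the table, first as given and then with the extra point (6, 20) added. The function and variable names are illustrative assumptions.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson's r as the sum of paired standardised-score products over n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


# Data from the example: t (days) and number of errors.
t_days = [0, 2, 3, 4, 5, 8, 10, 12]
errors = [20, 18, 16, 14, 12, 4, 2, 0]

r_original = pearson_r(t_days, errors)

# Add the extra point (6, 20) and recalculate.
r_with_outlier = pearson_r(t_days + [6], errors + [20])

print(round(r_original, 2))      # -0.99
print(round(r_with_outlier, 2))  # -0.89
```

A single point that sits well away from the trend is enough to move r from −0.99 to −0.89, which is why outliers should be checked on a scatter plot before r is interpreted.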


QUESTION 28 The value of Pearson’s product-moment correlation coefficient between x and y in the scatter diagram shown below would be closest to:
A −0.8
B −0.3
C 0
D 0.3
E 0.8

QUESTION 29 The correlation between marks obtained by 11 students in Mathematics and English in the scatter diagram shown is closest to:
A −0.9
B −0.5
C 0
D 0.5
E 0.9


QUESTION 30 In a mountainous region a drainage system consists of a number of basins with rivers flowing through them. The basins are formed by the flow of the river. For a random sample of six basins, the area of each basin and the total length of the rivers flowing through each basin are as follows:

Area (km²) 7 8 9 16 12 13

River length (km) 10 8 14 20 11 15

(a) The response variable is

(b) Construct a scatter plot.

(c) Are larger areas associated with longer river lengths? Explain your answer in mathematical terms.


QUESTION 31 (a) A random sample of 10 students was asked to rate two different maths courses they had taken on a ten-point scale. A rating of 1 means ‘absolutely dreadful’ and a rating of 10 means ‘absolutely wonderful’. The question is whether there is a relationship between the rating for each subject given by students.

Find the value of Pearson’s product moment correlation coefficient correct to three significant figures for the following data.

Methods 6 6 5 2 3 3 5 7 6 7

Further 7 9 7 4 3 8 6 9 8 10

r =

(b) Plot a scatter diagram and comment on the result. Assume that we are trying to predict the rating for Further from the rating for Methods.

(c) Find the value of r correct to three significant figures if the point (3, 3) were changed to (1, 10).

r =

(d) Comment on the result.


THE COEFFICIENT OF DETERMINATION

The coefficient of determination, r², is the degree to which one variable can be predicted from another linearly related variable. It is calculated by squaring Pearson’s product moment correlation coefficient, r, and is usually expressed as a percentage.

Since −1 ≤ r ≤ 1, then 0 ≤ r² ≤ 1, or 0% ≤ r² ≤ 100%.

An r² value of 0.67 means that 67% of the variation in the response variable can be explained by the variation in the explanatory variable and the other 33% can be explained by other factors.

Note: Standard Response (replace r², the percentages and the variable names with the appropriate values)

The coefficient of determination, calculated to be r², means that (r² × 100)% of the variation in the response variable can be explained by the variation in the explanatory variable.

Sometimes another sentence is added: The other (100 − r² × 100)% of the variation in the response variable can be explained by other factors or influences.

Note that any version of this statement that implies a causal relationship by using words such as “is determined by”, “is due to”, “is caused by” etc. rather than “can be explained by” will be considered incorrect in the exam context!
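The standard response can be filled in mechanically once r is known. The sketch below is one way to do this in Python; the function name `standard_response` is hypothetical, and the value r = −0.62 and the variable names are taken from the car price and odometer example in the correlation and causation section.

```python
def standard_response(r, response, explanatory):
    """Build the standard interpretation of r squared from an r-value.

    `response` and `explanatory` are the variable names to place in the sentence.
    """
    r_squared = r ** 2
    percent = r_squared * 100
    return (
        f"The coefficient of determination, calculated to be {r_squared:.4f}, "
        f"means that {percent:.1f}% of the variation in {response} can be "
        f"explained by the variation in {explanatory}. The other "
        f"{100 - percent:.1f}% of the variation in {response} can be "
        f"explained by other factors or influences."
    )


# r = -0.62 with odometer reading as the explanatory variable
# and price as the response variable.
print(standard_response(-0.62, "the price of a car", "its odometer reading"))
```

Because r is squared, the sign of r is lost: r = −0.62 and r = 0.62 both give r² = 0.3844, so the direction of the relationship has to be stated separately.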


QUESTION 32 (a) If the product moment correlation, r, for a data set is r = 0.7345, then what is the coefficient of determination? Give your answer correct to 3 decimal places.

(b) If the coefficient of determination for a data set is 0.6971, then what is the product moment correlation, r, if the gradient of the line of best fit is negative? Give your answer correct to two decimal places.

(c) Scores on a test for Physics and Mathematical ability ratings are linearly related with a correlation coefficient of 0.8356. Mathematical ability rating is the explanatory variable. Fill in the gaps in the following response.

The coefficient of determination, calculated to be _________, means that _____% of the variation in __________ can be explained by the variation in ____________; the other _______% of the variation in _______________ can be explained by other factors or influences.


QUESTION 33 There is a strong linear relationship (r = 0.8067) between the amount of savings a person has and their income level. From this information we can conclude that:
A People with higher incomes tend to save more.
B People with higher incomes tend to save less.
C Increasing the amount of savings you have will increase your salary.
D The amount of savings you have can be increased by increasing people’s income.
E The amount of savings you have can be increased by decreasing people’s income.

QUESTION 34 The relationship between weight and height in a group of students is linearly related with a correlation coefficient of 0.7772. From this information we can conclude that:
A 60% of the variation of weight can be explained by the variation in height.
B 78% of the variation of weight can be explained by the variation in height.
C 60% of the variation of height can be explained by the variation in weight.
D 78% of the variation of height can be explained by the variation in weight.
E None of the variation in weight can be explained by height.

QUESTION 35 The relationship between a person’s hours of coaching and their score in golf has a coefficient of determination of 0.49. The value of Pearson’s Product-Moment Correlation Coefficient for this relationship is:
A 0.7
B −0.7
C 0.49
D −0.49
E 0.2401


CORRELATION AND CAUSATION

The values of the correlation coefficient or the coefficient of determination only tell us that two variables tend to happen at the same time. They cannot be used to infer or prove that one variable causes the other. Sometimes it appears to be obvious that one variable does in fact cause the other, but you cannot make this assumption without further investigation.

Remember: Correlation tells us about the strength and direction of the association, but NOTHING about the source or cause of either variable.

Correlation is usually determined from observation of data. In order to determine cause, instead of just observing data, we would need to conduct an experiment. An experiment involves the deliberate manipulation of one variable so that other factors can be excluded as causal factors. Usually this involves controlling the other possible factors, so that the only variable that is altering is the one under consideration. When this is done, any observed change can be attributed to the variable that is altering.

EXAMPLE The correlation between the price of a car and the number of kilometres it has travelled (odometer reading) has an r value of −0.62. Therefore there is a moderate negative relationship between price and odometer reading. Does an increase in the odometer reading cause the price of the car to fall? Although it appears to be the case, and our own knowledge of the world tells us that this may well be the case, in order to establish a causal relationship we must conduct an experiment such as the simple experiment explained below.

Step 1: Select two (or more) groups of cars. So that other potential variables are controlled, we need to ensure that the two groups match in terms of other factors such as make, model, age, colour, maintenance, general condition etc.

Step 2: The number of kilometres each car travels is controlled. For example, Group 1 could do 50 000 km per year and Group 2 could do 10 000 km per year.

Step 3: Is there an observable difference in the price after a set number of years for these cars? If there is, we can conclude that there is a causal relationship because the only factor that varied was the number of kilometres travelled by the cars.


Non-causal explanations for observed associations:

Three possible reasons for observed associations that are not causal are common response, confounding and coincidence.

Common response is where the two variables are associated, not because one causes the other, but because they are both strongly associated with a third variable. Often this third variable is causing both of the variables to change. For example, there is a strong positive association between the number of churches and the number of pubs in country towns. Does going to church cause you to drink alcohol? Or vice versa? No, this is an example of common response because both of these variables have a strong positive association with the population of the country town. It is more likely that the population is determining both the number of churches and the number of pubs in a country town.

Confounding is where the two variables are associated, but there are two or more possible explanatory variables and the effects of these cannot be separated from each other. For example, there is a moderate positive association between the number of study hours that you do and your examination results. Does increased hours of study cause your examination results to increase? This is an example of confounding. It is quite likely that increasing the hours of study that you do is a causal factor, but there are many other factors, including the quality of study time, teaching, availability of additional resources, general ability and topics examined, that also affect examination results and may also affect the number of hours of study that is done. The effects of these factors are difficult to disentangle and we have no way of knowing which factor in particular causes any observed increase in examination results.

Coincidence is where there is an observed association, but the association is spurious. For example, there is a strong positive association between the per capita consumption of mozzarella cheese and the number of civil engineering doctorates awarded in the United States. Does eating mozzarella cheese make for better civil engineers? This is probably an example of coincidence. The amount of mozzarella cheese consumption could be changed without any discernible effect on civil engineering courses or students. Note that coincidence is much more likely in small samples.
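The remark that coincidence is much more likely in small samples can be explored with a short simulation. The sketch below is an illustration under stated assumptions, not part of the course method: it generates pairs of completely unrelated random variables and counts how often the sample r would be called strong (size of r at least 0.75) for samples of 5 and samples of 50. The function names, the seed, the number of trials and the use of normally distributed values are all arbitrary choices made for this example.

```python
import random
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson's r as the sum of paired standardised-score products over n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


def share_of_strong_r(sample_size, trials, rng):
    """Fraction of trials in which two unrelated variables give |r| >= 0.75."""
    strong = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]  # generated independently of xs
        if abs(pearson_r(xs, ys)) >= 0.75:
            strong += 1
    return strong / trials


rng = random.Random(2024)  # fixed seed so the run is repeatable
small = share_of_strong_r(sample_size=5, trials=2000, rng=rng)
large = share_of_strong_r(sample_size=50, trials=2000, rng=rng)

print(f"n = 5:  {small:.1%} of samples look 'strong'")
print(f"n = 50: {large:.1%} of samples look 'strong'")
```

With only 5 data points a noticeable share of unrelated pairs produces a strong-looking r purely by chance, while with 50 points this almost never happens.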


QUESTION 36 For each of the following situations give non-causal explanations for the observed correlation. Include in your answer whether the example is likely to be common response, confounding or coincidence:

(a) The correlation coefficient when investigating the number of ice creams sold and the number of shark attacks at a seaside town indicates a strong positive relationship.

(b) The correlation coefficient when investigating the life expectancy and the literacy rate of a country indicates a moderate positive relationship.

(c) The correlation coefficient when investigating the divorce rate in Maine, USA and the per capita consumption of margarine indicates a strong positive relationship.

(d) The correlation coefficient when investigating the number of runs scored in cricket and the speed of the balls faced indicates a moderate negative relationship.
