PROJECT ON STATISTICS
Srikanth A
Jan 01, 2016
STATISTICS PROJECT REPORT
Goal
The goal of this project was to familiarize ourselves with the statistical techniques used in data analysis, so that we can carry out computations on a given data set, draw meaningful conclusions from it, and demonstrate an understanding of the basic concepts of statistics. In this project we attempt to understand how cars produced by different automakers for the global market vary from each other with respect to engine capacity, horse power, mileage, transmission and so on.
Data collection
The database used for this project contains data on forty-three cars, with seven attributes recorded for each car. All the attribute data was collected from the internet using three portals:
www.automotoportal.com
www.carfolio.com
www.autocarindia.com
To obtain a diversified data set, we made a point of collecting data on cars made by different manufacturers. The data set therefore contains cars from eleven major auto manufacturers throughout the world, with four sample cars taken from each manufacturer (three in the case of Ford). The manufacturers we considered were:
BMW
VOLVO
GENERAL MOTORS - CHEVROLET
MERCEDES
NISSAN
HONDA
SUZUKI
TOYOTA
HYUNDAI
FORD
LEXUS
Of the seven attributes chosen, four were quantitative and three were qualitative.
Quantitative Attributes Chosen
1. Engine Capacity (cc)
2. Brake Horse Power (BHP)
3. Mileage (kilometers per liter of fuel)
4. Top Speed (kilometers per hour)
Qualitative Attributes Chosen
1. Gear Transmission (Automatic/Manual/Both)
2. Segment (Sedan/SUV/MUV)
3. Fuel Type (Petrol/Diesel/Both)
Explanation Regarding The Attributes Chosen
Engine Cylinder Capacity
The engine cylinder is the central working part of an automobile engine: the space in which a piston travels. The engine cylinder capacity gives the entire volume of the cylinder. It is measured in liters or cubic centimeters (cc). In this data set the cylinder capacity is expressed in cc.
Brake Horse Power
Brake horse power is the measure of an engine's horsepower without the loss in power caused by the gearbox and other auxiliary components. The prefix "brake" refers to where the power is measured: at the engine's output shaft. The actual horsepower delivered to the driving wheels is less.
Mileage
Mileage is a measure of the fuel required to move the automobile over a given distance. The two most common ways to measure automobile fuel economy are:
1. The amount of fuel used per unit distance, most commonly liters per 100 kilometers (L/100 km). Lower values mean better fuel economy: you use less fuel to travel the same distance.
2. The distance traveled per unit of fuel used, most commonly kilometers per liter (km/L). Higher values mean better fuel economy: you can travel farther for the same amount of fuel.
In our data set the mileage is expressed in kilometers traveled per liter of fuel used.
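The two fuel economy measures described above are reciprocals of each other, scaled by 100. A minimal sketch of the conversion follows; the function names are illustrative and not part of the original project.

```python
def km_per_litre_to_litres_per_100km(km_per_litre: float) -> float:
    """Convert distance per unit fuel (km/L) to fuel per unit distance (L/100 km)."""
    if km_per_litre <= 0:
        raise ValueError("km/L must be positive")
    return 100.0 / km_per_litre


def litres_per_100km_to_km_per_litre(litres_per_100km: float) -> float:
    """Convert fuel per unit distance (L/100 km) to distance per unit fuel (km/L)."""
    if litres_per_100km <= 0:
        raise ValueError("L/100 km must be positive")
    return 100.0 / litres_per_100km


# Example: a mileage of 10.2 km/L, a value that appears in the data set.
print(round(km_per_litre_to_litres_per_100km(10.2), 2))  # 9.8
```

Because the relationship is a reciprocal, a higher km/L figure always corresponds to a lower L/100 km figure.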
Top Speed
Top speed is the maximum speed at which a particular vehicle can travel. It can be measured in kilometers or miles covered per hour. In our data set the top speed is measured in kilometers per hour.
Gear Transmission
For the engine to transmit the power it produces to the tyres, gear transmissions provide a speed-power conversion from a higher-speed motor to a slower but more forceful output. In vehicles, gear changes can be made manually by the driver or automatically using modern electronic chip technology. Both technologies are available in vehicles and are offered to customers on request.
Segment
Cars are broadly classified into three segments depending on their use:
1. SUV - Sports Utility Vehicles
2. MUV - Multiple Utility Vehicles
3. SEDAN - Normal-sized cars used mainly for travel on normal terrain
Fuel Type
Automobiles burn fuel to derive the power they need to move. They mainly use petrol or diesel. A car may be available in different variants depending on the fuel it uses: for a particular model there may be two variants, one using petrol and one using diesel, while some cars are available in one form only, with either petrol or diesel.
ORGANIZED DATA
Company / Car Name    Engine Capacity (cc)  Horse Power (BHP)  Mileage (km/l)  Top Speed (km/h)  Transmission  Segment  Fuel Type
BMW
BMW 3 Series          3000                  230                10.2            236               Both          SEDAN    Petrol
BMW 5 Series          4800                  360                13.1            250               Both          SEDAN    Petrol
BMW 7 Series          6000                  438                13.8            250               Automatic     SEDAN    Petrol
BMW X5 4.8i           4800                  355                13.1            246               Automatic     SUV      Petrol
Volvo
Volvo V50 T5          2500                  218                9.2             240               Automatic     MUV      Petrol
Volvo XC70            2500                  208                11.2            230               Both          SUV      Both
Volvo S40 T5          2500                  218                9.8             240               Manual        SEDAN    Petrol
Volvo XC90            4400                  311                13.8            210               Automatic     SUV      Both
Chevrolet
Aveo                  1400                  94                 10              170               Manual        SEDAN    Petrol
Aveo U-VA             1200                  76                 12              140               Manual        SEDAN    Petrol
Tavera                2500                  80                 14.3            160               Manual        MUV      Both
Optra                 1600                  104                9               165               Manual        SEDAN    Both
Mercedes
Mercedes Benz E       3500                  268                12.4            236               Automatic     SEDAN    Diesel
Mercedes Benz SL      5000                  302                14.7            240               Automatic     SEDAN    Petrol
Mercedes Benz E       5000                  302                14.7            240               Automatic     WAGON    Petrol
Mercedes Benz SLK     5500                  355                15              246               Automatic     SEDAN    Petrol
Nissan
Nissan Xterra SE      4000                  261                14.7            230               Automatic     SUV      Petrol
Nissan Sentra 2.0     2000                  140                8.4             140               Manual        SUV      Both
Nissan Quest 3.5      3500                  235                13.1            220               Automatic     SEDAN    Both
Nissan Pathfinder     4000                  266                14.7            230               Automatic     MUV      Both
Honda
Honda Civic Si        2000                  197                10.2            160               Manual        SEDAN    Petrol
Honda CR-V            2400                  166                10.2            180               Automatic     SUV      Petrol
Honda Element LX      2400                  166                11.2            180               Manual        SUV      Diesel
Honda Pilot EX        3500                  244                13.8            200               Automatic     MUV      Petrol
Suzuki
Suzuki SX4            2000                  143                10.2            160               Manual        SUV      Petrol
Suzuki XL7            3600                  252                13.8            220               Automatic     SUV      Petrol
Suzuki Aerio          2300                  155                9.4             150               Manual        SEDAN    Petrol
Suzuki Grand Vitara   2700                  185                12.4            200               Automatic     SEDAN    Petrol
Toyota
Toyota Highlander     3300                  215                12.4            200               Automatic     SUV      Petrol
Toyota Camry          2400                  158                9.8             160               Manual        SEDAN    Diesel
Toyota Corolla        1800                  126                7.4             140               Manual        SEDAN    Petrol
Toyota Land Cruiser   4700                  275                18.1            300               Automatic     SUV      Petrol
Hyundai
Hyundai Accent        1399                  110                7.4             177               Manual        SEDAN    Both
Hyundai Elantra       1599                  138                8.4             182               Manual        SEDAN    Both
Hyundai Sonata        2359                  234                11.8            203               Manual        SEDAN    Both
Hyundai Sante Fe      1991                  242                12.4            166               Manual        SEDAN    Both
Ford
Ford Fiesta           1297                  160                10.2            160               Manual        SEDAN    Both
Ford Mustang          4601                  300                13.8            230               Automatic     SEDAN    Both
Ford Fusion           2261                  160                10.2            180               Manual        SUV      Both
Lexus
Lexus IS 350          3456                  306                11.2            229               Automatic     SEDAN    Petrol
Lexus LS 430          4293                  288                12.4            211               Automatic     SUV      Petrol
Lexus ES 330          3314                  218                11.2            230               Automatic     SEDAN    Petrol
Lexus SC 430          4293                  288                12.4            250               Automatic     SEDAN    Petrol
DATA ANALYSIS
Class (engine capacity, cc)   Midpoint   Frequency   Cumulative frequency
<1200                         600        0           0
1200-1800                     1500       4           4
1800-2400                     2100       7           11
2400-3000                     2700       11          22
3000-3600                     3300       2           24
3600-4200                     3900       6           30
4200-4800                     4500       5           35
4800-5400                     5100       6           41
5400-6000                     5700       1           42
6000-6600                     6300       1           43
Total                                    43
The table above gives the frequency distribution of engine capacity, measured in cubic centimeters (cc). The classes have a width of 600 cc, except for the first class, which runs from 0 to 1200 cc; the last class ends at 6600 cc. The cars are counted according to these classes.
Frequency Distribution of engine capacity
Measures of Central Tendency
Mean                 3383.721
Median               2972.727
Mode                 2584.615
Standard Deviation   859.2587
\[ \text{Mean} = \frac{\sum f x}{\sum f} \]
where f is the frequency and x is the midpoint of the class interval.

\[ \text{Median} = L + I \cdot \frac{N/2 - F}{f} \]
where:
L = lower limit of the interval containing the median
I = width of the interval containing the median
N = total number of observations (here, cars)
F = cumulative frequency up to the lower limit L
f = number of cases in the interval containing the median

\[ \text{Mode} = L_{mo} + \frac{d_1}{d_1 + d_2} \cdot w \]
where:
L_mo = lower limit of the modal class
d_1 = frequency of the modal class minus the frequency of the class directly below it
d_2 = frequency of the modal class minus the frequency of the class directly above it
w = width of the modal class interval
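As a check on the grouped-data formulas for mean, median and mode, the following sketch applies them to the frequency table of engine capacity. The class limits and frequencies are copied from that table; the variable names are illustrative, and treating the "<1200" class as running from 0 to 1200 follows the description in the text.

```python
# Class limits (cc) and frequencies copied from the frequency table.
lower_limits = [0, 1200, 1800, 2400, 3000, 3600, 4200, 4800, 5400, 6000]
upper_limits = [1200, 1800, 2400, 3000, 3600, 4200, 4800, 5400, 6000, 6600]
frequencies = [0, 4, 7, 11, 2, 6, 5, 6, 1, 1]

midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]
n = sum(frequencies)

# Mean = sum(f * x) / sum(f)
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Median class: the first class whose cumulative frequency reaches N/2.
cumulative = 0
for i, f in enumerate(frequencies):
    if cumulative + f >= n / 2:
        width = upper_limits[i] - lower_limits[i]
        median = lower_limits[i] + width * (n / 2 - cumulative) / f
        break
    cumulative += f

# Modal class: the class with the highest frequency.
m = frequencies.index(max(frequencies))
d1 = frequencies[m] - (frequencies[m - 1] if m > 0 else 0)
d2 = frequencies[m] - (frequencies[m + 1] if m < len(frequencies) - 1 else 0)
mode = lower_limits[m] + d1 / (d1 + d2) * (upper_limits[m] - lower_limits[m])

print(round(mean, 3), round(median, 3), round(mode, 3))
# 3383.721 2972.727 2584.615
```

The three values agree with the mean, median and mode reported in the table of measures of central tendency.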
Histogram
[Figure: histogram of the number of cars (y-axis, 0 to 12) in each engine capacity class (x-axis). Bar heights for the classes <1200 to 6000-6600: 0, 4, 7, 11, 2, 6, 5, 6, 1, 1.]
From the histogram we can infer that the largest number of cars in the data set (11) belongs to the 4th class, i.e. those with an engine capacity between 2400 cc and 3000 cc.
FREQUENCY POLYGON
[Figure: frequency polygon of the number of cars (y-axis, 0 to 12) in each engine capacity class (x-axis). Plotted values for the classes <1200 to 6000-6600: 0, 4, 7, 11, 2, 6, 5, 6, 1, 1.]
The frequency polygon shows the shape of the distribution of the engine capacities of the cars more clearly.
OGIVE CURVE
[Figure: less-than ogive of cumulative frequency (y-axis, 0 to 50) against engine capacity class (x-axis). Plotted values for the classes <1200 to 6000-6600: 0, 4, 11, 22, 24, 30, 35, 41, 42, 43.]
The ogive shown is constructed from the cumulative frequencies; it is a less-than ogive. If we take the point on the curve above a given class on the x-axis and read across to the y-axis, we obtain the total number of cars that lie below the upper limit of that class of engine capacity.
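The points of a less-than ogive are the running totals of the class frequencies. A minimal sketch, using the frequencies from the engine capacity table (the helper name cars_below is illustrative):

```python
from itertools import accumulate

# Upper class limits (cc) and frequencies copied from the frequency table.
upper_limits = [1200, 1800, 2400, 3000, 3600, 4200, 4800, 5400, 6000, 6600]
frequencies = [0, 4, 7, 11, 2, 6, 5, 6, 1, 1]

# Each ogive point pairs an upper class limit with the running total so far.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [0, 4, 11, 22, 24, 30, 35, 41, 42, 43]


def cars_below(limit_cc: int) -> int:
    """Number of cars with engine capacity below a class upper limit."""
    return cumulative[upper_limits.index(limit_cc)]


print(cars_below(3000))  # 22
```

Reading the ogive at 3000 cc therefore gives 22 cars, matching the cumulative frequency column of the table.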
Representation Of Frequency Distribution Of Qualitative Data
When qualitative data has to be represented graphically, a pie chart is a suitable choice, as it gives the reader a clear idea of what percentage of the data under study belongs to each category. Our data set has three qualitative attributes, of which we have chosen Fuel Type to represent graphically.
Fuel Type   Frequency
Petrol      26
Diesel      3
Both        14
Distribution of Fuel Used By The Cars In The Data Set
[Figure: pie chart. Petrol 60%, Diesel 7%, Both 33%.]
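The percentages in the pie chart follow directly from the fuel type frequencies. A short sketch of the calculation, with the counts copied from the frequency table (variable names are illustrative):

```python
# Fuel type frequencies copied from the table.
fuel_counts = {"Petrol": 26, "Diesel": 3, "Both": 14}

total = sum(fuel_counts.values())
shares = {fuel: 100 * count / total for fuel, count in fuel_counts.items()}

for fuel, share in shares.items():
    print(f"{fuel}: {share:.0f}%")
# Petrol: 60%
# Diesel: 7%
# Both: 33%
```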
Probability Distribution of Transmission with respect to the Horse power
Class of Horse power   Automatic   Manual   Both   Total
0 - 50                 0           0        0      0
50 - 100               0           3        0      3
100 - 150              0           6        0      6
150 - 200              2           6        0      8
200 - 250              5           3        2      10
250 - 300              7           0        0      7
300 - 350              5           0        0      5
350 - 400              2           0        1      3
400 - 450              1           0        0      1
450 - 500              0           0        0      0
Total                  22          18       3      43
The table above gives the distribution of horse power with respect to the transmission systems used in the cars. Since a single model in the data set may be available with an automatic gear system, a manual gear system, or both, the table has one column for each case. With the help of the table we find the probability of various events.
Find the probability that a selected car has an automatic gear system.
Total number of cars with an automatic gear system = 22
Total number of cars = 43
Therefore, the probability that a selected car has an automatic gear system = 22/43 = 0.511627907
So there is a 51.16% chance that the selected car has an automatic gear system.
Find the probability that a selected car with a manual gear system has a horse power of 175 bhp.
Total number of cars with a manual gear system = 18
Manual cars falling in the class that contains 175 bhp (150 - 200) = 6
Hence the probability that a selected car with a manual gear system has a horse power in this class = 6/18 = 0.333333333
So there is a 33.33% chance that a selected car with a manual gear system has a horse power in the 150 - 200 bhp class, which contains 175 bhp.
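The two probabilities can be reproduced from the horse power and transmission table. This is a minimal sketch with the counts copied from that table; the dictionary layout and names are illustrative choices, not part of the original project.

```python
# Counts per horse power class, copied from the table: (automatic, manual, both).
counts = {
    "0-50": (0, 0, 0),
    "50-100": (0, 3, 0),
    "100-150": (0, 6, 0),
    "150-200": (2, 6, 0),
    "200-250": (5, 3, 2),
    "250-300": (7, 0, 0),
    "300-350": (5, 0, 0),
    "350-400": (2, 0, 1),
    "400-450": (1, 0, 0),
    "450-500": (0, 0, 0),
}
AUTOMATIC, MANUAL, BOTH = 0, 1, 2

total_cars = sum(sum(row) for row in counts.values())
automatic_total = sum(row[AUTOMATIC] for row in counts.values())
manual_total = sum(row[MANUAL] for row in counts.values())

# Marginal probability: automatic cars out of all cars.
p_automatic = automatic_total / total_cars

# Conditional probability: 150-200 bhp class, given a manual gear system.
p_class_given_manual = counts["150-200"][MANUAL] / manual_total

print(round(p_automatic, 4))           # 0.5116
print(round(p_class_given_manual, 4))  # 0.3333
```

The second figure is a conditional probability: the denominator is the 18 manual cars, not all 43 cars.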
Binomial Distribution
Success is defined as picking a car which has a mileage above 13 km/l. From the data set we can find the following values.
Success event: p = 0.348837209
Failure event: q = 0.651162791
Probability of picking 6 cars with mileage more than 13 km/l in 10 trials from the data set:
No. of trials: n = 10
Random variable: x = 6
\[ P(X = x) = \binom{n}{x} \, p^{x} \, q^{\,n-x} \]
Therefore, P(X = 6) = 0.068032185
We can say that there is a 6.8% chance that exactly 6 of the 10 cars picked have a mileage above 13 km/l.
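The binomial probability can be verified with a few lines of code. In this sketch p is taken as 15/43, which equals the stated value 0.348837209 (15 of the 43 cars in the data table have a mileage above 13 km/l); the function name is illustrative, and the binomial model assumes each of the 10 picks is independent with the same p.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


p = 15 / 43  # equals the stated success probability 0.348837209
probability = binomial_pmf(6, 10, p)
print(round(probability, 6))  # 0.068032
```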
Normal Distribution
Probability that a randomly selected car from the data set will have a top speed less than 220 km/h, assuming top speed is normally distributed:
Mean of top speed = 204.3488
Standard deviation = 38.7039
x = 220, μ = 204.3488, σ = 38.7039
P(x <= 220) = 0.6570
There is a 65.70% chance that a randomly selected car from the data set will have a top speed less than 220 km/h.
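The normal probability can be checked by evaluating the cumulative distribution function at x = 220. The sketch below uses the error function from the Python standard library; the function name normal_cdf is illustrative, and the mean and standard deviation are the values stated in the text.

```python
from math import erf, sqrt


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, sigma = 204.3488, 38.7039
probability = normal_cdf(220, mu, sigma)
print(round(probability, 3))  # 0.657
```

Here z = (220 - 204.3488) / 38.7039 is about 0.404, and the area to the left of that z-value is about 0.657.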
APPLICATION OF CORRELATION
To apply the concept of correlation to the data set, we chose to correlate engine capacity and horse power. A scatter diagram shows the degree of correlation between the two attributes graphically.
Correlation between Horse power and Engine Capacity
[Figure: scatter diagram of engine capacity (y-axis, 0 to 7000) against horse power (x-axis, 0 to 500), with fitted trend line y = 13.927x + 16.285 and R² = 0.8377.]
From the graph it can be seen that there is a high degree of positive correlation between the two attributes.
The correlation coefficient was found to be 0.91526, which confirms a high level of positive correlation between the two attributes: as the engine capacity increases, the horse power also increases. This conclusion led us to apply regression, which gave the regression equation
Y = 13.927X + 16.285
Here Y represents engine capacity and X represents horse power. Using this equation we can predict the engine capacity for a given value of horse power.
E.g. What will the engine capacity be for a car with a horse power of 600 BHP?
Y = 13.927X + 16.285, with X = 600
Therefore Y = 13.927 * 600 + 16.285 = 8372.485
Hence the predicted engine capacity is 8372.485 cc.
The coefficient of determination was found to be R² = 0.8377.
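The prediction and the coefficient of determination can be reproduced from the stated regression coefficients and correlation coefficient. This is a small sketch that assumes those reported values rather than refitting the line from the raw data; the names are illustrative.

```python
# Coefficients and correlation coefficient as reported in the text.
SLOPE = 13.927
INTERCEPT = 16.285
R = 0.91526


def predict_engine_capacity(horse_power: float) -> float:
    """Predicted engine capacity (cc) for a given horse power (BHP)."""
    return SLOPE * horse_power + INTERCEPT


print(round(predict_engine_capacity(600), 3))  # 8372.485

# For simple linear regression, the coefficient of determination is r squared.
print(round(R**2, 4))  # 0.8377
```

Note that 600 BHP lies beyond the largest horse power in the data set (438 BHP), so this prediction is an extrapolation of the fitted line.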