Copyright 2014, Simplilearn, All rights reserved.

001 E-book - Introduction to Analytics

Feb 16, 2016

Amruth Charan K

introduction to analysis

Lesson 1

Introduction to Analytics

Objectives

After completing this course, you will be able to:

● Understand what analytics is and how it differs from analysis
● Know the popular tools used in analytics
● Understand the role of a data scientist
● Know the processes involved in analytics
● Define a problem statement
● Collect and summarize data
● Detect and treat outliers in the data

Analytics versus Analysis

Analytics

Analytics is the science of analysis, in which statistics, data mining, computer technology and similar methods are used to carry out the analysis.

Analysis

Analysis is the process of breaking down a complex object into its simpler forms

What is Analytics?

• It is the science of wisely acquiring meaningful results from given data using various methods and technologies.
• It aims to discover patterns of variation in the given data.
• It helps in understanding the future from past data, along with the uncertainty related to the business.
• It is a sophisticated process that uses statistical, mathematical and economic models to predict the future and prescribe strategies.

How analytics works

Gather data → Organize data → Analyze data
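
The gather, organize and analyze steps can be sketched in a few lines of Python. This is only an illustration and is not taken from the slides: the sales records and the function names `gather`, `organize` and `analyze` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gather():
    # Gather: collect raw observations (hard-coded, hypothetical records).
    return [
        ("north", 120), ("south", 95), ("north", 130),
        ("south", 105), ("east", 80),
    ]


def organize(records):
    # Organize: group the raw sales figures by region.
    grouped = defaultdict(list)
    for region, sales in records:
        grouped[region].append(sales)
    return dict(grouped)


def analyze(grouped):
    # Analyze: summarize each group with its average.
    return {region: mean(values) for region, values in grouped.items()}


result = analyze(organize(gather()))
print(result)
```

Each step consumes the output of the previous one, which is why analysis cannot sensibly come before gathering or organizing.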

Analytics Stages

Descriptive (information stage)
• What has been the wear rate of MRF tyres over the last 8 months?

Diagnostic (insights stage)
• Why has the wear rate increased over the last 8 months?

Predictive (insights stage)
• What kinds of issues (such as mileage) are MRF tyres most likely to face if the company does not address the problem now?

Prescriptive (decision stage)
• What should MRF tyres concentrate on to reduce the overall effect?

Popular Tools:

R

Revolution R

RStudio

Tableau

SAP HANA

Weka

KXEN

SAS

Role of a Data Scientist

• Is inquisitive; can study data and spot trends.
• Uncovers stories hidden in the data that create more useful insights and help solve business problems.
• Works in sync with application developers to get relevant data for analysis.
• Makes an analytical plan such that the results satisfy the business needs.
• Comes up with an effective data mining architecture and prepares suitable models.
• Responds to and resolves data mining performance issues.
• Generates reports that are affordable from a business perspective.

Data Analytics Methodology

DISCOVERY → DATA PREPARATION → MODEL PLANNING → MODEL BUILDING → DELIVER RESULTS → PUT INTO USE

Problem Definition

WHAT IS THE PROBLEM?

WE DON’T HAVE A SOLUTION BECAUSE?

WE HAVE THIS PROBLEM BECAUSE?

WHAT IS IT NOT?

Techniques involved in defining a problem

• State the problem in a general way

• Understand the nature of the problem

• Survey the available literature

• Hold discussions to develop ideas

• Rephrase the research problem into a working proposition

Types of Data

● Data can be of two types: qualitative and quantitative.

Qualitative Data
• Data expressed as groups or categories
• Descriptive data
• E.g. dividing a population into high, medium and low height groups

Quantitative Data
• Data expressed as numbers
• Definitive data
• E.g. the height of a person
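
The two height examples can be connected in a short Python sketch: measured heights are quantitative, and sorting them into groups yields qualitative data. The cut-off values, the sample heights and the function name `height_group` are assumptions made purely for illustration.

```python
def height_group(height_cm, low_cutoff=160, high_cutoff=175):
    # Map a numeric height to a category. The cut-offs are assumed values,
    # not taken from the slides.
    if height_cm < low_cutoff:
        return "low"
    if height_cm < high_cutoff:
        return "medium"
    return "high"


heights_cm = [152, 168, 181, 174, 159]            # quantitative data
groups = [height_group(h) for h in heights_cm]    # qualitative data
print(groups)
```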

Summarizing Data

● Summarizing is the process of converting huge amounts of raw data into a format that can be easily analyzed.

● Summaries differ based on the type of data, and can be descriptive or graphical.

[Bar chart: frequency of not outs per batsman, axis from 0 to 40]

Batsman    Frequency of not outs
Sachin     11
Sehwag     2
Dravid     36
Dhoni      32
Virat      7
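
As a rough sketch, the not-out frequencies above can be summarized both descriptively (a table) and graphically (a text bar chart) in Python. The counts come from the slide; the variable names and the choice of a text chart instead of a plotting library are assumptions for illustration.

```python
# Frequencies of not outs, as given on the slide.
not_outs = {"Sachin": 11, "Sehwag": 2, "Dravid": 36, "Dhoni": 32, "Virat": 7}

# Descriptive summary: print the frequency table.
print(f"{'Batsman':<10}{'Not outs':>8}")
for batsman, count in not_outs.items():
    print(f"{batsman:<10}{count:>8}")

# Graphical summary: one '#' per not out, as a rough text bar chart.
bars = {batsman: "#" * count for batsman, count in not_outs.items()}
for batsman, bar in bars.items():
    print(f"{batsman:<10}{bar}")

# The tallest bar identifies the batsman with the most not outs.
most_not_outs = max(not_outs, key=not_outs.get)
print(most_not_outs)
```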

Summarizing Data

Numeric - Descriptive

• Mean

• Median

• Mode

Categorical - Descriptive

• Frequency distribution tables

Numeric - Graphical

• Box plot

• Histograms

Categorical - Graphical

• Bar charts
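
The descriptive summaries listed above can be computed with Python's standard library. The sketch below uses made-up sample values (`scores` and `grades` are hypothetical) to show mean, median and mode for numeric data, and a frequency distribution table for categorical data.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical numeric data.
scores = [2, 7, 7, 11, 32, 36]
numeric_summary = {
    "mean": mean(scores),      # arithmetic average
    "median": median(scores),  # middle value of the sorted data
    "mode": mode(scores),      # most frequent value
}
print(numeric_summary)

# Hypothetical categorical data, summarized as a frequency table.
grades = ["A", "B", "A", "C", "B", "A"]
frequency_table = Counter(grades)
for grade, count in sorted(frequency_table.items()):
    print(grade, count)
```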

Data Collection

● Data collection is the process of collecting relevant data that aids in solving the problem statement.
● The data collection process needs to be well defined and systematic.
● Observations need to be recorded and organized for optimal usefulness.

Collect relevant data → Categorize the data → Organize the data

Data Collection Methods

● Data collection methods fall broadly into two categories: primary and secondary.
● Primary methods gather data directly by investigating, experimenting on or observing various entities.
● Secondary methods use data that was gathered before the study and is available as already published facts and reports.

Data Sources: Observation, Experiment, Census, Questionnaire, Survey, Reporting, Registration

Data Dictionary

● A data dictionary is a file that describes the structure of the database itself.
● It includes details such as:
  ● Number of records
  ● Name of each field
  ● Characteristic of each field
  ● Description of each field
  ● Relationships between different fields
● It helps in analyzing the different data variables and the relationships between them.
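
A data dictionary of this kind can be sketched in Python for a tiny in-memory table. Everything here is hypothetical (the records, field names and descriptions), and a real data dictionary would usually be maintained alongside the database rather than derived on the fly.

```python
# Hypothetical records standing in for a database table.
records = [
    {"name": "Asha", "age": 34, "city": "Pune"},
    {"name": "Ravi", "age": 41, "city": "Chennai"},
    {"name": "Meera", "age": 29, "city": "Pune"},
]

# Hand-written descriptions, one per field (assumed for illustration).
descriptions = {
    "name": "Customer first name",
    "age": "Age in completed years",
    "city": "City of residence",
}

data_dictionary = {
    "number_of_records": len(records),
    "fields": {
        field: {
            # Characteristic of the field: its Python type name.
            "type": type(records[0][field]).__name__,
            "description": descriptions[field],
            # How many different values the field takes.
            "distinct_values": len({row[field] for row in records}),
        }
        for field in records[0]
    },
}

for field, details in data_dictionary["fields"].items():
    print(field, details)
```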

Outlier Treatment

● An outlier is a point or observation that deviates significantly from the other observations.
● Outliers arise from experimental errors or “special circumstances”.
● Outlier detection tests are used to check for outliers.
● Outlier treatment:
  ● Retention
  ● Exclusion
  ● Other treatment methods
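
The slides do not name a specific detection test. As one common choice, the sketch below assumes the 1.5 × IQR rule (Tukey's fences) and then applies exclusion as the treatment. The sample readings are made up, and the quartiles use the default method of `statistics.quantiles` (Python 3.8 or later).

```python
from statistics import quantiles

# Hypothetical readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95]

# Detection (assumed rule): flag points beyond 1.5 * IQR from the quartiles.
q1, _, q3 = quantiles(readings, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in readings if x < lower_fence or x > upper_fence]

# Treatment by exclusion: keep only the points inside the fences.
# Treatment by retention would simply keep `readings` unchanged.
kept = [x for x in readings if lower_fence <= x <= upper_fence]

print(outliers)
print(kept)
```

Whether exclusion is appropriate depends on the question being asked: in a task such as fraud detection the flagged points are exactly the ones of interest.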

Summary

Here is a quick recap of what we have learned in this lesson:

● What analytics and analysis are, and how they differ
● Popular tools used in analytics
● What a data scientist does
● The processes involved in the analytics life cycle
● How to formally define a problem statement
● Methods of collecting and summarizing data for analytics
● The data dictionary and its contents
● What outliers are and how to detect and treat them

Quiz

1 What is the sequence in which analytics is done?

a. Gather data – Analyze data – Organize data
b. Gather data – Organize data – Analyze data
c. Analyze data – Gather data – Organize data
d. Organize data – Analyze data – Gather data

Answer: b

Explanation: Gather data – Organize data – Analyze data

2 Outliers are …………………………….?

a. Illegitimate data objects
b. Legitimate data objects
c. Unwanted data objects
d. None of these

Answer: b

Explanation: Outliers are legitimate data objects. Whether they are useful can be decided only after considering the question at hand.

3 Which of the following is a way of summarizing quantitative data?

a. Mean
b. Median
c. Mode
d. All of these

Answer: d

Explanation: Mean, median and mode are mathematical summaries of numeric or quantitative data. Frequency distribution is used to summarize categorical or qualitative data.

4 In which of the following cases is an outlier treated as a useful recording?

a. Finding Average of a Quantity
b. Forecasting
c. Fraud detection
d. Estimating Missing Values

Answer: c

Explanation: Fraud detection, because we are looking for a recording that is quite unusual compared with the rest.

5 What are the two categories of data collection methods?

a. Primary and secondary
b. Interval and ratio
c. Nominal and ordinal
d. Random and selective

Answer: a

Explanation: Data collection methods are classified into primary and secondary.

6 Prescriptive Analysis falls under ………………………………. ?

a. Information stage
b. Insights stage
c. Decision stage
d. None of these

Answer: c

Explanation: Decision stage. Diagnostic and predictive analytics come under the insights stage, and descriptive analytics comes under the information stage.

Thank You