Copyright 2014, Simplilearn, All rights reserved.
Feb 16, 2016
Lesson 1
Introduction to Analytics
Objective Slide
After completing this course, you will be able to:
● Understand what analytics is and the difference between analysis and analytics
● Know the popular tools used in analytics
● Understand the role of a data scientist
● Know the processes involved in analytics
● Define a problem statement
● Collect and summarize data
● Detect and treat outliers in the data
Analytics versus Analysis
Analytics
Analytics is the science of analysis, in which statistics, data mining, computer technology, etc. are used to carry out the analysis
Analysis
Analysis is the process of breaking down a complex object into its simpler forms
What is Analytics?
• It is the science of wisely acquiring meaningful results from given data using various methods and technologies.
• It aims to discover patterns of variation in the given data.
• It helps to understand the future from past data, along with the uncertainty related to business.
• It is a sophisticated process that uses statistical, mathematical and economic models to predict the future and prescribe strategies.
How analytics works
Gather data → Organize data → Analyze data
Analytics Stages
Descriptive (information)
• What has the wear rate of MRF tyres been over the last 8 months?
Diagnostic (insights)
• Why has the wear rate increased over the last 8 months?
Predictive (insights)
• What kinds of issues (such as mileage) are MRF tyres most likely to face if the problem is not addressed now?
Prescriptive (decision)
• What should MRF concentrate on to reduce the overall effect?
Popular Tools:
R
Revolution R
RStudio
Tableau
SAP HANA
Weka
KXEN
SAS
Role of a Data Scientist
• Is inquisitive; can stare at data and spot trends.
• Uncovers stories hidden in the data that help create more useful insights and solve business problems.
• Works in sync with application developers to get relevant data for analysis.
• Makes an analytical plan such that the results satisfy the business needs.
• Comes up with an effective data mining architecture and prepares suitable models.
• Responds to and resolves data mining performance issues.
• Generates reports that are affordable from a business perspective.
Data Analytics Methodology
DISCOVERY → DATA PREPARING → MODEL PLANNING → MODEL BUILDING → DELIVER RESULTS → PUT INTO USE
Problem Definition
WHAT IS THE PROBLEM?
WE DON’T HAVE A SOLUTION BECAUSE?
WE HAVE THIS PROBLEM BECAUSE?
WHAT IS IT NOT?
Techniques involved in defining a problem
• State the problem in a general way
• Understand the nature of the problem
• Survey the available literature
• Hold discussions to develop ideas
• Rephrase the research problem into a working proposition
Types of Data
● Data can be of two types: qualitative and quantitative
Qualitative Data
• Data expressed as groups or categories
• Descriptive data
• E.g. dividing a population into high, medium and low height groups
Quantitative Data
• Data expressed as numbers
• Definitive data
• E.g. the height of a person
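The link between the two types can be sketched in Python by turning a quantitative height into a qualitative height group. The cut-off values (160 cm and 175 cm), the function name `height_group` and the sample heights are assumptions made for illustration; the lesson does not specify them.

```python
# Hypothetical cut-offs in centimetres; the lesson does not define them.
LOW_MAX_CM = 160
MEDIUM_MAX_CM = 175


def height_group(height_cm: float) -> str:
    """Map a quantitative height to a qualitative group."""
    if height_cm < LOW_MAX_CM:
        return "low"
    if height_cm < MEDIUM_MAX_CM:
        return "medium"
    return "high"


# Made-up sample of quantitative data (heights in cm).
heights_cm = [152.0, 168.5, 181.2, 174.9, 159.9]

# The same observations expressed as qualitative data.
groups = [height_group(h) for h in heights_cm]
print(groups)
```

Each number becomes a category label, so arithmetic such as a mean applies to `heights_cm`, while `groups` is better summarized by counting how often each label occurs.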
Summarizing Data
● Summarizing is the process of converting huge amounts of raw data into a format that can be easily analyzed.
● Summaries differ based on the type of data and can be descriptive or graphical.
Figure: bar chart of the frequency of not outs per batsman (axis from 0 to 40)
Batsman    Frequency of not outs
Sachin     11
Sehwag     2
Dravid     36
Dhoni      32
Virat      7
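As a rough sketch of a graphical summary, the frequency table above can be printed as a text bar chart in Python. The counts are taken from the table; the function name `text_bar_chart` and the use of `#` marks are illustrative choices, not part of the lesson.

```python
# Frequency of not outs per batsman, taken from the table in this lesson.
not_outs = {"Sachin": 11, "Sehwag": 2, "Dravid": 36, "Dhoni": 32, "Virat": 7}


def text_bar_chart(freq: dict[str, int]) -> list[str]:
    """Return one line per category: name, a bar of '#' marks, and the count."""
    width = max(len(name) for name in freq)
    return [
        f"{name.ljust(width)} | {'#' * count} {count}"
        for name, count in freq.items()
    ]


for line in text_bar_chart(not_outs):
    print(line)
```

A plotting library would normally be used for this; plain text keeps the sketch free of dependencies.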
Summarizing Data
Numeric - Descriptive
• Mean
• Median
• Mode
Categorical - Descriptive
• Frequency distribution tables
Numeric - Graphical
• Box plot
• Histograms
Categorical - Graphical
• Bar charts
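The descriptive summaries listed above can be computed with Python's standard library, as in the minimal sketch below. The `runs` and `dismissals` values are made-up sample data, not figures from the lesson. The quartiles come from `statistics.quantiles` with its default method, which is one of several quartile conventions.

```python
import statistics
from collections import Counter

# Made-up numeric (quantitative) data, e.g. runs scored in eight innings.
runs = [12, 45, 45, 7, 88, 30, 45, 61]

mean_runs = statistics.mean(runs)
median_runs = statistics.median(runs)
mode_runs = statistics.mode(runs)

# Minimum, quartiles and maximum are the values a box plot draws.
q1, q2, q3 = statistics.quantiles(runs, n=4)
five_number_summary = (min(runs), q1, q2, q3, max(runs))

# Made-up categorical (qualitative) data summarized as a frequency table.
dismissals = ["caught", "bowled", "caught", "run out", "caught", "bowled"]
frequency_table = Counter(dismissals)

print("mean:", mean_runs, "median:", median_runs, "mode:", mode_runs)
print("five-number summary:", five_number_summary)
for category, count in frequency_table.most_common():
    print(category, count)
```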
Data Collection
● It is the process of collecting relevant data that aids in solving the problem statement.
● The data collection process needs to be well defined and systematic.
● Observations need to be recorded and organized for optimal usefulness.
Collect Relevant Data → Categorize the Data → Organize the Data
Data Collection Methods
● Data collection methods fall broadly into two categories: primary and secondary.
● Primary methods are those where the data is gathered directly by investigating, experimenting on or observing various entities.
● Secondary methods are those where the data has already been gathered before the study and is available as published facts and reports.
Data Sources: Observation, Experiment, Census, Questionnaire, Survey, Reporting, Registration
Data Dictionary
● A Data Dictionary is a file that describes the structure of the database itself.
● It includes details such as:
  ● Number of records
  ● Name of each field
  ● Characteristic of each field
  ● Description of each field
  ● Relationships between different fields
● It helps in analyzing different data variables and their relationships with each other.
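To make the contents of a data dictionary concrete, here is a minimal Python sketch that derives one from a small list of records. The rows reuse a few values from the not-outs table, while the field descriptions, the function name `build_data_dictionary` and the output layout are assumptions for illustration only. Relationships between fields are not inferred here and would be documented by hand.

```python
# A few rows borrowed from the not-outs table; field names are hypothetical.
records = [
    {"batsman": "Sachin", "not_outs": 11},
    {"batsman": "Sehwag", "not_outs": 2},
    {"batsman": "Dravid", "not_outs": 36},
]

# Hand-written descriptions; a data dictionary cannot derive these itself.
descriptions = {
    "batsman": "Name of the player",
    "not_outs": "Number of innings in which the player was not out",
}


def build_data_dictionary(rows, field_descriptions):
    """Summarize record count plus name, type and description of each field."""
    fields = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        fields[name] = {
            "type": type(values[0]).__name__,
            "distinct_values": len(set(values)),
            "description": field_descriptions.get(name, ""),
        }
    return {"record_count": len(rows), "fields": fields}


data_dictionary = build_data_dictionary(records, descriptions)
print("records:", data_dictionary["record_count"])
for name, info in data_dictionary["fields"].items():
    print(name, info)
```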
Outlier Treatment
● An outlier is a point or an observation that deviates significantly from the other observations.
● Outliers can be due to experimental errors or “special circumstances”.
● Outlier detection tests are used to check for outliers.
● Outlier treatment:
  ● Retention
  ● Exclusion
  ● Other treatment methods
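The lesson does not name a specific detection test, so the sketch below assumes one common choice: the 1.5 × IQR rule, which flags values lying far outside the middle half of the data. The sample readings and the function name `iqr_outliers` are made up for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed detection rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up readings with one value that deviates strongly from the rest.
readings = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]

outliers = iqr_outliers(readings)

# Retention: keep the data as it is and only report the flagged values.
print("flagged:", outliers)

# Exclusion: drop the flagged values before further analysis.
cleaned = [v for v in readings if v not in outliers]
print("after exclusion:", cleaned)
```

Whether to retain or exclude depends on the question being asked; in a task such as fraud detection the flagged value is the recording of interest.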
Summary
Here is a quick recap of what we have learned in this lesson:
● What analytics and analysis are, and the differences between them
● Popular tools used in analytics
● What a data scientist does
● The processes involved in the analytics life cycle
● How to formally define a problem statement
● Methods of collecting and summarizing data for analytics
● The data dictionary and its contents
● What outliers are and how to detect and treat them
Quiz
1. What is the sequence in which analytics is done?
a. Gather data – Analyze data – Organize data
b. Gather data – Organize data – Analyze data
c. Analyze data – Gather data – Organize data
d. Organize data – Analyze data – Gather data
Answer: b
Explanation: Gather data – Organize data – Analyze data
2. Outliers are …………………………….?
a. Illegitimate data objects
b. Legitimate data objects
c. Unwanted data objects
d. None of these
Answer: b
Explanation: Legitimate data objects. Whether they are useful is decided only after considering the question at hand.
3. Which of the following is a way of summarizing quantitative data?
a. Mean
b. Median
c. Mode
d. All of these
Answer: d
Explanation: Mean, median and mode are mathematical summaries of numeric or quantitative data. Frequency distribution is used to summarize categorical or qualitative data.
4. In which of the following cases is an outlier treated as a useful recording?
a. Finding Average of a Quantity
b. Forecasting
c. Fraud detection
d. Estimating Missing Values
Answer: c
Explanation: Fraud detection, because we are looking for a recording that is quite unusual compared with the rest.
5. What are the two categories of data collection methods?
a. Primary and secondary
b. Interval and ratio
c. Nominal and ordinal
d. Random and selective
Answer: a
Explanation: Data collection methods are classified into primary and secondary.
6. Prescriptive Analysis falls under ………………………………. ?
a. Information stage
b. Insights stage
c. Decision stage
d. None of these
Answer: c
Explanation: Decision stage. Diagnostic and predictive analysis come under the insights stage, and descriptive analysis comes under the information stage.
Thank You