BIA 660 Web Analytics - Midterm Akshta Chougule Hao Han Di Huo Xi Lu Laura Sills Bank Of America


Dec 14, 2015


Virginia Bement
Transcript
Page 1

BIA 660 Web Analytics - Midterm

Akshta Chougule, Hao Han, Di Huo, Xi Lu, Laura Sills

Bank Of America

Page 2

Business Problem

Customer Strategy: grow base by forming life-long banking relationships with young adults

Current Account Demographics Report shows:

● fewer new student accounts

● increase in cancellation of accounts by the young adult demographic

Impact: Losing market share to other banks

Page 3

Business Questions

● What is Bank of America’s reputation with this age group - do they like Bank of America or not?

● How does Bank of America compare to other banks?

● Are customers in this demographic group unhappy with the bank’s services?

● Are there any banking products that customers in this group want but that Bank of America does not offer?

Page 4

Source of Information

Online social media sites are a good source for comments from this age group

Page 5

YouTube Statistics

● More than 1 billion unique users monthly

● Nielsen ratings show that YouTube reaches more US adults ages 18-34 than any cable network

http://www.youtube.com/yt/press/statistics.html

Page 6

Demographics of Reddit

http://www.theatlantic.com/technology/archive/2013/07/reddit-demographics-in-one-chart/277513/

Page 7

What do People Think About Banks?

Page 8

Topic                    Reddit  YouTube  Twitter
mortgage                 5%      6%       30%
loan                     5%      13%      0%
fraud                    6%      7%       0%
insurance                1%      2%       0%
branch                   3%      1%       0%
hours                    2%      1%       0%
account                  19%     16%      20%
overdraft                8%      1%       0%
bailout                  1%      6%       0%
fee                      18%     11%      20%
customer                 13%     8%       0%
representative / teller  7%      18%      20%
[credit] union           10%     7%       10%
computer                 1%      1%       0%
CEO                      2%      2%       0%

Page 9

Data Gathering and Validation

Use Python to obtain comments from the web

● Crawling Reddit

● API for Twitter

● API for YouTube
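The slides do not say whether Reddit was crawled as HTML or through its JSON listings. As one possible sketch, assuming comment threads were already downloaded as JSON in the nested "Listing" shape that Reddit's public endpoints return, the comment text could be collected by walking the reply tree. The function name and the sample data below are hypothetical.

```python
def extract_comment_bodies(listing):
    """Recursively collect comment text from a Reddit-style comment listing."""
    bodies = []
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":  # skip non-comment entries such as "more" stubs
            continue
        data = child["data"]
        bodies.append(data.get("body", ""))
        replies = data.get("replies")
        if isinstance(replies, dict):  # an empty string means there are no replies
            bodies.extend(extract_comment_bodies(replies))
    return bodies


# Hand-written sample standing in for a downloaded thread (not real data).
sample_listing = {
    "kind": "Listing",
    "data": {"children": [
        {"kind": "t1", "data": {"body": "Their fees are too high.", "replies": {
            "kind": "Listing",
            "data": {"children": [
                {"kind": "t1", "data": {"body": "Switched to a credit union.", "replies": ""}},
            ]},
        }}},
        {"kind": "more", "data": {"children": ["abc123"]}},
    ]},
}

print(extract_comment_bodies(sample_listing))
```

Downloading the threads themselves is left out here because it depends on rate limits and access terms that the slides do not describe.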

Page 10

Data Cleansing and Exploration

● Delete incomplete comments, extra whitespace, punctuation, and stopwords

● Explore data using Python to analyze the frequency of words in the comments in order to identify “key words” related to banking

● Word scan confirmed the key words
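The cleansing and word-frequency steps above can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual script: the stopword list is a tiny hypothetical stand-in, the sample comments are invented, and the "incomplete comment" filter is omitted because the slides do not define it.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real run would use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "my", "i",
             "for", "on", "in", "it", "me"}


def clean_comment(text):
    """Lowercase, replace punctuation with spaces, split on whitespace, drop stopwords."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [word for word in text.split() if word not in STOPWORDS]


def word_frequencies(comments):
    """Count how often each remaining word appears across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(clean_comment(comment))
    return counts


# Invented sample comments, for illustration only.
sample_comments = [
    "The overdraft fee on my account is outrageous!!",
    "Closed my account   after another fee.",
]
freq = word_frequencies(sample_comments)
print(freq.most_common(3))
```

Words that rank high in such a count and relate to banking (here "fee" and "account") would be the candidates for the key-word list.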

Page 11

Gathering data from Twitter

● Technique: Twitter API

● Amount of tweets:

BOA -- 125 KB

Citibank -- 104 KB

Chase -- 100 KB

● Time span: 1 week

● Type of data:

Tweet text

Tweet created_at

Geocode

Page 12

Data Processing

● Two libraries: positive & negative

● Score each tweet
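A lexicon-based score of this kind might look like the sketch below. The two word sets are hypothetical placeholders, since the slides do not list the actual libraries, and dividing by the tweet's word count is an assumption (the summary statistics later in the deck stay between -0.2 and 0.2, which suggests some normalization).

```python
import re

# Hypothetical mini-lexicons; the slides do not list the actual words.
POSITIVE = {"good", "great", "love", "helpful", "thanks"}
NEGATIVE = {"bad", "terrible", "hate", "fee", "fraud"}


def score_tweet(text):
    """Return (positive hits - negative hits) / word count, or 0.0 for empty text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    return (pos - neg) / len(words)


print(score_tweet("Great service, thanks to the helpful teller today"))
print(score_tweet("Another terrible fee from the bank"))
```

A score above zero means more positive than negative words were matched, a score below zero the opposite, and zero means either a balance or no matches at all.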

Page 13

Tweets by Location

Page 14

Data Processing

● Summary for BOA tweets:

● Good or bad?

Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
-0.20000  -0.04348  0.00000   -0.01176  0.02857   0.20000
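A six-number summary like the one above can be reproduced with Python's standard library. The sketch below uses linear interpolation between data points for the quartiles (the "inclusive" method); whether that matches the tool used for the slide is an assumption, and the toy scores are invented.

```python
import statistics


def summary(scores):
    """Return min, first quartile, median, mean, third quartile, and max."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(scores),
        "q3": q3,
        "max": max(scores),
    }


# Invented toy scores, not the BOA tweet data.
toy_scores = [-0.2, -0.05, 0.0, 0.0, 0.05, 0.2]
toy_summary = summary(toy_scores)
print(toy_summary)
```

For the BOA tweets, a median of 0 with a slightly negative mean indicates that most tweets are neutral and the negative tail is a little heavier than the positive one.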

Page 15

Competitor Analysis

Page 16

Distribution for tweets’ score

Mean:

BOA: -0.01176

Citibank: -0.0006146

Chase: -0.00731

Page 17

Two Sample T-test

Null hypothesis: true difference in means is equal to 0

Alpha = 0.1

● BOA and Citibank: p-value = 0.0009004 < 0.1

● Citibank and Chase: p-value = 0.06971 < 0.1

● BOA and Chase: p-value = 0.2289 > 0.1
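The slides do not state which variant of the two-sample t-test was run. The sketch below assumes Welch's unequal-variance version and, to stay within the standard library, approximates the two-sided p-value with a normal distribution, which is reasonable only for large samples such as a week of tweets. The toy scores are invented; only alpha = 0.1 comes from the slide.

```python
import math
import statistics


def welch_t_test(a, b):
    """Return Welch's t statistic, degrees of freedom, and an approximate two-sided p-value."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Squared standard errors of each sample mean (sample variance / n).
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    # Normal approximation to the t distribution (an assumption, fine for large n).
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, df, p


ALPHA = 0.1  # significance level stated on the slide

# Invented toy scores, not the real tweet data.
bank_x = [0.10, 0.00, -0.10, 0.05, -0.05]
bank_y = [0.20, 0.10, 0.15, 0.25, 0.10]
t_stat, dof, p_value = welch_t_test(bank_x, bank_y)
print(t_stat, dof, p_value, "reject H0" if p_value < ALPHA else "fail to reject H0")
```

Read against the slide's results: at alpha = 0.1 the BOA vs. Citibank and Citibank vs. Chase differences are significant, while the BOA vs. Chase difference is not.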

Page 18

Gathering data from YouTube

● Techniques: BeautifulSoup, g.data

● Amount for general analysis: 3097

Page 19

Topic                    Reddit  YouTube  Twitter
mortgage                 5%      6%       30%
loan                     5%      13%      0%
fraud                    6%      7%       0%
insurance                1%      2%       0%
branch                   3%      1%       0%
hours                    2%      1%       0%
account                  19%     16%      20%
overdraft                8%      1%       0%
bailout                  1%      6%       0%
fee                      18%     11%      20%
customer                 13%     8%       0%
representative / teller  7%      18%      20%
[credit] union           10%     7%       10%
computer                 1%      1%       0%
CEO                      2%      2%       0%

Page 20

YouTube data for each category

● Training data: 600

● Loan: 2430

● Account: 2700

● Service: 520

Page 21

Naive Bayes Classification Algorithm

A naive Bayes classifier assumes that the presence or absence of a particular feature is unrelated to the presence or absence of any other feature, given the class variable.
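To make the independence assumption concrete, here is a minimal from-scratch sketch of a Bernoulli naive Bayes classifier, where each feature is the presence or absence of a word. The slides do not say which variant or library the team used, so the class below, its add-one smoothing, and the toy training comments are all illustrative assumptions.

```python
import math
from collections import defaultdict


class BernoulliNaiveBayes:
    """Naive Bayes over word presence/absence with Laplace (add-one) smoothing."""

    def fit(self, documents, labels):
        self.vocab = sorted({word for doc in documents for word in doc})
        self.class_counts = defaultdict(int)
        self.word_counts = defaultdict(lambda: defaultdict(int))
        for doc, label in zip(documents, labels):
            self.class_counts[label] += 1
            for word in set(doc):  # count each word once per document
                self.word_counts[label][word] += 1
        self.total = len(labels)
        return self

    def predict(self, doc):
        present = set(doc)
        best_label, best_logp = None, -math.inf
        for label, n_c in self.class_counts.items():
            logp = math.log(n_c / self.total)  # log prior of the class
            for word in self.vocab:
                # Smoothed estimate of P(word present | class).
                p = (self.word_counts[label][word] + 1) / (n_c + 2)
                # Independence assumption: per-word log-probabilities simply add up.
                logp += math.log(p if word in present else 1 - p)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Invented toy comments (already tokenized) and hypothetical category labels.
docs = [["high", "fee"], ["overdraft", "fee"], ["helpful", "teller"], ["rude", "teller"]]
labels = ["account", "account", "service", "service"]
model = BernoulliNaiveBayes().fit(docs, labels)
print(model.predict(["fee"]), model.predict(["teller"]))
```

Working in log space avoids numeric underflow when the vocabulary is large, and the smoothing keeps a word that was never seen in a class from forcing that class's probability to zero.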

Page 22

Naive Bayes Classification Algorithm

Splitting the dataset into training and test data

(Manual rating of comments)

● Training (400)

● Testing (200)

● Predicting (5700)
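A reproducible 400/200 split of the manually rated comments could be done as sketched below. The placeholder comments, the two label names, and the fixed seed are assumptions for illustration; the slides do not say how the split was drawn.

```python
import random


def split_labeled(examples, n_train, seed=0):
    """Shuffle labeled examples reproducibly, then split into train and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder data standing in for the 600 manually rated comments.
labeled = [(f"comment {i}", "label_a" if i % 2 else "label_b") for i in range(600)]
train, test = split_labeled(labeled, n_train=400)
print(len(train), len(test))  # prints: 400 200
```

Shuffling before splitting guards against any ordering in the rated comments (for example, by video or date) leaking into the test set.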

Page 23

Primary Categories of Customer Complaints

Page 24

Accuracy of Classification

● Mortgage: 64.5%

● Accounts: 58.7%

● Service: 68.4%
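The slides do not spell out how these accuracies were computed. The usual definition, assumed here, is the share of held-out test comments whose predicted label matches the manual rating; the label lists below are invented.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the manually assigned labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Invented labels, for illustration only.
predicted_labels = ["neg", "pos", "neg", "neg"]
manual_labels = ["neg", "pos", "pos", "neg"]
print(accuracy(predicted_labels, manual_labels))  # prints: 0.75
```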

Page 25

Mortgage

Page 26

Account

Page 27

Service

Page 28

Thank you!