Team #6: Bill Cheng, Sabina Del Rosso, Stephen Hom, Omede Firouz, Stacy Hsueh, Wei Jiang, Thoranis Karnasuta

Social Networking Analytics for Calbee

(SNAC)

CLIENT EER/SCHEMA NORMALIZATION QUERIES

Professor Ken Goldberg. IEOR 115. December 9, 2011.

DATABASE

Client Background: Calbee San Francisco

• CALBEE, Inc. is one of the largest snack companies in Japan
  – Company based on the premise of good health

• Calbee, San Francisco is the company’s first US-based flagship store
  – Founded in early 2011
  – Located in Westfield Mall

• Active in social media
  – Website, Facebook, Twitter

Image from calbeeshop.com

Current Infrastructure

• Does not currently track social media hits on any site

• Uses the Point of Sale system for sales data and employee clock-ins

Database Objectives

• Handle future expansion into e-commerce

• Increase social media marketing in targeted demographics

• View the effect of promotions on sales and social media to help better tailor future promotions

• Provide a foundation to maximize profits
  – Logistics management using integer programming
  – Data mining and machine learning to predict sales

Image from http://www.unrealstudio.com


EER Diagram

Relational Design Schema (46 relations)

Notation: a suffix such as ^3a (a superscript on the original slide) marks an attribute that references the relation with that number.

Promotion/Sales/Retail: Relations Numbered 0-9

1. PRODUCT(ProdID, Name, IsSour, IsSweet, IsSalty, IsSavory, ManufCost, RetailPrice)
2. PURCHASE(PurchaseID, ProdID^1, PromoID^3a, CustID^6a, StoreID^4a, EmpID^5a, Timestamp, ipAddress)
3a. PROMOTION(PromoID, PromoCode, StoreID, StartDate, EndDate, Discount)
3b. PROMOTION_SPREAD_VIA_TWITTER(PromoID^3a, TweetID^10c)
3c. PROMOTION_SPREAD_VIA_F(PromoID^3a, F_CID^11c)
3d. PROMOTION_SPREAD_VIA_G+(PromoID^3a, G_CID^12c)
3e. PROMOTION_SPREAD_VIA_S(PromoID^3a, S_DID^13c)
3f. PROMOTION_SPREAD_VIA_B(PromoID^3a, BPost_ID^14a)
3g. PROMOTION_INFO_VIA_W(PromoID^3a, url^15)
4a. STORE(StoreID, AddressNo, StreetName, City, Country, ZipCode, PhoneNo)
4b. STORE_CARRIES(StoreID^4a, ProdID^1, Stock)
5a. EMPLOYEE(EmpID, LName, FName, Position, FavProdID^1, StoreID^4, AddressNo, StreetName, City, State, Country, ZipCode, SSN)
5b. EMPLOYEE_IS_FRIEND(EmpID^5a, T_UID^10a, F_UID^11a, G_UID^12a, S_UID^13a)
5c. EMPLOYEE_IS_CUSTOMER(EmpID^5a, CustID^6a)
6a. CUSTOMER(CustID, LName, FName, AddressNo, StreetName, City, State, Country, ZipCode, FavProd^1, BirthDate)
6b. CUSTOMER_IS_FRIEND(CustID^6a, T_UID^10a, F_UID^11a, G_UID^12a, S_UID^13a)
8a. PRODUCT_AD(P_Ad_ID, ProductID^1, DateBeginAd, DateEndAd, F_or_G_Ad)
8b. STORE_AD(S_Ad_ID, Store_ID, DateBeginAd, DateEndAd, F_or_G_Ad)
8c. F_P_AD_CLICKED(P_Ad, F_UID, Timestamp, ipAddress)
8d. G_P_AD_CLICKED(P_Ad_ID, G_UID, Timestamp, ipAddress)
8e. F_S_AD_CLICKED(S_Ad_ID, F_UID, Timestamp, ipAddress)
8f. G_S_AD_CLICKED(S_Ad_ID, G_UID, Timestamp, ipAddress)

Relational Design Schema Cont.

Social Media: Relations Numbered 10-19

10a. T_USER(T_UID, T_Username, Fname, Lname, City, State, BirthDate, Email)
10b. T_FOLLOWING(T_UID^10a, Follower_T_UID^10a, DateBeganFollowing)
10c. TWEET(TweetID, T_UID^10a, Auth_T_UID^10a, TextStr, Timestamp)
11a. F_USER(F_UID, Fname, Lname, City, State, BirthDate, Email)
11b. F_FRIENDS(F_UID^11a, Friend_F_UID^11a, DateBecameFriends)
11c. F_COMMENT(F_CID, Auth_F_UID^11a, On_F_CID^11c, TextStr, Timestamp)
11d. F_LIKE(F_CID^11c, F_UID^11a, Timestamp)
12a. G_USER(G_UID, Fname, Lname, City, State, BirthDate, Email)
12b. G_FRIENDS(G_UID^12a, Friend_G_UID^12a, DateBecameFriends)
12c. G_COMMENT(G_CID, Auth_G_UID^12a, On_G_CID^12c, TextStr, Timestamp)
12d. G_LIKE(G_CID^12c, G_UID^12a, Timestamp)
13a. S_USER(S_UID, Fname, Lname, City, State, BirthDate, Email)
13b. S_FOLLOWING(S_UID^13a, Follower_S_UID^13a, DateBeganFollowing)
13c. S_DISCOVERY(S_DID, S_UID^13a, url, Timestamp)
13d. S_REVIEW(S_DID^13c, S_UID^13a, TextStr, Like/Dislike, Timestamp)
14a. BLOG_POST(url, BPost_ID, Author_Emp_ID^5a, TextStr, Timestamp)
14b. BLOG_COMMENT(BComment_ID, url, BPost_ID^14a, TextStr, Timestamp, ipAddress)
14c. ASSOCIATE_IP_T(T_UID^10a, Timestamp, ipAddress)
14d. ASSOCIATE_IP_FB(F_UID^11a, Timestamp, ipAddress)
14e. ASSOCIATE_IP_G(G_UID^12a, Timestamp, ipAddress)
14f. ASSOCIATE_IP_S(S_UID^13a, Timestamp, ipAddress)
15. MAIN_WEBSITE(url, link_to_html_file, Timestamp)

Relational Design Schema Cont.

Other Data: Relations Numbered 20-29

20a. GOOGLE_TREND(GT_ID, word, city, country, day, hits)
20b. RELATED_TREND(word, Related_Prod_ID^1)


Access Table Relationships


Access Table Relationships Cont.

Normalization Analysis: 1NF


Removal of a multi-valued attribute (flavor):

Normalization Analysis: 2NF

Removal of a partial FD: {PromoID} → {PromoCode, StoreID, StartDate, EndDate, Discount}

Normalization Analysis: 3NF

Removal of a transitive FD: {T_UID} → {T_Username, Fname, Lname, City, State, BirthDate, Email}

Normalization Analysis: BCNF

Removal of a FD with a non-superkey attribute on the LHS: {PromoCode} → {StartDate, EndDate, Discount}
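The normalization steps above all depend on knowing which functional dependencies hold. As a rough illustration only, the sketch below tests whether a candidate FD is contradicted by a set of sample rows. The helper name `fd_holds` and the sample promotion rows are hypothetical and not part of the project; note that a sample can only refute an FD, never prove that it holds in general.

```python
from typing import Iterable, Mapping, Sequence


def fd_holds(rows: Iterable[Mapping[str, object]],
             lhs: Sequence[str],
             rhs: Sequence[str]) -> bool:
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time a key is seen and
        # returns the stored value on every later occurrence.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical sample rows shaped like PROMOTION (subset of attributes).
promotions = [
    {"PromoID": 1, "PromoCode": "FALL10", "StoreID": 1, "Discount": 0.10},
    {"PromoID": 2, "PromoCode": "FALL10", "StoreID": 2, "Discount": 0.10},
    {"PromoID": 3, "PromoCode": "SNACK5", "StoreID": 1, "Discount": 0.05},
]

code_determines_discount = fd_holds(promotions, ["PromoCode"], ["Discount"])
code_determines_store = fd_holds(promotions, ["PromoCode"], ["StoreID"])
print(code_determines_discount, code_determines_store)
```

In this made-up sample, {PromoCode} → {Discount} is not contradicted, while {PromoCode} → {StoreID} is, which is the kind of evidence that motivates moving the code-dependent attributes into their own relation.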

Query 1: Popular Product Stock

Find the most talked-about products in a city and their quantities in stock. This helps determine which products to move between stores to balance inventories ahead of expected sales increases. The data can be exported to a solver and treated as a shipment (transportation) problem.

Query 1: SQL

SELECT Product.ProdID, Product.ProdName,
  (SELECT COUNT(F_Comment.F_CID)
   FROM F_Comment
   WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*') AS Hits,
  Store.City, Store_Carries.StoreID AS Store, Store_Carries.Stock AS Stock
FROM Product, Store, Store_Carries
WHERE (((Product.ProdID)=[Store_Carries].[ProdID]) AND ((Store.StoreID)=[Store_Carries].[StoreID]))
ORDER BY Product.ProdName;
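To try the idea behind Query 1 outside Access, the sketch below runs an equivalent query in SQLite through Python's `sqlite3` module. It is an approximation under stated assumptions: SQLite uses `%` and `||` where Access uses `*` and `+`, the rows are invented sample data, and the result is ordered by hit count rather than by product name so that the most talked-about product comes first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProdID INTEGER PRIMARY KEY, ProdName TEXT);
CREATE TABLE Store (StoreID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Store_Carries (StoreID INTEGER, ProdID INTEGER, Stock INTEGER);
CREATE TABLE F_Comment (F_CID INTEGER PRIMARY KEY, TextStr TEXT);
-- Invented sample rows, for illustration only.
INSERT INTO Product VALUES (1, 'Shrimp Chips'), (2, 'Snapea Crisps');
INSERT INTO Store VALUES (1, 'San Francisco');
INSERT INTO Store_Carries VALUES (1, 1, 40), (1, 2, 5);
INSERT INTO F_Comment VALUES
  (1, 'Love the Snapea Crisps!'),
  (2, 'snapea crisps are back in stock'),
  (3, 'Shrimp Chips for lunch');
""")

# Count comments mentioning each product, next to its stock per store.
# SQLite's LIKE is case-insensitive for ASCII text by default.
rows = conn.execute("""
SELECT p.ProdID, p.ProdName,
       (SELECT COUNT(*) FROM F_Comment c
         WHERE c.TextStr LIKE '%' || p.ProdName || '%') AS Hits,
       s.City, sc.StoreID, sc.Stock
FROM Product p
JOIN Store_Carries sc ON sc.ProdID = p.ProdID
JOIN Store s ON s.StoreID = sc.StoreID
ORDER BY Hits DESC, p.ProdName
""").fetchall()

for row in rows:
    print(row)
```

With this sample data the product with the most mentions is also the one with the lowest stock, which is the situation the query is meant to surface.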

Query 1: Output


Query 1: Data Analysis

• We have a list of stores and their stock of different products
  – Transportation problem to encourage similar levels of stock
  – Minimize shipments, shipping costs, etc.

• Subject to:
  – No outliers (stores with low stock)
  – Possible shipment constraints
  – Possible traffic constraints
  – Etc.
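For reference, a generic textbook form of the transportation problem is sketched below. It is not the team's AMPL model, and every symbol is an assumption introduced here: $S$ is the set of stores with surplus stock $s_i$, $D$ the set of stores with a shortfall $d_j$, $c_{ij}$ the cost of shipping one unit from store $i$ to store $j$, and $x_{ij}$ the number of units shipped.

```latex
\begin{align}
\min_{x} \quad & \sum_{i \in S} \sum_{j \in D} c_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j \in D} x_{ij} \le s_i && \forall i \in S \\
& \sum_{i \in S} x_{ij} \ge d_j && \forall j \in D \\
& x_{ij} \in \mathbb{Z}_{\ge 0} && \forall i \in S,\ j \in D
\end{align}
```

The "no outliers" requirement from the slide would enter through the choice of $d_j$ (how far below the target level a store is allowed to sit), and shipment or traffic limits would appear as additional upper bounds on $x_{ij}$.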


Query 1: Data Analysis (AMPL)


Query 2: Promo Social Networking

Consider a promotion. Compare social network comments about the promoted product in a given city two weeks before, during, and two weeks after the promotion to judge its effectiveness. Order by the return on investment.

Query 2: SQL

SELECT Promotion.PromoID,
  (SELECT COUNT(*)
   FROM F_Comment, Product
   WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*'
     AND Product.ProdID = Promotion.ProdID
     AND F_Comment.Timestamp < Promotion.StartDate
     AND F_Comment.Timestamp > Promotion.StartDate - 14) AS HitsBefore,
  (SELECT COUNT(*)
   FROM F_Comment, Product
   WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*'
     AND Product.ProdID = Promotion.ProdID
     AND F_Comment.Timestamp < Promotion.EndDate
     AND F_Comment.Timestamp > Promotion.StartDate) AS HitsDuring,
  (SELECT COUNT(*)
   FROM F_Comment, Product
   WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*'
     AND Product.ProdID = Promotion.ProdID
     AND F_Comment.Timestamp < Promotion.EndDate + 14
     AND F_Comment.Timestamp > Promotion.EndDate) AS HitsAfter,
  (SELECT SUM(Promotion.Discount*Product.RetailPrice)
   FROM Product) AS PromoCost
FROM Promotion
ORDER BY PromoCost;
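The three windows in Query 2 can be illustrated with a small Python sketch. The function name `promo_hit_windows` and the sample dates are hypothetical. One deliberate difference from the SQL: the SQL uses strict inequalities on both sides, so a comment stamped exactly on a boundary is counted nowhere, whereas this sketch assigns every boundary date to exactly one window.

```python
from datetime import date, timedelta


def promo_hit_windows(comment_dates, start, end, window_days=14):
    """Count comments in the windows before, during, and after a promotion."""
    window = timedelta(days=window_days)
    counts = {"before": 0, "during": 0, "after": 0}
    for d in comment_dates:
        if start - window <= d < start:
            counts["before"] += 1
        elif start <= d <= end:
            counts["during"] += 1
        elif end < d <= end + window:
            counts["after"] += 1
        # Anything outside the three windows is ignored.
    return counts


# Invented comment dates for one product, for illustration only.
comments = [
    date(2011, 10, 1), date(2011, 10, 20), date(2011, 10, 31),
    date(2011, 11, 1), date(2011, 11, 5), date(2011, 11, 7),
    date(2011, 11, 10), date(2011, 11, 30),
]
counts = promo_hit_windows(comments, date(2011, 11, 1), date(2011, 11, 7))
print(counts)
```

Comparing `counts["during"]` and `counts["after"]` against `counts["before"]`, relative to the promotion's cost, is one simple way to approximate the return on investment that the slide asks to order by.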

Query 3: Priority Customers

Use friendship data to rate friends by how many recommendations they have made.

Determine how many of a person's friends became friends with us after that person did.

In this way, we identify possible priority customers of Calbee to target for special advertisements and promotions.

Query 3: SQL

SELECT F.F_UID,
  (SELECT COUNT(*)
   FROM F_Friends AS F2
   WHERE F2.F_UID = F.F_UID
     AND EXISTS (SELECT F3.DateBecameFriends
                 FROM F_FRIENDS F3
                 WHERE F3.Friend_F_UID = 1
                   AND F3.F_UID = F2.Friend_F_UID
                   AND F3.DateBecameFriends > F.DateBecameFriends)) AS friendCount
FROM F_Friends AS F
WHERE F.Friend_F_UID = 1;
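The logic of Query 3 can be restated in a few lines of Python, which may make the nested subqueries easier to follow. This is a sketch on invented rows: the tuples mirror F_FRIENDS(F_UID, Friend_F_UID, DateBecameFriends), the company's account is assumed to have F_UID 1 as hard-coded in the SQL, and `referral_counts` is a hypothetical helper name.

```python
from datetime import date

COMPANY_UID = 1  # the SQL hard-codes 1 as the company's own account


def referral_counts(friendships):
    """For each friend of the company, count that user's friends who
    befriended the company at a later date than the user did."""
    # Date on which each user became friends with the company.
    joined = {u: d for (u, friend, d) in friendships if friend == COMPANY_UID}
    counts = {}
    for user, user_date in joined.items():
        friends_of_user = {f for (u, f, _) in friendships
                           if u == user and f != COMPANY_UID}
        counts[user] = sum(1 for f in friends_of_user
                           if f in joined and joined[f] > user_date)
    return counts


# Invented rows: (F_UID, Friend_F_UID, DateBecameFriends).
sample = [
    (10, 1, date(2011, 1, 1)),
    (11, 1, date(2011, 2, 1)),
    (12, 1, date(2011, 3, 1)),
    (13, 1, date(2010, 12, 1)),
    (10, 11, date(2010, 6, 1)),
    (10, 12, date(2010, 6, 1)),
    (10, 13, date(2010, 6, 1)),
    (11, 12, date(2010, 6, 1)),
]
result = referral_counts(sample)
print(result)
```

In the sample, user 10 scores highest because two of that user's friends joined later, making user 10 the most plausible "recommender" to prioritize.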

Query 3: Data Analysis


Query 4: Google Trends and Stocks

Determine priority stores that do not stock products they should, as determined by Google Trends word popularity.

For a given Google Trends word, find the top 5 cities in which the word was most searched in 2011. Then find the stores in those cities and the related products they do not stock. This helps identify how to improve inventory.

Query 4: SQL

SELECT Store.StoreID AS Store, Store.City AS City, Product.ProdID AS Prod
FROM Store, Store_Carries, Product
WHERE Store.City IN (SELECT TOP 5 Google_Trend.City
                     FROM Google_Trend
                     WHERE Google_Trend.Word = 'test')
  AND Store_Carries.StoreID = Store.StoreID
  AND Store_Carries.ProdID = Product.ProdID
  AND Store_Carries.Stock = 0
  AND Product.ProdID IN (SELECT Related_Trend.Related_Prod_ID
                         FROM Related_Trend
                         WHERE Related_Trend.Word = 'test');
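The "top 5 cities in 2011" step needs an explicit ranking and a year filter to match the description, since `TOP 5` without an `ORDER BY` returns an arbitrary five rows. The sketch below shows one possible reading in SQLite via Python: rank cities by total hits within the year. The interpretation (summing daily hits), the ISO text format for `day`, the helper name `top_cities`, and the rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Google_Trend (GT_ID INTEGER PRIMARY KEY, word TEXT, city TEXT,
                           country TEXT, day TEXT, hits INTEGER);
-- Invented sample rows; day is stored as ISO 'YYYY-MM-DD' text.
INSERT INTO Google_Trend (word, city, country, day, hits) VALUES
  ('snack', 'San Francisco', 'USA', '2011-03-01', 50),
  ('snack', 'San Francisco', 'USA', '2011-03-02', 70),
  ('snack', 'Oakland',       'USA', '2011-03-01', 90),
  ('snack', 'San Jose',      'USA', '2010-12-31', 500),
  ('chips', 'Oakland',       'USA', '2011-03-01', 300);
""")


def top_cities(word, year, limit=5):
    """Return (city, total_hits) for the cities with the most hits in a year."""
    return conn.execute(
        """
        SELECT city, SUM(hits) AS total_hits
        FROM Google_Trend
        WHERE word = ? AND day >= ? AND day < ?
        GROUP BY city
        ORDER BY total_hits DESC, city
        LIMIT ?
        """,
        (word, f"{year}-01-01", f"{year + 1}-01-01", limit),
    ).fetchall()


ranking = top_cities("snack", 2011)
print(ranking)
```

The resulting city list could then feed the `Store.City IN (...)` condition of the main query; in Access the same ranking would use `TOP 5` together with `GROUP BY` and `ORDER BY ... DESC`.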

Query 5: Social Network and Purchases

Gather social networking, Google Trends, and purchase data over time to formulate predictive models.

For a given product, find the number of social network hits for the product, the related trend word hits, and the number of purchases of that product for a given city on a given day. In this way, we can use social network 'buzz' and trend data to predict purchases as a function of time and city. Order by product, then timestamp.

Query 5: SQL

SELECT Product.ProdID, Product.ProdName, Purchase.Timestamp,
  (SELECT COUNT(F_Comment.F_CID)
   FROM F_Comment
   WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*'
     AND F_Comment.Timestamp = Purchase.Timestamp) AS SocialNetworkHits,
  (SELECT SUM(Google_Trend.hits)
   FROM Google_Trend
   WHERE Google_Trend.word = Product.ProdName
     AND Google_Trend.Timestamp = Purchase.Timestamp) AS TrendHits
FROM Product, Purchase
WHERE Purchase.ProdID = Product.ProdID
ORDER BY Purchase.ProdID, Timestamp;

Query 5: Output


Query 5: Data Analysis


• Social media networking, Google Trends, and purchase data used predictively
  – Group into weekly vectors
  – Extract significant data using Principal Component Analysis to project onto 2 dimensions
  – Cluster data using K-Means

• See if we can predict future sales using machine learning
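Two of the analysis steps, grouping into weekly vectors and K-Means clustering, can be sketched in plain Python as follows. This is a toy illustration rather than the team's pipeline: the PCA projection is omitted, the daily tuples (social hits, trend hits, purchases) are invented, and the initial centroids are chosen by hand so the run is deterministic.

```python
def weekly_vectors(daily, days_per_week=7):
    """Sum consecutive daily tuples into one vector per full week."""
    weeks = []
    # A trailing partial week is dropped.
    for i in range(0, len(daily) - days_per_week + 1, days_per_week):
        chunk = daily[i:i + days_per_week]
        weeks.append(tuple(sum(col) for col in zip(*chunk)))
    return weeks


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels


# Invented data: two quiet weeks followed by two busy weeks.
daily = [(1, 2, 0)] * 14 + [(10, 20, 5)] * 14
weeks = weekly_vectors(daily)
centroids, labels = kmeans(weeks, [weeks[0], weeks[-1]])
print(weeks)
print(labels)
```

On real data the weekly vectors would normally be scaled and projected (for example with PCA, as the slide describes) before clustering, since raw hit counts and purchase counts sit on very different scales.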


Login Interface

Employees log in here.

Switchboard

Allows employees to enter data through forms or run a selected query.

Forms: New Employee

Enter information on new Calbee employees.

Questions?


Thank you!
