Enterprise Big Data Pipeline for E-commerce/Retail Organizations Karthik Subramanya

Enterprise Data pipeline

Mar 19, 2017

Page 1: Enterprise Data pipeline

Enterprise Big Data Pipeline for E-commerce/Retail Organizations

Karthik Subramanya

Page 2: Enterprise Data pipeline

Various Stages of Hadoop Pipeline

Sources -> Collection -> Process -> Store -> Extract

Page 3: Enterprise Data pipeline

Stage 1 : Data Sources

* Sales Transaction
* Customer Information & Hierarchy
* Clickstream Logs (Web, Mobile, Kiosks, etc.)
* Marketing (Digital Media Campaign, Email, etc.)
* Customer Feedback (Product Review, Chat, Social Media, etc.)
* Supply Chain (Inventory, Shipping, Package, etc.)

Page 4: Enterprise Data pipeline

Stage 2 : Collection

* RDBMS : Sqoop (Batch, Delta, Purge)
* Logs, Events, Streams : (Flume, Kafka)
* File Transfer : (DistCp, hadoop fs -put)

Considerations : Fault Tolerance, Encryption, Scalability, Declarative, Programmable, Compression

Data Velocity : Batch, Near Real Time, Real Time
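The "Delta" collection mode can be illustrated with a small sketch. This is not Sqoop itself: it only mimics the idea of an incremental pull, where each run fetches rows changed since the previous checkpoint. The `sales` table, its columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the source RDBMS.

```python
import sqlite3


def fetch_delta(conn, last_value):
    """Return rows modified after last_value, plus the new checkpoint."""
    rows = conn.execute(
        "SELECT order_id, amount, updated_at FROM sales "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_value,),
    ).fetchall()
    # Advance the checkpoint only when something new was pulled.
    new_checkpoint = rows[-1][2] if rows else last_value
    return rows, new_checkpoint


def demo():
    # Hypothetical source table standing in for an RDBMS.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (order_id INTEGER, amount REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            (1, 10.0, "2016-11-05 10:00:00"),
            (2, 20.0, "2016-11-06 09:00:00"),
            (3, 30.0, "2016-11-06 21:42:33"),
        ],
    )
    # First run picks up everything after the stored checkpoint.
    first, checkpoint = fetch_delta(conn, "2016-11-05 23:59:59")
    # A second run with the advanced checkpoint finds nothing new.
    second, checkpoint_after = fetch_delta(conn, checkpoint)
    return first, checkpoint, second, checkpoint_after
```

Storing the checkpoint between runs is what keeps a delta load from re-reading the whole table; a batch load would simply omit the `WHERE` clause.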

Page 5: Enterprise Data pipeline

Stage 3 : Process

* Stream : (Storm, Spark, Kafka {Receiver API, Kafka Streams})
* Batch : (Map Reduce, Spark, Hive, Impala, Pig, Presto, etc.)

Considerations :
* Data at Rest, Data in Motion
* Computation Engine : Map Reduce, Tez, MPP, In-memory Computation
* SQL based, Scripting, (Java, Python, Scala) API

Applications : Visualization, Querying, Insights, Transformation, Machine Learning (Iterative)
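As a rough illustration of the batch model behind Map Reduce, the sketch below runs the map, shuffle, and reduce phases in plain Python on a single machine. The record fields (`category`, `price`, `qty`) are hypothetical, and a real engine would distribute each phase across a cluster.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (key, value) pair per sales record."""
    for record in records:
        yield record["category"], record["price"] * record["qty"]


def shuffle_phase(pairs):
    """Group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each group of values into a single total."""
    return {key: round(sum(values), 2) for key, values in grouped.items()}


def total_sales_by_category(records):
    return reduce_phase(shuffle_phase(map_phase(records)))
```

For example, `total_sales_by_category` applied to a list of transaction dictionaries returns one revenue total per category.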

Page 6: Enterprise Data pipeline

Stage 4 : Storage

* Structured : (Hive (HCatalog), HBase (NoSQL))
* Disk Storage : (HDFS, S3)

Considerations :

* Scan Vs Random Seek
* Append, Refresh Vs Update
* Querying : Column Major Vs Row Major
* Performance Vs Compression
* Fixed Schema Vs Changing Schema
* Platform Dependent Vs Platform Independent

Page 7: Enterprise Data pipeline

Stage 4 : Storage Continued..

Data Organization : Sharding, Partitioning, Bucketing

Data Storage :

Splittable : Text Format, Sequence Files, Writables, Avro, ORC, Parquet

Serializable : Avro, Parquet, Protobuf, Thrift, etc.

Compression Formats : LZO, Gzip, Snappy, etc.
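Partitioning and bucketing can be sketched as two small functions: a partition is a directory chosen by a column value, and a bucket is a file chosen by hashing a key. The layout below follows the common `column=value` directory convention; the table name, the `dt` partition column, the bucket file naming, and the use of CRC32 as the hash are all assumptions for illustration, not what Hive does internally.

```python
import zlib


def partition_path(table, tran_date):
    """Directory for one partition, keyed by transaction date."""
    return f"{table}/dt={tran_date}"


def bucket_for(key, num_buckets):
    """Deterministic bucket number for a key (CRC32 is an assumed hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def file_location(table, tran_date, customer_number, num_buckets=8):
    """Combine the partition directory with the bucket file name."""
    bucket = bucket_for(customer_number, num_buckets)
    return f"{partition_path(table, tran_date)}/bucket_{bucket:05d}"
```

A query filtered on date only has to read one directory, and a join on the bucketed key only has to compare buckets with the same number.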

Page 8: Enterprise Data pipeline

Stage 4 : Storage Continued..

Some Rules of Thumb :

Denormalize at every opportunity.

Changing Schema, Row Major Read Patterns, Splittable, Compressible : Avro
Sample Query : Select * from user a join product_sales b on (a.user_id = b.user_id)

Column Major Read Patterns, Splittable, Compressible : Parquet
Sample Query : Select user_id from user where age < 16
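The row-major versus column-major distinction can be sketched with plain Python structures. A row layout keeps every field of a record together, which suits `Select *` style reads; a column layout keeps each field in its own array, so a filter on `age` never touches the other fields. The field names and helper functions here are hypothetical, and real formats such as Avro and Parquet add encoding, compression, and metadata on top of this idea.

```python
def to_columns(rows):
    """Pivot a list of row dictionaries into one list per field."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def user_ids_under_age(columns, limit):
    """Column-major read: scan only the user_id and age columns."""
    return [
        user_id
        for user_id, age in zip(columns["user_id"], columns["age"])
        if age < limit
    ]


def full_rows_for_user(rows, user_id):
    """Row-major read: return complete records for one user."""
    return [row for row in rows if row["user_id"] == user_id]
```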

Page 9: Enterprise Data pipeline

Stage 5 : Extract/Visualization

* Sqoop Export to RDBMS
* Aggregate, Join, Denormalize
* Export to NoSQL Data Stores
* ETL Operations
* Build Machine Learning, Deep Learning Models
* Views

IDEs : Hue, Toad Client for Hive, Jupyter Notebooks, Apache Zeppelin

Orchestration and Workflow Management Tools : Oozie, Airflow, NiFi, Luigi
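Workflow tools such as Oozie and Airflow model a pipeline as a directed acyclic graph of tasks. The sketch below is not any of those tools: it uses Python's standard `graphlib` module (Python 3.9+) to derive a valid run order for the stages in this deck. The task names and the two extract tasks are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
DEPENDENCIES = {
    "collect": {"sources"},
    "process": {"collect"},
    "store": {"process"},
    "export_to_rdbms": {"store"},
    "train_model": {"store"},
}


def execution_order(dependencies):
    """Return one valid order in which the tasks can run."""
    return list(TopologicalSorter(dependencies).static_order())
```

The two extract tasks depend only on `store`, so a scheduler is free to run them in either order or in parallel.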

Page 10: Enterprise Data pipeline

Applications in Ecommerce

* Recommender Systems
* User Personalization : Offers & Site Experience
* Email & Digital Targeting
* Sales and Inventory Forecasting
* Pricing
* CRM
* Business Intelligence
* 360° View of Customer
* Search Optimization and Content Enrichment
* ...

Page 11: Enterprise Data pipeline

B2B Price Recommendations

Business Goal : Acquire new customers and renew existing ones by making the best use of digital tools.

Process :
* Scan customers' existing invoices from competitors and offer attractive prices on products using optimized, learned pricing models.
* OCR model to convert invoice images into Product, Quantity and Price.
* Regression models based on Customer Vertical and Size to forecast spending across product categories.
* Optimization algorithms that consider individual product guidance, customer demand forecast, and currently paid prices to recommend new prices.

Page 12: Enterprise Data pipeline

Deep learning based OCR

Pricing Server

Recommendation App

• Regression models to forecast Product Category demand for a Customer Vertical, Size

• Store Results in HBase

Optimization models to recommend the best price

Page 13: Enterprise Data pipeline

INPUT

{
  "tranId": "123",
  "tranDate": "11/06/2016 09:42:33 PM",
  "bid": "4546",
  "source": "iPAD App",
  "customerNumber": "123",
  "competitorName": "Costco",
  "algorithmVersion": "v1",
  "targetSavingsPercentage": 0.15,
  "products": [
    {"custSkuNumber": "809757", "skuNumber": "809757staples", "customerPrice": 16.99, "floorPrice": 15.99, "ceilPrice": 15.99, "programPrice": 16.99, "targetPrice": 15.99, "custQty": 2.0, "netCost": 9.12},
    {"custSkuNumber": "759960", "skuNumber": "759960staples", "customerPrice": 31.54, "floorPrice": 12.97824786, "ceilPrice": 16.0505734, "programPrice": 34.71, "targetPrice": 14.53698675, "custQty": 2.0, "netCost": 6.74},
    {"custSkuNumber": "113886", "skuNumber": "113886staples", "customerPrice": 9.24, "floorPrice": 4.526342118, "ceilPrice": 6.427851553, "programPrice": 9.19, "targetPrice": 5.256156797, "custQty": 12.0, "netCost": 2.47}
  ]
}

OUTPUT

{
  "numItemsCheaperToCompetitor": 3,
  "earlierPaid": 207.94,
  "discountOffered": 0.15,
  "itemsProgramPrice": 0,
  "tranId": "123",
  "tranDate": "11/06/2016 09:42:33 PM",
  "source": "iPAD App",
  "customerNumber": "123",
  "competitorName": "Costco",
  "targetSavingsPercentage": 0.15,
  "algorithmVersion": "v1",
  "products": [
    {"custSkuNumber": "759960", "skuNumber": "759960staples", "customerPrice": 31.54, "floorPrice": 12.97824786, "ceilPrice": 16.0505734, "programPrice": 34.71, "targetPrice": 14.53698675, "recommendedPrice": 27.97, "custQty": 2, "netCost": 6.74},
    {"custSkuNumber": "809757", "skuNumber": "809757staples", "customerPrice": 16.99, "floorPrice": 15.99, "ceilPrice": 15.99, "programPrice": 16.99, "targetPrice": 15.99, "recommendedPrice": 15.96, "custQty": 2, "netCost": 9.12},
    {"custSkuNumber": "113886", "skuNumber": "113886staples", "customerPrice": 9.24, "floorPrice": 4.526342118, "ceilPrice": 6.427851553, "programPrice": 9.19, "targetPrice": 5.256156797, "recommendedPrice": 7.41, "custQty": 12, "netCost": 2.47}
  ]
}
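The summary fields in the sample output on this page can be cross-checked against its product lines. The sketch below assumes, without confirmation from the deck, that `earlierPaid` is the sum of `customerPrice` times `custQty` and that `numItemsCheaperToCompetitor` counts items whose `recommendedPrice` is below `customerPrice`. The `summarize` helper and its result keys are hypothetical; the prices and quantities are copied from the sample.

```python
# Price and quantity fields copied from the sample output's products.
OUTPUT_PRODUCTS = [
    {"customerPrice": 31.54, "recommendedPrice": 27.97, "custQty": 2},
    {"customerPrice": 16.99, "recommendedPrice": 15.96, "custQty": 2},
    {"customerPrice": 9.24, "recommendedPrice": 7.41, "custQty": 12},
]


def summarize(products):
    """Recompute basket totals under the assumed field meanings."""
    earlier_paid = sum(p["customerPrice"] * p["custQty"] for p in products)
    recommended_total = sum(
        p["recommendedPrice"] * p["custQty"] for p in products
    )
    cheaper = sum(
        1 for p in products if p["recommendedPrice"] < p["customerPrice"]
    )
    return {
        "earlierPaid": round(earlier_paid, 2),
        "recommendedTotal": round(recommended_total, 2),
        "savings": round(1 - recommended_total / earlier_paid, 4),
        "numItemsCheaper": cheaper,
    }
```

Under these assumptions the recomputed `earlierPaid` matches the 207.94 in the sample, and the basket-level saving comes out very close to the 0.15 `targetSavingsPercentage`, which suggests the optimizer meets the target across the whole basket rather than per item.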