BI with Apache Hadoop (EN)

Jun 14, 2015



A simple, introductory presentation that eases the audience into Hadoop and shows them real use cases.

  • 1. Business Integration with CDH 4 (including Apache Hadoop). Alexander Alten-Lorenz, Cloudera Inc., Munich, 22 February 2013

  • 2. Challenges: Volume, Velocity, Variety
  • 3. Business Integration: CRM, Invoicing, Analytics, Risk Management, Social Networks, Universal Data Access, Marketing, Data Governance, Document Store, SAP / Salesforce, Search Indices, Article and Storage Management
  • 4. Use Cases
  • 5. Risk Management. Problem: Scoring of customers and projects. Solution: Finance history, communication and pattern detection. Users: Finance, Insurance
  • 6. Recommendations. Problem: Recommend suitable products to go with purchased products, matching the customer's interests. Solution: Statistical analysis of interests and purchase history, detection of matching swarm patterns. Users: eCommerce, Advertising
  • 7. Graph Analytics. Problem: Detect trends and curves in large distributed networks (Wired, Social, Mesh). Solution: Collecting and data mining all data, applying self-learning patterns to detect trends and forecasts. Users: Enterprises, Gov, NGO, Provider, Telco, Stock Exchange
  • 8. Detection of Dangerous Use. Problem: Spam, credit card abuse. Solution: Pattern detection, prioritizing, heuristic analytics. Users: Retail, Finance, Reseller
  • 9. Text Analysis. Problem: Detect the meaning of the written word (Sentiment Analysis). Solution: Keyword patterns, coherence detection, path detection. Users: eCommerce, Social Media Service Provider, Attitude Research
  • 10. Amounts of Real Data. eBay: 12 PB, Search Optimization. Facebook: 50 PB, Logs, Reports. Walmart: 4.5 PB, Customer Transactions
  • 11. Apache Hadoop. Software framework for large amounts of unstructured data. Apache License. Two main cores: HDFS (distributed data storage) and MapReduce (distributed data handling)
  • 12. Hadoop Cluster (diagram: a grid of Data Nodes). Data Node: 4-16 Cores, 4-16 Disks, 8-64 GB RAM, 1-10GB Network
  • 13. Hadoop Distributed File System (diagram: a File is split into Blocks, and the Blocks are spread across Data Nodes)
  • 14. MapReduce (diagram labels): Data, RDBMS, Query; Data, Hadoop, Query
  • 15. Features. HDFS, MapReduce: Distribution, Fault Tolerance, Scalability
  • 16. Hadoop Ecosystem (diagram labels): SQL, Scripts, HBase, Whirr, Hive, Pig, Oozie, MapReduce, Avro, Java API, HDFS, ZooKeeper, Sqoop, Flume, Connectors, Hue, RDBMS, Logs, ..., Mahout
  • 17. Example of an Integration
  • 18. Scope. Successful audits per ISO 27001. Analyze different data sources from different databases and CRM systems. Realtime and lifetime statistics per product. Periodic analytic and statistic jobs. Weekly re-import into CRM. Single queries per user (analyst) over a secured GUI
  • 19. Solution Path. Cluster authentication and authorization via Kerberos and encrypted data communication / data protection. Sqoop connector to CRM / DB: Teradata, Oracle, Postgres, MySQL, MS SQL. Hive - HBase integration. Hive analytics, controlled automatically by the Oozie workload orchestrator. Hue shell, authentication via Kerberos SPNEGO
  • 20. CRM Park Integration (diagram labels): CDH, Authentication, Sqoop, Kerberos (AD, MITv5), Real Time, HBase, Hive, Oozie, Automation, End User, HUE
  • 21. How to Manage?
  • 22. Cloudera Manager. Automated Deployment, Reporting, Monitoring, Support Integration, Service Management, Log Management, Events and Alerts
  • 23. Cloudera. Founded 2009 in Palo Alto. Cloudera's Distribution Including Hadoop (CDH4) / Cloudera Manager 4. > 320 employees worldwide. Training, Consulting, Support, Development. Enterprise Tools
  • 24. Thank You! [email protected] Twitter: @mapredit Blog: http://hadoop.
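
Slides 11 and 14 name MapReduce as Hadoop's core for distributed data handling but do not show what a job looks like. The sketch below illustrates the general map, shuffle and reduce pattern with a word count in plain Python. It runs in a single process and does not use any Hadoop API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration and do not come from the slides.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (done by the framework in a real cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines read from files in HDFS.
    sample = [
        "Hadoop stores data in HDFS",
        "MapReduce processes data in Hadoop",
    ]
    print(word_count(sample))
```

In a real cluster the map and reduce functions would run in parallel on the Data Nodes that hold the input blocks, and the framework would perform the shuffle over the network. Here all three steps are ordinary function calls.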