#TDPARTNERS16 Sept 11, 2016 GEORGIA WORLD CONGRESS CENTER
The Last Mile: Why Hadoop Management Is Critical to Success
Ron Bodkin and Scott Fleming
Think Big, a Teradata company
The Last Mile
• The open source ecosystem for analytics is complicated
• It’s easy to get started
• Maintaining an optimal, performant environment is not
• Success depends on careful planning and management
Data Lake Design Principles
• Automated and reliable data ingest
• Capture and manage relevant metadata
• Preserve original source data where possible
• Provide cleansing, aggregation, and integration matched to each use
• Balance governance and agility
• Implement security at the right time
• Easily search, access, and consume data
• Make the data ready for analysis
New Data Sources
• It all starts here
• Capture the rawest form
• Determine how it will be used and who will be using it
• Cleanse it, validate it, and profile it
• Make it discoverable (and useful)
• Bottom line: Be consistent and consider tools
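As an illustrative sketch (not from the deck), the "cleanse it, validate it and profile it" step for one ingested batch might report totals, per-field missing counts, and the number of fully valid rows. The records and required-field names here are hypothetical:

```python
from collections import Counter

def profile_records(records, required_fields):
    """Profile one ingested batch: totals, per-field missing counts, valid rows."""
    stats = {"total": len(records), "missing": Counter()}
    for rec in records:
        for field in required_fields:
            if rec.get(field) in (None, ""):
                stats["missing"][field] += 1
    # A record is valid only if every required field is populated.
    stats["valid"] = sum(
        1 for rec in records
        if all(rec.get(f) not in (None, "") for f in required_fields)
    )
    return stats

# Hypothetical batch with two bad records:
batch = [
    {"id": "1", "source": "web", "ts": "2016-09-01"},
    {"id": "2", "source": "", "ts": "2016-09-01"},
    {"id": "3", "source": "app", "ts": None},
]
stats = profile_records(batch, ["id", "source", "ts"])
```

Emitting these stats alongside each ingest run makes data quality drift visible early, before downstream users hit it.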
Typical Data Ingestion
Governance
• Clear distinction of roles and responsibilities for curating data
• Common vocabulary for data sets / types
• Implement required security – not too much, not too little
• Ongoing data quality policies
• Data retention / archival policies
Security Challenges
• Residual files following failed jobs
• Compatibility of security tools with major Hadoop distributions
• Multiple types of discoverable data in the environment
• BI and analytics user access
• Lack of mature security tools
• Uncontrolled replication of data
• User authentication and authorization are complex
Without comprehensive security measures, your valuable data can be easily compromised and you may be subject to a security breach.
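The "residual files following failed jobs" challenge above lends itself to a periodic sweep. This hedged sketch works from a directory-listing snapshot (path plus age in hours, e.g. parsed from an `hdfs dfs -ls -R` run) rather than a live HDFS connection, and the staging-file markers are assumptions:

```python
def find_residual_files(listing, max_age_hours=24):
    """Flag leftover temp/staging files older than the threshold.
    `listing` is an iterable of (path, age_in_hours) pairs."""
    stale_markers = ("_temporary", ".staging", ".tmp")
    return [
        path for path, age in listing
        if age > max_age_hours and any(m in path for m in stale_markers)
    ]

# Hypothetical snapshot: only the 72-hour-old _temporary file is residue.
snapshot = [
    ("/data/sales/_temporary/attempt_001/part-0000", 72),
    ("/data/sales/part-0000", 72),
    ("/user/etl/.staging/job_123/job.xml", 8),
]
stale = find_residual_files(snapshot)
```

Running a sweep like this on a schedule (and alerting rather than auto-deleting) keeps failed-job debris from accumulating and being mistaken for real data.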
Security Layers
Ingestion Jobs and Monitoring
• Baseline job performance and resource requirements
• Ensure error handling is robust
• Build alerting into the processes that submit jobs
• Develop and monitor SLAs for job performance. Look for drift.
• Leverage tools where possible
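The SLA-drift bullet above can be sketched as a simple statistical check: compare the latest job duration against a rolling baseline of recent runs. This is one illustrative approach, not a prescribed one; the window size and sigma threshold are assumed values to tune:

```python
from statistics import mean, stdev

def sla_drift(durations, window=10, threshold=2.0):
    """Return True if the latest run drifts more than `threshold` standard
    deviations from the rolling baseline of the previous `window` runs."""
    if len(durations) <= window:
        return False  # not enough history to baseline
    baseline = durations[-window - 1:-1]
    mu, sigma = mean(baseline), stdev(baseline)
    return sigma > 0 and abs(durations[-1] - mu) > threshold * sigma

# Hypothetical nightly-job durations in minutes; the last run blew the SLA.
history = [30, 31, 29, 30, 32, 30, 29, 31, 30, 30, 55]
```

Wiring a check like this into the process that submits the job gives you the "build alerting into the processes" bullet almost for free.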
Resource Contention
• SLAs and sandboxes – often in the same environment
• Leverage the capacity scheduler and hierarchical queues
• Don’t be afraid to get granular
• Use YARN containers – be prudent about the resources requested
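As a rough illustration of being prudent about requested resources: the number of containers a node can run is bounded by both memory (after an OS/daemon reserve) and vcores, and the tighter bound wins. The 8 GB reserve below is an assumed figure; tune it for your distribution and daemons:

```python
def containers_per_node(node_mem_gb, node_vcores,
                        container_mem_gb, container_vcores,
                        os_reserve_gb=8):
    """Rough YARN sizing: how many containers of a given shape fit on a node,
    bounded by both memory (after the reserve) and vcores."""
    by_mem = (node_mem_gb - os_reserve_gb) // container_mem_gb
    by_cpu = node_vcores // container_vcores
    return int(min(by_mem, by_cpu))

# A 128 GB / 32-vcore worker with 4 GB / 1-vcore containers is memory-bound:
fit = containers_per_node(128, 32, 4, 1)  # (128 - 8) // 4 = 30
```

Oversized container requests shrink this number quickly, which is how one greedy tenant starves the queues for everyone else.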
Capacity Planning
• Capacity planning is an ongoing effort, not one-and-done
• Includes storage, compute, network, memory and real estate
• Review resource and storage utilization at least monthly
• Implement retention and archiving processes where appropriate
• Be thoughtful and plan when expanding – just adding nodes can have unexpected results
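A back-of-the-envelope projection shows why storage planning has to be ongoing: raw HDFS capacity is the logical data volume times the replication factor, plus headroom for temp and shuffle space. The 3x replication and 25% overhead below are common defaults, assumed here for illustration:

```python
def raw_storage_needed(daily_ingest_tb, days, replication=3, overhead=0.25):
    """Project raw HDFS capacity in TB: logical data x replication,
    plus `overhead` (a fraction) of headroom for temp/shuffle space."""
    logical = daily_ingest_tb * days
    return logical * replication * (1 + overhead)

# Hypothetical: 0.5 TB/day of new data, projected over one year.
yearly = raw_storage_needed(0.5, 365)  # ~684 TB raw
```

Reviewing this projection against actual utilization monthly, per the bullets above, is what turns "just add nodes" into a plan.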
Hive Operations
• Bring your own data – user education
• Sub-optimal storage formats
• Table proliferation
• Over-partitioning
• ODBC / JDBC connectivity
• Canary processes for HiveServer2
• Impala – compute stats
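A canary process for HiveServer2 can be as simple as timing a trivial probe query on a schedule. The `run_query` callable below is a hypothetical stand-in for a real JDBC/ODBC call; in practice it would execute the query against HiveServer2 and raise on connection failure:

```python
import time

def canary(run_query, timeout_s=30):
    """Probe HiveServer2 availability and latency with a trivial query.
    `run_query` is a callable that executes SQL and raises on failure."""
    start = time.monotonic()
    try:
        run_query("SELECT 1")
        elapsed = time.monotonic() - start
        return {"ok": elapsed <= timeout_s, "latency_s": elapsed}
    except Exception as exc:
        return {"ok": False, "error": str(exc)}

# Simulated probe for illustration (a no-op stands in for the real client):
result = canary(lambda q: None)
```

Trending the recorded latency, not just the up/down flag, catches the slow degradations that precede outright failures.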
General Hadoop Operations
• Develop a RACI for operations
• ITIL processes – minimally Release Management and Change Management
• Stay aligned with the distro versions
• Use configuration management tools like Puppet and Ansible
• Staff appropriately
Hadoop Operations Top 10
1. Continuous capacity planning
2. Isolate the LAN
3. Implement proactive monitoring and alerting
4. Establish data balancer schedule and use
5. Periodic review of Hive tables, schemas, and data storage
6. Monitor for small files
7. End-user education
8. Periodic review of the capacity scheduler and resource management
9. Monitor SLAs for drift
10. Runbook, Runbook, Runbook
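Item 6, monitoring for small files, can be sketched as a per-directory report over a filesystem listing: count files smaller than one HDFS block. The (path, size) snapshot format and the 128 MB block size are assumptions; the listing would typically be parsed from `hdfs dfs -ls -R` output:

```python
from collections import defaultdict
import posixpath

def small_file_report(listing, block_size=128 * 1024 * 1024):
    """Count files smaller than one HDFS block, grouped by directory.
    `listing` is an iterable of (path, size_in_bytes) pairs."""
    report = defaultdict(int)
    for path, size in listing:
        if size < block_size:
            report[posixpath.dirname(path)] += 1
    return dict(report)

# Hypothetical snapshot: one healthy file, two small ones.
snapshot = [
    ("/data/events/part-0000", 200 * 1024 * 1024),
    ("/data/events/part-0001", 4 * 1024),
    ("/data/logs/2016-09-11/app.log", 1024),
]
report = small_file_report(snapshot)
```

Directories that top this report are the candidates for compaction or for fixing the upstream job that writes them.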
Monitoring
• Ambari / Cloudera Manager – basic blocking and tackling
• Nagios – where there are gaps
• PCNG – for application monitoring
• Dr. Elephant – for application heuristics
Engineering and Operations
• Weekly reviews for alignment and planning
• Include operations in engineering design
• New technology preparation, planning, and training
• Continuous updates to the runbooks
• DevOps and Agile – rules of the road to be able to fail fast while maintaining a stable environment
Monitoring Adoption
• Knowing who is doing what in the environment is essential to maintenance and planning.
• Determine who the power users are and make them champions
• Helps inform resource planning and allocation
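Finding the power users can start from job-submission logs. This hedged sketch assumes a simple (user, application id) log format, e.g. scraped from the YARN ResourceManager; real adoption tracking would also weigh resource usage, not just submission counts:

```python
from collections import Counter

def top_users(job_log, n=3):
    """Rank users by number of job submissions to spot power users.
    `job_log` is a sequence of (user, application_id) pairs."""
    return Counter(user for user, _ in job_log).most_common(n)

# Hypothetical submission log:
log = [("alice", "app_1"), ("bob", "app_2"), ("alice", "app_3"),
       ("carol", "app_4"), ("alice", "app_5"), ("bob", "app_6")]
top_users(log)  # [('alice', 3), ('bob', 2), ('carol', 1)]
```

The names at the top of this list are your candidate champions, and their workloads are the ones to baseline first.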
Summary
• Getting started is easy
• Getting started to ensure long-term success takes some planning
• There is a lot to stay on top of to ensure successful operations
• The platform components and tools vary in every environment
• Capable operations people are hard to find
• Proactive management and monitoring is key to happy users
Thank You
Questions/Comments – Email: ron.bodkin@thinkbiganalytics.com
Follow Me – Twitter: @Ronbodkin and @scottbfleming
Rate This Session with the PARTNERS Mobile App
Remember To Share Your Virtual Passes