Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015

Post on 12-Feb-2017


DICE Horizon 2020 Project
Grant Agreement no. 644869
http://www.dice-h2020.eu
Funded by the Horizon 2020 Framework Programme of the European Union

Monitoring in Big Data Frameworks

Gabriel Iuhasz
Institute e-Austria Timisoara
26 November 2015

Overview

o Introduction
o Cloud Computing and Big Data
o Monitoring Tools
o Monitoring Requirements and Solutions
o Conclusions

Introduction

o Big Data in Cloud computing
  o Volume, Velocity, Variety and Veracity
  o Cost Reduction, Rapid provisioning/time to market, Flexibility/scalability
o DevOps and Cloud
  o Development and Operations
  o Communication, Collaboration, Integration, Automation
o DevOps Monitoring
  o Measurement is a key aspect of DevOps

Big Data in Cloud Computing

o Challenges of Big Data on Cloud
  o Low-latency real-time data
    o Virtualization overhead
    o Multi-tenancy overhead
  o Scalability
    o Lack of RDBMS support
  o Availability
    o Data integrity/privacy

Hadoop Ecosystem

Cloudera

HortonWorks

Monitoring Architecture

o Cross layer monitoring of big data platforms
  o Types of metrics are highly dependent on the type of the application
  o Have to be decided on a platform/application basis
o Centralized Monitoring
  o All resource states are sent to a centralized monitoring server
  o Metrics are continuously polled from monitored components
  o Single point of failure
  o Lacks scalability
o Decentralized Monitoring
  o No single point of failure
  o Central authority is diffused
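
The centralized model above can be illustrated with a minimal sketch. This is not taken from any of the tools in the talk: `CentralMonitor`, `MetricSource`, the node names and the `cpu_load` metric are all hypothetical, and each monitored component is assumed to be reachable through a plain callable that returns its metrics.

```python
from typing import Callable, Dict

# Hypothetical names throughout: MetricSource, CentralMonitor and the node
# names below are illustrative and do not come from any real tool.
MetricSource = Callable[[], Dict[str, float]]


class CentralMonitor:
    """One server that polls every registered component for its metrics."""

    def __init__(self) -> None:
        self.sources: Dict[str, MetricSource] = {}
        self.state: Dict[str, Dict[str, float]] = {}

    def register(self, name: str, source: MetricSource) -> None:
        self.sources[name] = source

    def poll_once(self) -> None:
        # Every poll touches every component, so the work done by this one
        # server grows linearly with the number of monitored nodes.
        for name, source in self.sources.items():
            try:
                self.state[name] = source()
            except Exception:
                # An unreachable component is recorded with empty metrics.
                self.state[name] = {}


def failing_source() -> Dict[str, float]:
    # Simulates a node that cannot be reached.
    raise ConnectionError("node unreachable")


monitor = CentralMonitor()
monitor.register("datanode-1", lambda: {"cpu_load": 0.42})
monitor.register("datanode-2", failing_source)
monitor.poll_once()
```

All state lives in the single `monitor` object, which is where both the single point of failure and the scalability limit come from. A decentralized design would spread `state` and the polling loop across several peers.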

Tools

o Hadoop Performance Monitoring UI
  o Lightweight monitoring UI for Hadoop server
  o Uses Hadoop metrics (using Sinks)
o SequenceIQ
  o Based on ELK stack and Docker containers
  o ElasticSearch can be easily scaled horizontally
  o Logstash server on client side
o Ganglia
  o Scalable distributed monitoring system
  o Low per-node overhead
  o Focused on System Metrics
  o Gmond, gmetad and Web Front-end

Tools II

o Apache Chukwa
  o Built on top of HDFS
  o Easily scalable
  o Potentially high overhead
o Hadoop Vaidya
  o Rule-based diagnostic tool for M/R jobs
  o Performs post-run results analysis
o Nagios
  o Plugin-based architecture
  o Uses a centralized server to collect metrics
  o Possible to create a hierarchical deployment
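
To make the plugin-based architecture of Nagios concrete, here is a small sketch of the logic inside a check plugin. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the `text | perfdata` output line follow the Nagios plugin convention. The function name `check_disk_usage`, the `DISK` label and the default thresholds are assumptions made for illustration.

```python
from typing import Tuple

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_disk_usage(used_percent: float,
                     warn: float = 80.0,
                     crit: float = 90.0) -> Tuple[int, str]:
    """Map a disk usage reading to a (exit code, status line) pair.

    Hypothetical check: the thresholds and the DISK label are illustrative.
    """
    if not 0.0 <= used_percent <= 100.0:
        return UNKNOWN, "DISK UNKNOWN - invalid reading"
    if used_percent >= crit:
        code = CRITICAL
    elif used_percent >= warn:
        code = WARNING
    else:
        code = OK
    # Human-readable text, then performance data after the pipe character.
    message = (f"DISK {LABELS[code]} - {used_percent:.1f}% used"
               f" | used={used_percent:.1f}%;{warn:.0f};{crit:.0f}")
    return code, message


code, message = check_disk_usage(85.0)
```

A real plugin would print `message` and terminate with `sys.exit(code)`; the central Nagios server reads the exit status to decide the service state.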

Requirements

o Difficulties in cloud monitoring
  o Scale
  o Velocity or Timeliness
  o Constant changes
o The need for scalability and automation
  o Easy re-configurability
  o Lightweight metrics collectors
  o Identifying pertinent metrics

DICE Overview

[Diagram: DICE overview. Labels: Platform-Indep. Model; Domain Models; Continuous Validation; Continuous Monitoring; Data Awareness; Architecture Model; Platform-Specific Model; Platform Description; DICE MARTE; Deployment & Continuous Integration; DICE IDE; Big Data; QA Models]

DICE Monitoring Platform

o RESTful Web Service
  o Used to deploy and configure all core/auxiliary components
  o Used to query ElasticSearch
    o Exports metrics in: JSON, CSV, OSLC Perf. Mon 2.0 (RDF+XML)
  o Used for auto-scaling of monitoring solution
o ELK Stack
  o Extremely flexible/configurable
  o Horizontally scalable
  o Can accept various input and output formats
  o ETL via Logstash server (filters)
  o Logstash-forwarder secure transmission (new Beats Data Shippers)
  o Visualization using Kibana 4
o Collectd
  o Statistics collection daemon
  o A lot of plugins available
  o Simple configuration
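
The query-and-export role of the web service can be sketched as follows. This is not the DICE implementation: `build_range_query` and `hits_to_csv` are hypothetical helpers, and the `host` and `cpu_load` fields are assumed for illustration. The sketch relies only on the Elasticsearch query DSL `range` query, the `hits.hits[]._source` layout of a search response, and the `@timestamp` field that Logstash adds to events.

```python
import csv
import io
from typing import Any, Dict, List


def build_range_query(start: str, end: str, size: int = 100) -> Dict[str, Any]:
    """Build a search body selecting events inside a time window."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "asc"}}],
        "query": {"range": {"@timestamp": {"gte": start, "lte": end}}},
    }


def hits_to_csv(response: Dict[str, Any], fields: List[str]) -> str:
    """Flatten the documents of a search response into CSV text."""
    buffer = io.StringIO()
    # Fields not listed are ignored; missing fields become empty cells.
    writer = csv.DictWriter(buffer, fieldnames=fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for hit in response["hits"]["hits"]:
        writer.writerow(hit["_source"])
    return buffer.getvalue()


# Hand-written sample response; no Elasticsearch server is contacted here.
sample_response = {
    "hits": {"hits": [
        {"_source": {"@timestamp": "2015-11-26T10:00:00Z",
                     "host": "node-1", "cpu_load": 0.42, "extra": 1}},
    ]}
}
query_body = build_range_query("2015-11-26T00:00:00Z", "2015-11-27T00:00:00Z")
csv_text = hits_to_csv(sample_response, ["@timestamp", "host", "cpu_load"])
```

In a deployment the `query_body` dictionary would be sent to the search endpoint of an index, and the JSON response passed to `hits_to_csv` before being returned to the caller.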

DICE Monitoring Platform II

DICE Monitoring Platform Scaled

DICE Monitoring Platform Variant

Conclusions

o We have given a short overview of current monitoring platforms
o Identified key requirements for Big Data Monitoring
  o Scaling, Autonomy, Timeliness
  o Automation via Chef recipes
o Presented the current Architecture of the DICE Monitoring Platform
  o Currently collecting from: HDFS, YARN, Spark, Storm, Kafka
  o In the near future: Cassandra, possibly Trident
o Creating the full lambda architecture based anomaly detection platform
  o ElasticSearch used as serving layer

Thank You!

Questions?
