Highly Scalable Cognitive Storage Management Platform Using Cloud Native Services Ramakrishna Vadla, IBM Maneesh Rapelly, IBM Acknowledgement : Sumant Padbidri, Anbazhagan Mani

May 20, 2020

Page 1

Highly Scalable Cognitive Storage Management Platform

Using Cloud Native Services

Ramakrishna Vadla, IBM
Maneesh Rapelly, IBM

Acknowledgement: Sumant Padbidri, Anbazhagan Mani

Page 2

Agenda

• Server Downtimes and Consequences

• Storage Management Evolution

• Next Generation Storage Management Platform

• Cognitive (AI) Storage Management Platform

• Predictive Analytics

• Scalability

Page 3

Server Downtimes and Consequences

[Chart: Average cost per hour of enterprise server downtime worldwide in 2017 and 2018, shown next to an illustration labeled "Storage Administrator". Source: statista.com]

Page 4

Storage Management Evolution

Storage Manager - Per Device Type

[Diagram: several groups of storage devices, each group with its own Manager]

✓ Each storage device type has its own management module
✓ Challenges
  • No consolidated view of the storage
  • Management complexity – administrators must log in to multiple consoles to monitor the devices
  • Difficult to debug problems that originate in other devices

Storage Manager - All Devices

[Diagram: all storage devices under a single Manager]

✓ Consolidated view of all the storage devices, including third-party devices
✓ On-premises deployment on a dedicated server
✓ Challenges
  • Dedicated resources for deployment
  • Support issues – longer turnaround time to debug issues
  • No information about other deployments
  • Scalability is a challenge
  • High TCO
  • Running predictive analytics

Page 5

Next Generation Storage Management Platform

[Architecture diagram. Client side: a Meta Data Collection Service in each client data center, labeled "Send data to cloud", connected over the Internet. Cloud side: Collection Service, Messaging Service, Performance Data Service, Inventory Data Service, Meta Data Storage, Log Analytics, Data Lake, ML/DL service.]

✓ Deploy a thin Meta Data Collection Service in the client data center that connects to the storage devices
✓ Run all the data processing microservices on the cloud
✓ Supports thousands of tenants with fewer resources
✓ Highly scalable and reliable using the cloud auto-scale feature
  • Horizontally
  • Vertically
✓ Processing of billions of metrics per minute
✓ Recover from site disasters (DR)
✓ Secure – data in motion, data at rest, RBAC
✓ Data lake based on NoSQL, such as Cassandra, deployed on the cloud
✓ Predictive analytics
✓ Proactive support – faster time to resolution
✓ Different roles in the organization can view the same details

Page 6

Cloud Native Services Based Architecture

[Architecture diagram. Client side: a Meta Data Collection Service in each client data center, labeled "Send data to cloud", connected over the Internet. Cloud side, with the backing technology for each component: Collection Service/Container, Messaging Service/Kafka, Performance Data Service, Inventory Data Service, Meta Data Storage/COS, Log Analytics/ELK, Data Lake (Cassandra/DynamoDB), ML/DL service.]

✓ Microservice-based architecture
✓ Data Services – Kubernetes and containers
  • Highly scalable using advanced auto-scale features
  • High availability and reliability
✓ Lambda/Cloud functions – used for small, repetitive tasks that can be processed in a short time
✓ Data Lake – NoSQL, such as the Cassandra database, AWS DynamoDB, Azure Data Lake
✓ Meta Data Storage – object storage such as IBM Cloud Object Storage, AWS S3, Azure object store
✓ Messaging Service – Kafka-as-a-service platform from IBM Cloud, AWS streaming service
✓ Log Analytics – Elasticsearch (ELK) service from the cloud: Elastic, IBM Cloud, AWS
✓ ML/DL service using IBM Watson/Amazon SageMaker/MS Azure ML
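The role of the messaging service between the collection service and the data services can be sketched in a few lines. This is only an illustration: `MessageBus` is a hypothetical in-memory stand-in for a Kafka-like broker, and the topic names, tenant names, and sample fields are assumptions, not taken from the platform.

```python
import json
import queue
from collections import defaultdict


class MessageBus:
    """Hypothetical in-memory stand-in for a Kafka-like messaging service."""

    def __init__(self):
        self._topics = defaultdict(queue.Queue)

    def publish(self, topic, message):
        # Serialize to JSON, since a real broker carries bytes, not Python objects.
        self._topics[topic].put(json.dumps(message))

    def drain(self, topic):
        # Return every message currently queued on the topic, oldest first.
        q = self._topics[topic]
        messages = []
        while not q.empty():
            messages.append(json.loads(q.get()))
        return messages


def collect(bus, tenant, device, sample):
    """Collection service: tag a sample with its tenant and route it by kind."""
    topic = "performance" if sample["kind"] == "performance" else "inventory"
    bus.publish(topic, {"tenant": tenant, "device": device, **sample})


bus = MessageBus()
collect(bus, "tenant-a", "array-01", {"kind": "performance", "metric": "response_ms", "value": 4.2})
collect(bus, "tenant-a", "array-01", {"kind": "inventory", "model": "example-model", "firmware": "1.0"})
collect(bus, "tenant-b", "array-07", {"kind": "performance", "metric": "response_ms", "value": 9.8})

performance_batch = bus.drain("performance")  # would be consumed by the Performance Data Service
inventory_batch = bus.drain("inventory")      # would be consumed by the Inventory Data Service
print(len(performance_batch), len(inventory_batch))
```

Because producers and consumers only share topic names, each data service can be scaled independently of the collectors.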

Page 7

Cognitive (AI) Storage Management Platform

✓ Predict data traffic issues – high response times/declines in throughput
  • Noisy neighbor (Correlation Analysis)
  • Slow responding hosts (Correlation Analysis)
➢ Analyze patterns and correlate with other customer datasets
➢ Performance prediction in heterogeneous environments
➢ Tracking of known issues – learn from other customer issues (Classification)
➢ Classify the workload types based on the performance data patterns
➢ Predictive Analytics
  ➢ Capacity forecasting (Regression)
  ➢ Power consumption in data centers (Regression)
➢ Performance anomaly detection
➢ Performance metrics analysis (Time-series data analysis)
➢ Automated triaging and root cause analysis (Classification)
➢ Log analysis (Clustering)
➢ Configuration best practices recommendations
➢ Manual upgrades/automated upgrades
➢ Configuration validation to avoid interruptions in service
➢ Intelligent performance tuning
➢ Monitoring and improving SLAs
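As one illustration of the regression items in the list above, capacity forecasting can be sketched as a least-squares trend line extrapolated to the point where a pool fills up. This is a minimal sketch with made-up numbers; the function names and the linear-growth assumption are illustrative and not the platform's actual model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days, used_tb, capacity_tb):
    """Extrapolate the fitted trend to the day the pool reaches capacity."""
    slope, intercept = fit_line(days, used_tb)
    if slope <= 0:
        return None  # usage is flat or shrinking, so no exhaustion is forecast
    full_day = (capacity_tb - intercept) / slope
    return full_day - days[-1]


# Made-up daily samples: used capacity grows by 2 TB per day from 100 TB.
days = [0, 1, 2, 3, 4]
used_tb = [100.0, 102.0, 104.0, 106.0, 108.0]
remaining = days_until_full(days, used_tb, capacity_tb=128.0)
print(remaining)  # 10.0 days after the last sample
```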

[Diagram: multiple data centers (DC), labeled "Send data to cloud", and IBM Watson AI Services]

Failures
• Device failures
• Network failures
• Protocol failures
• Application failures
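Performance anomaly detection on time-series metrics, listed on this slide, could look roughly like the sketch below. It uses a trailing-window z-score, which is one simple technique chosen here for illustration; the slides do not say which method the platform uses, and the window size and threshold are assumptions.

```python
from statistics import mean, pstdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the preceding
    `window` samples by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            continue  # no spread in the window, so a z-score is undefined
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up response times (ms) with one spike at index 7.
response_ms = [5.0, 5.2, 4.9, 5.1, 5.0, 5.1, 4.8, 25.0, 5.0, 5.2]
print(find_anomalies(response_ms))  # [7]
```

Note that the spike inflates the window statistics for the samples that follow it, so a production detector would likely use a more robust baseline.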

Page 8

AI Based Predictive Analytics
Predict Data Traffic Issues – High response times/Declines in throughput

➢ Goal – Find a host that causes data traffic issues: high response times/decline in throughput

[Pipeline diagram: Data Lake → Preprocessing Performance Data → Performance Counters Feature Extraction → Performance Counters Dataset → Run correlation analysis (Pearson, Kendall, & Spearman) → Evaluate with test data → Production Deployment → ALERT NOTIFICATION]

Components
• Host
• Pool
• Volume
• Port

[Result table with columns: Hostname | Counters with high correlation]
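The three correlation measures named in the pipeline can be written out in plain Python as follows. This is a sketch for illustration only: the sample counters are made up, and `kendall` implements the simple tau-a variant. In practice a library such as SciPy (`scipy.stats.pearsonr`, `spearmanr`, `kendalltau`) would more likely be used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: linear relationship between two series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def kendall(xs, ys):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


# Made-up counters: host IOPS versus port response time (ms).
host_iops = [100, 200, 300, 400, 5000]
port_response_ms = [1.0, 1.5, 2.2, 2.4, 9.0]
print(pearson(host_iops, port_response_ms))
print(spearman(host_iops, port_response_ms))
print(kendall(host_iops, port_response_ms))
```

The two rank-based measures reach 1.0 for any monotonic relationship, while Pearson stays below 1.0 here because the relationship is not exactly linear.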

Page 9

AI Based Predictive Analytics
Predict Data Traffic Issues – High response times/Declines in throughput

➢ Goal – Find a host that causes data traffic issues: high response times/decline in throughput

Slow responding hosts
• FC port buffer is not utilized properly by hosts
• Difficult to find the host
• Host with highest correlation is the culprit

Noisy neighbor
• Extremely busy volumes create problems for other volumes in the cluster

[Figures: two panels titled "Correlation using Heatmap"; labels include Ports, Hosts, Volume, Pool, and Host]
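The rule "host with highest correlation is the culprit" can be sketched as ranking each host's traffic series by its correlation with the affected port's response time. The host names, metrics, and values below are made up, and Pearson correlation is used only as one of the possible measures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_hosts(port_metric, host_metrics):
    """Return (host, r) pairs sorted by descending correlation with the port metric."""
    scores = {host: pearson(series, port_metric) for host, series in host_metrics.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up samples: port response time (ms) and per-host I/O rate (MB/s).
port_response_ms = [2.0, 2.1, 6.5, 7.0, 2.2, 6.8]
host_io_mbps = {
    "host-a": [50, 52, 51, 49, 50, 53],
    "host-b": [10, 12, 95, 99, 11, 97],
    "host-c": [80, 30, 60, 20, 70, 40],
}

ranking = rank_hosts(port_response_ms, host_io_mbps)
culprit = ranking[0][0]
print(culprit)  # host-b
```

The full matrix of such scores across ports, hosts, volumes, and pools is what a correlation heatmap visualizes.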

Page 10

Configuration & Log Analytics

[Diagram: a Meta Data Collection Service in each client data center, labeled "Send meta data to cloud"]

✓ Configuration Analytics
  • Report of the different versions of storage devices deployed
  • Total amount of storage (PB) deployed across customers
  • Different types of storage devices deployed
  • Number of devices deployed across geographies
  • Customers that require upgrades
✓ Log Analytics
  • Errors
  • Warnings

Elasticsearch – Open-source, distributed, real-time search and analytics engine; an index-based database with schema-free JSON documents
Logstash – Ships logs from any source, parses them, assigns the right timestamp, and indexes them so they can be searched
Kibana – Data visualization engine that allows native interaction with data via custom dashboards
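The configuration and log reports listed above amount to simple aggregations over the collected metadata. The sketch below shows the idea on in-memory records; the record fields, customer names, and the `LATEST` firmware table are hypothetical, and in the platform described here such queries would presumably run against the ELK stack instead.

```python
from collections import Counter

# Made-up inventory records as a collection service might report them.
inventory = [
    {"customer": "cust-1", "type": "block", "firmware": "8.3", "capacity_tb": 500, "geo": "EU"},
    {"customer": "cust-1", "type": "file", "firmware": "5.0", "capacity_tb": 250, "geo": "EU"},
    {"customer": "cust-2", "type": "block", "firmware": "8.1", "capacity_tb": 1000, "geo": "US"},
    {"customer": "cust-3", "type": "block", "firmware": "8.3", "capacity_tb": 300, "geo": "AP"},
]
LATEST = {"block": "8.3", "file": "5.0"}  # hypothetical latest firmware per device type


def config_report(records, latest):
    """Aggregate the configuration analytics listed on the slide."""
    return {
        "versions": Counter((r["type"], r["firmware"]) for r in records),
        "total_pb": sum(r["capacity_tb"] for r in records) / 1000,
        "device_types": Counter(r["type"] for r in records),
        "devices_per_geo": Counter(r["geo"] for r in records),
        "needs_upgrade": sorted(
            {r["customer"] for r in records if r["firmware"] != latest[r["type"]]}
        ),
    }


def log_levels(lines):
    """Count log lines that contain ERROR or WARNING."""
    counts = Counter()
    for line in lines:
        for level in ("ERROR", "WARNING"):
            if level in line:
                counts[level] += 1
    return counts


report = config_report(inventory, LATEST)
levels = log_levels([
    "2020-05-20T10:00:00 ERROR drive failed",
    "2020-05-20T10:00:05 WARNING pool above threshold",
    "2020-05-20T10:00:09 INFO heartbeat",
    "2020-05-20T10:01:00 ERROR path lost",
])
print(report["needs_upgrade"], levels["ERROR"], levels["WARNING"])
```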

Page 11

Scalability based on Cloud native micro services – Kubernetes & containers

Highly Scalable Platform

[Diagram: a Kubernetes cluster with three worker nodes, each running a Performance Data Service deployment. Two nodes show the deployment with multiple pods (Pod 1 and Pod 2, each running the app); one node shows a deployment with a single pod. A Messaging service deployment with a Messaging pod is also shown.]

✓ The application is expected to be available 24/7, with frequent deployment of new versions
✓ Containers help avoid downtime
✓ Kubernetes does container orchestration by managing pods
✓ A pod runs one or more containers
✓ ReplicaSets maintain the specified number of active pods during scale-out or scale-in
✓ The Deployment controller moves the actual state toward the desired state
✓ A Service is an abstraction that defines a logical set of pods and a policy to access them
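The idea of a controller driving the actual state toward the desired state can be shown with a toy reconciliation pass. This is not Kubernetes code: `reconcile` and the pod naming scheme are invented for illustration, and a real controller works against the cluster API instead of a Python list.

```python
def reconcile(desired_replicas, running_pods, next_id):
    """One pass of a toy control loop: create or delete pods until the
    number of running pods matches the desired replica count.
    Returns (pods, next_id, actions)."""
    pods = list(running_pods)
    actions = []
    while len(pods) < desired_replicas:
        name = f"performance-data-service-{next_id}"
        pods.append(name)
        actions.append(("create", name))
        next_id += 1
    while len(pods) > desired_replicas:
        actions.append(("delete", pods.pop()))
    return pods, next_id, actions


# Scale out from zero to three pods.
pods, next_id, actions = reconcile(3, [], next_id=1)
print(pods)

# Simulate a pod failure; the next pass replaces the missing pod.
pods.remove("performance-data-service-2")
pods, next_id, actions = reconcile(3, pods, next_id)
print(actions)  # [('create', 'performance-data-service-4')]

# Scale in to a single pod.
pods, next_id, actions = reconcile(1, pods, next_id)
print(pods)  # ['performance-data-service-1']
```

The same loop handles scale-out, scale-in, and recovery, because it only compares the desired count with what is running.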

Page 12

Scalability based on Cloud native micro services – Kubernetes & containers

Highly Scalable Platform

• Scale a deployment to a fixed number of replicas: --replicas=10
• Horizontal pod autoscaling: --min, --max, --cpu-percent
• Proportional scaling:
  • Supports running multiple versions of an application at the same time
  • When a rolling update is in progress, balances the additional replicas across the existing active ReplicaSets
• Exposing the service:
  • NodePort
  • Load Balancer
  • Kubernetes Ingress
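How the autoscaling bounds and CPU target interact can be sketched with the replica formula described in the Kubernetes Horizontal Pod Autoscaler documentation: desired = ceil(current replicas × current metric / target metric), clamped to the configured minimum and maximum. The function below is a simplified illustration; the real autoscaler also applies a tolerance band and stabilization windows, which are omitted here.

```python
from math import ceil


def desired_replicas(current_replicas, current_cpu_percent, target_cpu_percent,
                     min_replicas, max_replicas):
    """Simplified horizontal autoscaling rule with min/max clamping."""
    raw = ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))   # 6: scale out
print(desired_replicas(4, 30, 60, min_replicas=2, max_replicas=10))   # 2: scale in
print(desired_replicas(8, 200, 50, min_replicas=2, max_replicas=10))  # 10: capped at max
print(desired_replicas(2, 10, 50, min_replicas=2, max_replicas=10))   # 2: held at min
```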

Page 13

Q & A

Thank You