Spectrum Scale Performance Tools Deployment at Nuance · files.gpfsug.org/presentations/2016/anl-june/SSUG_Nuance... · 2016-06-11 · © 2016 Nuance Communications, Inc. All rights reserved.

Post on 09-Jul-2020


Spectrum Scale Performance Tools
Deployment at Nuance
Bob Oesterlin, Sr Storage Engineer
Nuance HPC Grid
June 10th, 2016

Robert.Oesterlin@nuance.com

Topics

– Quick Overview of Nuance and HPC Grids
– Performance GUI – Experiences and Limitations
– Spectrum Scale Performance Tools – Deployment
– Collector Sizing/Federation
– Dashboards using the Zimon-Grafana Bridge
– What’s next

Reinventing the relationship between people and technology

– Defining the next generation of human-computer interaction: Intelligent Systems
– Deeply invested in creating effortless and natural user experiences
– Best known for rapidly advancing voice-recognition technology

Nuance Natural Language Framework
The engine that drives Intelligent Systems

“Anything with George Clooney on tonight?”

“Yes, I’ve found three shows, one of which is starting in just a few minutes.”

Nuance HPC Grids

– Supports the Worldwide Nuance R&D Community
– Approximately 2000 users
– ~7500 TB per day of data processed
  – 85% Read, 15% Write
– ~20,000,000 jobs processed per month
– Over 6 PB of Elastic Storage across multiple clusters
– On-premise Object storage, 4+ PB
– VMs for casual access/job submission
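
To give a feel for the scale behind these figures, the daily and monthly totals can be turned into average rates. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1000 GB), a 30-day month, and load spread evenly over time, none of which the slide states.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_gb_per_s(tb_per_day: float) -> float:
    """Convert a daily volume in TB to an average rate in GB/s (decimal units)."""
    return tb_per_day * 1000 / SECONDS_PER_DAY


# Figures taken from the slide: ~7500 TB/day, 85% read / 15% write,
# ~20,000,000 jobs per month.
total_gb_s = average_rate_gb_per_s(7500)
read_gb_s = total_gb_s * 0.85
write_gb_s = total_gb_s * 0.15

# Assumption: a 30-day month.
jobs_per_s = 20_000_000 / (30 * SECONDS_PER_DAY)

print(f"average I/O: {total_gb_s:.1f} GB/s "
      f"(read {read_gb_s:.1f}, write {write_gb_s:.1f})")
print(f"average job rate: {jobs_per_s:.1f} jobs/s")
```

Real traffic is bursty, so peak rates will sit well above these averages.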

Performance data collection - legacy

– Large number of locally written tools
– Collectl for system stats (CPU, disk, network, etc.)
– Periodic mmpmon collections feed a local database
– Scripts to track RPC waiters
– Dashboards based around Grafana
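
As an illustration of the "periodic mmpmon collection" approach, the sketch below parses one line of machine-readable `mmpmon -p` output into a dictionary that could then be written to a database. It is not the tooling used at Nuance. It assumes the keyword/value layout (`_fs_io_s_` followed by `_key_ value` pairs), and the sample line, node name, and file system name are made up for the example.

```python
def parse_mmpmon_line(line: str) -> dict:
    """Parse one line of `mmpmon -p` style output into a dict.

    Assumed format: a leading record tag such as `_fs_io_s_`, followed by
    alternating `_key_ value` tokens separated by whitespace.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    record = {"type": tokens[0].strip("_")}
    pairs = tokens[1:]
    for key, value in zip(pairs[0::2], pairs[1::2]):
        # Counters become ints; names and addresses stay as strings.
        record[key.strip("_")] = int(value) if value.isdigit() else value
    return record


def bytes_read_per_second(older: dict, newer: dict) -> float:
    """mmpmon counters are cumulative, so a rate is the delta between samples."""
    elapsed = newer["t"] - older["t"]
    return (newer["br"] - older["br"]) / elapsed


# Hypothetical sample lines (values invented for illustration).
SAMPLE_1 = ("_fs_io_s_ _n_ 10.0.0.8 _nn_ node1 _rc_ 0 _t_ 1465560000 "
            "_tu_ 407431 _cl_ example.cluster _fs_ gpfs2 _d_ 2 "
            "_br_ 6291456 _bw_ 314572800 _oc_ 10 _cc_ 16")
SAMPLE_2 = SAMPLE_1.replace("1465560000", "1465560060").replace(
    "_br_ 6291456", "_br_ 69206016")

first = parse_mmpmon_line(SAMPLE_1)
second = parse_mmpmon_line(SAMPLE_2)
print(first["fs"], bytes_read_per_second(first, second))
```

Storing the raw counters and computing rates at query time keeps the collector simple, at the cost of handling counter resets downstream.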

SS Performance Sensors (aka “zimon”)

– Part of all releases since 4.1.1
– Integrated with Spectrum Scale
– Wide variety of metrics, both system and GPFS
– New metrics being added (RPC waiters)
– Integrates with Spectrum Control

IBM Performance GUI

– Provides access to all key SS performance metrics
– Early beta participant
– “Fairly” easy deployment
– RH 7 dependency proved to be a challenge; current grids are all RH 6.6 based
– Tables provide a good overview of overall performance
– Better for my Ops team than Engineering
– Graphs – problematic in large clusters

Sensor Deployment - Problems

– Using the default sensor configuration in a large cluster is a bad idea
– Deployment with federated (multiple) collectors
– Which sensors drive the GUI?

Collector Sizing and deployment

– Default configuration is perfect for small environments
– Collector memory requirements grow quickly
– Difficult to retain large numbers of frequently collected metrics
– Keep an eye on scaling:
  – 500 NSDs * 500 nodes * 16 metrics = 4 million!
– Example:
  – 500 nodes, 500 NSDs, 16 file systems, 7 days of 1/min data = 66 GB collector memory
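
The scaling arithmetic on this slide is easy to check for your own cluster. The sketch below reproduces the 4 million figure and counts how many samples a single series accumulates over a retention window. The function names are illustrative. It deliberately does not try to reproduce the 66 GB number, because the slide does not give the per-sample memory cost behind it.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def metric_series(nsds: int, nodes: int, metrics_per_pair: int) -> int:
    """Number of distinct time series when every node reports
    `metrics_per_pair` metrics for every NSD."""
    return nsds * nodes * metrics_per_pair


def samples_retained(days: int, period_seconds: int) -> int:
    """Samples kept per series for a given retention and collection period."""
    return days * SECONDS_PER_DAY // period_seconds


series = metric_series(nsds=500, nodes=500, metrics_per_pair=16)
per_series = samples_retained(days=7, period_seconds=60)

print(f"{series:,} series")               # the slide's "4 million"
print(f"{per_series:,} samples per series over 7 days at 1/min")
```

Because the series count is a product of NSDs and nodes, halving either one, or restricting per-NSD sensors to the NSD servers, cuts the collector load in proportion.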

Grafana with Zimon

– IBM GUI is a great start – but limited, especially on larger clusters
– Zimon-grafana bridge code by Metin Feridun @ IBM ZRL
  – Provides OpenTSDB interface to IBM zimon data
  – Simple Python script, runs on collector node, lightweight
  – All collected zimon metrics are available
  – Easy to construct complex/custom dashboards
– Distribution…
  – IBM developerWorks?
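
Because the bridge presents an OpenTSDB-style interface, Grafana talks to it the way it would talk to OpenTSDB: by sending JSON query documents. The sketch below builds such a document in the shape of an OpenTSDB `/api/query` request. The metric name `cpu_user`, the `node` tag, and the node name `nsd01` are assumptions for illustration, and the slide does not say which query fields the bridge actually honours.

```python
import json


def build_query(metric: str, tags: dict, start: str = "1h-ago",
                aggregator: str = "avg", downsample: str = "1m-avg") -> dict:
    """Build an OpenTSDB-style query body for one metric.

    Field names follow the OpenTSDB HTTP API (`start`, `queries`,
    `aggregator`, `metric`, `downsample`, `tags`).
    """
    return {
        "start": start,
        "queries": [
            {
                "aggregator": aggregator,
                "metric": metric,
                "downsample": downsample,
                "tags": tags,
            }
        ],
    }


# Hypothetical metric and tag values; substitute names your collector exposes.
payload = build_query("cpu_user", {"node": "nsd01"})
print(json.dumps(payload, indent=2))
```

In normal use Grafana generates these requests itself once the bridge is added as an OpenTSDB data source, so hand-built queries like this are mainly useful for testing the bridge.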

Sample Grafana Dashboards

What’s Next

– SS 4.2.1 Upgrade
  – RPC Waiter metrics in zimon
  – Cloud Tiering
– Consolidation of Grids
  – Combine Compute/NSD Clusters
  – Consistent deployment architecture
– Move from CNFS to CES

Thank you
