Support Apache Cassandra in Production · Anuj Wadehra, Architect & Cassandra SME, Ericsson R&D
Post on 27-May-2020
Anuj Wadehra Architect & Cassandra SME Ericsson R & D
Support APACHE Cassandra in Production
Support Apache Cassandra in Production | Public | © Ericsson AB 2017 | 2017-04-30 | Page 2
AGENDA
› Building In-House Support
› Case Study
› Challenges
› Best Practices
› How We Fixed It
BUILDING IN-HOUSE SUPPORT
Define Scope
– 24x7 On-Call Support
– Hot Fixes
– Consultancy
– Community Contributions
Identify Vision
– Faster & Predictable Turnaround Time
– Cost-Effectiveness
– Open Source Contributions
BUILDING IN-HOUSE SUPPORT CONTD.
Build Competence
– What to learn?
  › Cassandra
  › Programming
  › Troubleshooting
  › Product
– How to learn?
  › Documentation
  › Trainings
  › Community
  › Product issues
  › PoCs
BUILDING IN-HOUSE SUPPORT CONTD.
Set Up Infrastructure
– Continuous Integration setup
– Test Lab for reproducing defects
– R&D Lab for PoCs and new feature evaluation
– Collaboration tools
  › Wiki
  › Mailing Lists
  › Forums etc.
CASE STUDY
Cassandra In-House Support
X Technologies is a product-based organization that has decided to replace its RDBMS with Cassandra (C*) in three product offerings. The decision is expected to improve the scalability, fault tolerance and availability of these products. The organization also aims to cut its spending on licensing and expensive RDBMS support.
CASE STUDY CONTD..
Proposed solution:
Challenges
Our Biggest Challenges
– Keeping pace with releases
– Support for EOL versions
– Insufficient/outdated docs
– Lengthy upgrades
– Compaction strategies
– Selling NoSQL concepts
  › Why repair?
  › How can deleting data increase disk usage?
  › How does RF determine fault tolerance?
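The question of how RF determines fault tolerance comes down to simple arithmetic. The following is a minimal sketch, assuming reads and writes at QUORUM consistency within a single data center (the slides do not name a consistency level, so that choice is an assumption, as are the function names):

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerable_replica_failures(replication_factor: int) -> int:
    """Replicas of a partition that may be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


# Print the trade-off for a few common replication factors.
for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"tolerates {tolerable_replica_failures(rf)} replica(s) down")
```

Under these assumptions RF=2 tolerates no replica failure at QUORUM, which is one reason RF=3 is a common starting point.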
CHALLENGES CONTD.
Addressing Challenges
– Stability over fancy features
– Upgrades aligned with EOL
– Well-tested compaction strategies
– CI infrastructure for patches
– Well-documented
  › NoSQL concepts
  › Operational procedures
– Stay updated
  › Mailing Lists
  › JIRAs
BEST PRACTICES
C* Operations
› Monitoring
  – Repair
  – Dropped Mutations
  – All time blocked
  – Read/Write Latencies
  – GC Pauses
  – Disk Usage
  – Upgrade sstables
› Daily Maintenance
  – Backup
  – Clearing Snapshots
› Repair
  – Be Selective
  – Repair once:
    › Primary Range
    › Incremental
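The "repair once" advice with primary-range repair can be sketched as follows. With `nodetool repair -pr`, each node repairs only the token ranges it owns as primary, so running it on every node covers the ring once instead of once per replica. The host addresses and keyspace name are hypothetical; the flag for incremental repair differs between Cassandra versions and is left out here.

```python
import shlex
from typing import List


def primary_range_repair_command(host: str, keyspace: str) -> List[str]:
    """Build a `nodetool repair -pr` invocation for a single node."""
    return ["nodetool", "-h", host, "repair", "-pr", keyspace]


def repair_plan(hosts: List[str], keyspace: str) -> List[str]:
    """One shell-quoted command per node, intended to run one node at a time."""
    return [shlex.join(primary_range_repair_command(h, keyspace)) for h in hosts]


# Hypothetical three-node cluster and keyspace.
for line in repair_plan(["10.0.0.1", "10.0.0.2", "10.0.0.3"], "orders"):
    print(line)
```

Primary-range repair only covers the full ring if it is run on every node, so the host list should include all nodes in all data centers.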
BEST PRACTICES
› Security
  – SSL for JMX
  – Encryption
› Automate
  – Scaling/Downsizing
  – Multi-DC setup
  – Upgrades
  – Backup
  – Restoration
  – Clearing Snapshots
  – Repairs
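As an illustration of automating backup and snapshot clearing, the sketch below builds the two `nodetool` commands a daily job might run on each node: take a tagged snapshot, then clear the snapshot that has aged out. The tag naming scheme, the retention period and the keyspace name are assumptions made for the example, not something the slides prescribe.

```python
from datetime import date, timedelta
from typing import List

TAG_PREFIX = "daily-"  # hypothetical naming convention for snapshot tags


def snapshot_tag(day: date) -> str:
    """Tag such as 'daily-2017-04-30'."""
    return f"{TAG_PREFIX}{day.isoformat()}"


def daily_maintenance_commands(today: date, keyspace: str,
                               retention_days: int) -> List[List[str]]:
    """Commands to take today's snapshot and clear the one that just expired."""
    expired = today - timedelta(days=retention_days)
    return [
        ["nodetool", "snapshot", "-t", snapshot_tag(today), keyspace],
        ["nodetool", "clearsnapshot", "-t", snapshot_tag(expired), keyspace],
    ]


# Hypothetical keyspace with a one-week retention period.
for cmd in daily_maintenance_commands(date(2017, 4, 30), "orders", 7):
    print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` on the node. Snapshots are local hard links, so a real backup job would also copy the snapshot files off the node before clearing them.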
HOW WE FIXED IT
C* Issues & Solutions
› GC Pauses
  – Faster Memtable flushing
  – Aggressive compactions
  – JVM Tuning
› Slow Table Scans
  – Spark
HOW WE FIXED IT...
› Hung Repair
  – Firewall timeout
  – TCP keepalive
› Disk Crunch
  – Aggressive compaction
  – LCS
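The hung-repair item links two settings: a firewall that silently drops idle connections, and the TCP keepalive timer that is supposed to keep those connections alive. A minimal sketch of the check, assuming Linux nodes (where the timer is exposed at `/proc/sys/net/ipv4/tcp_keepalive_time` and defaults to 7200 seconds) and a hypothetical firewall idle timeout supplied by the operator:

```python
from pathlib import Path

# Linux-only location of the keepalive idle timer, in seconds.
KEEPALIVE_TIME_PATH = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def read_keepalive_time(path: Path = KEEPALIVE_TIME_PATH) -> int:
    """Read the kernel's keepalive idle time in seconds."""
    return int(path.read_text().strip())


def keepalive_beats_firewall(keepalive_time_s: int,
                             firewall_idle_timeout_s: int) -> bool:
    """True if the first keepalive probe goes out before the firewall gives up."""
    return keepalive_time_s < firewall_idle_timeout_s


# Example with the Linux default (7200 s) and a hypothetical 1-hour firewall timeout:
# the connection would be dropped before any probe is sent.
print(keepalive_beats_firewall(7200, 3600))
```

On a node, `keepalive_beats_firewall(read_keepalive_time(), firewall_idle_timeout_s)` returning `False` suggests that long-idle repair connections can be cut by the firewall; lowering the keepalive time below the firewall timeout is the usual remedy.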
HOW WE FIXED IT...
› Performance & OOM
  – Disable THP/Zone Reclaim
  – Max map count
› Wide Rows
  – Divide into buckets
  – Add buckets when needed
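Dividing a wide row into buckets can be sketched as follows. The idea is to add a bucket number to the partition key so that one logical entity is spread over several smaller partitions. This sketch assumes hash-based bucketing with a CRC32 of a hypothetical item id; the slides do not say which bucketing scheme was used, and time-based buckets are an equally common choice.

```python
import zlib
from typing import List, Tuple


def bucket_for(item_id: str, num_buckets: int) -> int:
    """Stable bucket in [0, num_buckets) derived from a CRC32 of the item id."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    return zlib.crc32(item_id.encode("utf-8")) % num_buckets


def partition_key(entity_id: str, item_id: str, num_buckets: int) -> Tuple[str, int]:
    """Composite partition key (entity, bucket) used when writing an item."""
    return (entity_id, bucket_for(item_id, num_buckets))


def partitions_to_read(entity_id: str, num_buckets: int) -> List[Tuple[str, int]]:
    """All partitions that together hold the entity's data."""
    return [(entity_id, b) for b in range(num_buckets)]


# Hypothetical entity spread over four buckets.
print(partition_key("user-42", "event-0001", 4))
print(partitions_to_read("user-42", 4))
```

With this scheme, adding buckets means raising `num_buckets`. A full read of the entity still finds older data because the old bucket numbers remain within the new range, but a lookup by item id may compute a different bucket than the one the item was written to, so readers would need to check all buckets or track the bucket count in use at write time.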
QUESTIONS?