The Universal Open Source Enterprise Level Monitoring Solution
Zabbix 4.0 and beyond
What we may expect in the future
Zabbix is a universal open source enterprise level
monitoring solution
Zabbix Team
3.0 LTS 3.2 3.4 4.0 LTS
Where are we?
Released:
Item pre-processing
Dependent items
Maps and dashboards
Remote commands by Proxies
Elasticsearch
Under development:
4.0 LTS
A few major improvements of Zabbix 4.0 and why they
are important
1. Making problems independent
Problems and events
3.x
Triggers: {HOST.NAME} has just been restarted
Problems: *No problem name*
Slow: problem and event names are calculated on the fly
4.0
Triggers: {HOST.NAME} has just been restarted
Problems: Name: “Linux006 has just been restarted”
Fast: the name is displayed as it is stored in “problems” and “events”
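The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration, not Zabbix code: the `Trigger` and `Problem` classes and the helper functions are hypothetical, and only the `{HOST.NAME}` macro from the slide is handled.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    name_template: str  # e.g. "{HOST.NAME} has just been restarted"


@dataclass
class Problem:
    trigger: Trigger
    host_name: str
    name: str = ""  # stays empty in the 3.x-style model


def expand(template: str, host_name: str) -> str:
    """Resolve the single macro used in this sketch."""
    return template.replace("{HOST.NAME}", host_name)


def display_name_3x(problem: Problem) -> str:
    # 3.x style: the name is recomputed every time the problem is shown.
    return expand(problem.trigger.name_template, problem.host_name)


def create_problem_40(trigger: Trigger, host_name: str) -> Problem:
    # 4.0 style: the name is resolved once, when the problem is created,
    # and stored together with the problem.
    return Problem(trigger, host_name, name=expand(trigger.name_template, host_name))


trigger = Trigger("{HOST.NAME} has just been restarted")
problem = create_problem_40(trigger, "Linux006")
print(problem.name)  # Linux006 has just been restarted
```

Storing the resolved name means that displaying a problem no longer depends on the trigger, which is what makes problems independent.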
2. Better integration options
Real-time export of history and events
Zabbix Server
ExportDir=/var/log/zabbix
ExportFileSize=100M
Exported as JSON:
History file
Trends file
Events file
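A consumer of the exported files could look roughly like the sketch below. It assumes that each record is written as one JSON object per line; the field names in the sample are hypothetical and only serve the illustration.

```python
import json
from typing import Iterable, Iterator


def parse_export_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per complete JSON line, skipping blanks and partial writes."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # The server may still be writing the last line; skip it for now.
            continue


# Hypothetical sample records; real field names may differ.
sample = [
    '{"host": "Linux006", "name": "CPU load", "clock": 1530000000, "value": 0.42}',
    "",
    '{"host": "Linux006", "name": "CPU load", "clock": 15300',  # truncated line
]
records = list(parse_export_lines(sample))
print(len(records), records[0]["host"])  # 1 Linux006
```

The same function accepts an open file object, so a file under the configured ExportDir can be passed in directly and processed line by line.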
3. Better workflow
Acknowledgements
3.x: Monitoring -> Problems -> Ack
Message is mandatory
No way to post a message only
No way to just close a problem
Advanced problem workflow
4.0: Monitoring -> Problems -> Update
Message is optional
Operations are optional:
- ACK
- Change severity (!)
- Close problem
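The same optional operations can be combined in one API call. The sketch below only builds a JSON-RPC request body and sends nothing; it assumes the `event.acknowledge` method takes a bitmask `action` with the flag values shown, which is how I read the 4.0 API documentation, so please verify against your version. The event ID and token are placeholders.

```python
import json

# Action flags as I understand them from the 4.0 API documentation (assumption).
CLOSE_PROBLEM = 1
ACKNOWLEDGE = 2
ADD_MESSAGE = 4
CHANGE_SEVERITY = 8


def build_update_request(event_id, auth_token, *, acknowledge=False, close=False,
                         message=None, severity=None, request_id=1):
    """Build a JSON-RPC body; every operation is optional, as in the 4.0 UI."""
    action = 0
    params = {"eventids": [event_id]}
    if close:
        action |= CLOSE_PROBLEM
    if acknowledge:
        action |= ACKNOWLEDGE
    if message is not None:
        action |= ADD_MESSAGE
        params["message"] = message
    if severity is not None:
        action |= CHANGE_SEVERITY
        params["severity"] = severity
    params["action"] = action
    return {"jsonrpc": "2.0", "method": "event.acknowledge",
            "params": params, "auth": auth_token, "id": request_id}


# Close a problem without a message, which 3.x did not allow.
close_only = build_update_request("12345", "<token>", close=True)
print(json.dumps(close_only["params"]))
```

Using keyword-only flags keeps each operation independent, mirroring the optional checkboxes of the Update form.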
4. New ways of monitoring
HTTP item type
Some use cases
• Monitoring content of a web application
• Getting data out of APIs that are based on JSON/XML
• Access to HTTP header fields, for example:
  Server: Apache/2.4.1 (Unix)
Typical HTTP processing
HTTP check -> Pre-processing -> History
HTTP data processing
Data formats: TEXT, HTML, JSON, XML, BINARY
Pre-processing: XPath, JSONPath, Regex
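The pre-processing step of this pipeline can be illustrated with the Python standard library. This is a sketch of the idea, not of Zabbix internals: `from_json` is a simple dotted-path helper standing in for JSONPath, `ElementTree` supports only a subset of XPath, and the response bodies are made up.

```python
import json
import re
import xml.etree.ElementTree as ET


def from_json(body: str, path: str):
    """Follow a dotted path such as 'status.users'; a stand-in for JSONPath."""
    value = json.loads(body)
    for key in path.split("."):
        value = value[int(key)] if isinstance(value, list) else value[key]
    return value


def from_xml(body: str, path: str) -> str:
    """Return the text of the first element matching a simple XPath expression."""
    node = ET.fromstring(body).find(path)
    if node is None:
        raise ValueError(f"no element matches {path!r}")
    return node.text


def from_regex(body: str, pattern: str) -> str:
    """Return the first capture group of the first match."""
    match = re.search(pattern, body)
    if match is None:
        raise ValueError(f"pattern {pattern!r} did not match")
    return match.group(1)


json_body = '{"status": {"users": 42, "queues": [3, 7]}}'
xml_body = "<status><users>42</users></status>"
headers = "HTTP/1.1 200 OK\r\nServer: Apache/2.4.1 (Unix)\r\n"

print(from_json(json_body, "status.users"))     # 42
print(from_json(json_body, "status.queues.1"))  # 7
print(from_xml(xml_body, "./users"))            # 42
print(from_regex(headers, r"Server: (\S+)"))    # Apache/2.4.1
```

Each helper turns a raw response into a single value, which is the shape of data that can then be stored in history.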
5. Better interface
UI getting simpler
No Monitoring->Triggers anymore, use Monitoring->Problems
New widgets
and more!
We create a universal self-service monitoring
platform delivering business value
Self service
Getting most business value out of collected data
Give access to everyone: finance, analytics, sales, support, developers, customers, etc.
Requires best user experience
Security and flexible user roles are important
Extreme flexibility
Collect any data
Pre-process and transform collected data in any way
Modules and webhooks for extending Zabbix
Choice of: OS, HW, database, programming languages
More platforms
Official packages for more hardware and cloud platforms
Modularity
3.x
4.x: 3.x + independent modules
Single pane of glass
Central place to see and control monitoring of the whole infrastructure
Central management of alerting
Event collection from various sources
Observing information from multiple Zabbix Servers
[Figure: events from several sources feeding a unified dashboard and alerting]
Root cause analysis
• It gives a clear answer to the question “What is the cause of the problem?”
• It provides information about impact and importance
• It reduces recovery time (MTTR)
3.4: Trigger dependencies, event correlation
4.x: Automatic and manual relationships between problems; complex event processing (de-duplication, filtering, enrichment, incl. AI & machine learning)
Enrichment
Before:
‣ Server B is not available
Datacenter: Tokyo1 Class: Availability
External systems: AI & ML, CMDB, network topology, service tree
After:
‣ Server B is not available
Datacenter: Tokyo1 Class: Availability
Location: Rack4,32 Contact: Alexei
Service: Helpdesk HW: HP DL380
GEO: 12.459 34.34
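Enrichment of a problem with tags from external systems can be sketched as a simple merge. The `cmdb` and `service_tree` lookup tables and the `enrich` function are hypothetical stand-ins; how real external systems would be queried is not covered by the slides.

```python
def enrich(problem: dict, *sources: dict) -> dict:
    """Return a copy of the problem with tags merged in from each source.

    Tags already present on the problem win, so an external system
    never overwrites what monitoring itself has set.
    """
    tags = dict(problem["tags"])
    host = problem["host"]
    for source in sources:
        for key, value in source.get(host, {}).items():
            tags.setdefault(key, value)
    return {**problem, "tags": tags}


# Hypothetical lookup tables standing in for a CMDB and a service tree.
cmdb = {"Server B": {"Location": "Rack4,32", "Contact": "Alexei", "HW": "HP DL380"}}
service_tree = {"Server B": {"Service": "Helpdesk"}}

problem = {
    "name": "Server B is not available",
    "host": "Server B",
    "tags": {"Datacenter": "Tokyo1", "Class": "Availability"},
}
enriched = enrich(problem, cmdb, service_tree)
for key, value in enriched["tags"].items():
    print(f"{key}: {value}")
```

The original problem is left untouched, so the enriched copy can be discarded or recomputed when an external source changes.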
Relationship between problems
Before:
‣ Server B is not available
‣ Server C is not available
‣ Datacenter Tokyo1 is not available
‣ Network is not available to Tokyo1
‣ Server B is not available
‣ Server C is not available
‣ 197 more problems ….
Datacenter: Tokyo1 Class: Availability
After applying correlation rules:
‣ Datacenter Tokyo1 is not available (2 related problems)
  ‣ Server B is not available
  ‣ Server C is not available
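One possible correlation rule is to group problems that share a tag value under a designated root problem. The sketch below is an assumption about how such a rule might look, not the planned Zabbix implementation; the `Problem` class, `group_by_root` and the way the root is recognised are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    name: str
    tags: dict
    related: list = field(default_factory=list)


def group_by_root(problems, tag, is_root):
    """Attach problems to the root problem that shares their value of `tag`.

    Returns the problems that remain at the top level.
    """
    roots = {}
    for p in problems:
        key = p.tags.get(tag)
        if key is not None and is_root(p):
            roots.setdefault(key, p)  # the first root per tag value wins

    top_level = []
    for p in problems:
        key = p.tags.get(tag)
        root = roots.get(key) if key is not None else None
        if root is not None and root is not p:
            root.related.append(p)
        else:
            top_level.append(p)
    return top_level


tags = {"Datacenter": "Tokyo1", "Class": "Availability"}
problems = [
    Problem("Server B is not available", dict(tags)),
    Problem("Server C is not available", dict(tags)),
    Problem("Datacenter Tokyo1 is not available", dict(tags)),
]
top = group_by_root(problems, "Datacenter",
                    is_root=lambda p: p.name.startswith("Datacenter"))
for p in top:
    print(f"{p.name} ({len(p.related)} related problems)")
    for child in p.related:
        print(f"  {child.name}")
```

With the sample data this leaves a single top-level problem for the datacenter, with the two server problems listed beneath it.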
Service as a first class citizen
[Figure: service tree with nodes Our services, Helpdesk, VOIP, Ticket selling system, WEB Server, Transaction processing, Java Middleware API, Oracle Database, CPU, Network, Disk space]
Tag based linkage to problems
Service: Oracle
System: Disk
• Much easier maintenance: tag based linkage between problems and services
• Choice of service propagation rules (up, down)
• Visualisation: more widgets to display services (status, SLA)
• Alerting of service status changes
• Use service tree for problem correlation and impact analysis
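Tag based linkage with upward propagation could work along the lines of the following sketch. The `Service` class, the tree layout and the numeric severities are assumptions made for illustration; only the `Service: Oracle` and `System: Disk` tags come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    match_tags: dict = field(default_factory=dict)  # tags linking problems to this service
    children: list = field(default_factory=list)


def is_linked(service: Service, problem_tags: dict) -> bool:
    """A problem is linked when it carries every tag the service asks for."""
    return bool(service.match_tags) and all(
        problem_tags.get(key) == value for key, value in service.match_tags.items()
    )


def status(service: Service, problems) -> int:
    """Worst severity among linked problems, propagated up from child services."""
    own = [severity for tags, severity in problems if is_linked(service, tags)]
    inherited = [status(child, problems) for child in service.children]
    return max(own + inherited, default=0)


# Hypothetical tree layout: Ticket selling system -> Oracle Database -> Disk space.
disk = Service("Disk space", {"Service": "Oracle", "System": "Disk"})
oracle = Service("Oracle Database", children=[disk])
ticketing = Service("Ticket selling system", children=[oracle])
voip = Service("VOIP")

# Each problem is a (tags, severity) pair; severity 4 is an arbitrary example value.
problems = [({"Service": "Oracle", "System": "Disk"}, 4)]
print(status(ticketing, problems))  # 4
print(status(voip, problems))       # 0
```

Because services match problems by tags rather than by explicit trigger links, adding a new disk trigger with the same tags requires no change to the service tree.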
Services
Metrics -> Problems -> Services (increasing value)
IT Infrastructure level: more about technology
Business level: about SLA and KPIs
More value for business
[Figure: service tree spanning the IT Infrastructure and Business levels]
A few announcements
New training programs
ZCU: Certified User, difficulty: Low
ZCS: Certified Specialist, difficulty: Medium
ZCP: Certified Professional, difficulty: High
ZCE: Certified Expert, difficulty: Very high
Two of the programs are marked as new.
Thank you!
Some of the used icons made by Freepik from www.flaticon.com