Yahoo Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Posted on 11-Aug-2014
Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Presented by Sagi Zelnick, Principal Architect @ Yahoo, and Ledion Bitincka, Principal Architect @ Splunk
Hadoop Summit, June 2014, San Jose, CA
Overview
2 Yahoo Proprietary
• Hadoop @ Yahoo: 8+ years of innovation
• Hunk @ Yahoo: organization-wide investment for the next 3+ years
• Yahoo provides Hunk as a self-service tool to explore, analyze and visualize data in HDFS
  › Hunk allows visual browsing of very complex tables (250+ fields)
  › Rapid prototyping of new jobs, with almost instant search results and no need to wait for the entire job/query to finish
  › Cuts development cycles through faster interaction with results
  › Built-in graphs/charts make for a powerful solution in many situations
About your speakers
Sagi Zelnick, Principal Architect, Yahoo
Ledion Bitincka, Principal Architect, Splunk
Hunk + Hadoop @ Yahoo
History of Hadoop innovation @ Yahoo
Over 600 PB of Hadoop storage (over half an exabyte)
• Very large clusters used by many groups across the enterprise.
• More than 35,000 individual DataNodes.
• Hadoop is provided as a service.
• Multiple cluster types, such as research, dev, sandbox and production.
• Services such as HBase, Hive, Oozie, etc.
• Users are free to run jobs, subject to resource constraints.
• Maintained by the Grid Operations Group.
Improving operational visibility with Hunk
• We pointed Hunk at the many operational logs and event data we already had on the grid.
• These include system metrics, HDFS ops, JVM stats and YARN metrics.
• Created instrumentation to measure usage per user and per job.
• Analyzed terabytes of NameNode audit logs.
• Leveraged job history to visualize usage/growth and provide historical views.
• Custom events for HBase statistics.
Use Case | Customer | Benefits
System metrics from 35k nodes | Grid Ops / Grid Customers | Identify slow tasks/nodes when debugging
Historical insights of resources | All Grid Customers | Track organic growth
Job performance | All Grid Customers | Improved job SLAs
HBase metrics | All Grid Customers | Track region/RS/table metrics…
Job logs in near real-time | All Grid Customers / Ops | Search for errors directly from the YARN logs
NameNode operational data | Research, Dev | Improved performance and stability
Tracking Hadoop performance and metrics in Hunk
Measuring NameNode performance pre & post upgrades
• Historical visualizations of all operations.
• Search billions of NameNode events in Hunk.
• Measure JVM and memory usage.
• Insights into operational performance.
Search (Last 7 days; 10,086 events, 5/15/14 1:00:00 AM to 5/22/14 1:36:34 AM):
index="simon_blue_new_all" this_cluster="dilithiumblue*" (log_subtype="DFS" #hdfs=hdfs) | timechart span=1h avg(number*) as num_*
[Chart: hourly num_* HDFS operation averages over _time, Fri May 16 to Tue May 20, 2014]
_time | num_BlockReports | num_CopyBlockOperations | num_HeartBeats | num_ReadBlockOperations | num_ReadMetadataOperations | num_ReplaceBlockOperations | num_WriteBlockOperations | num_blockChecksumOp
2014-05-15 01:00 | 1124437.735902 | 46721126.819672 | 514957.384098 | 12930433.077869 | 0.000000 | 94210832.786885 | 63512425.967213 | 13975.306557
2014-05-15 02:00 | 1115496.290492 | 53597000.262295 | 298717.637049 | 10402176.717213 | 0.000000 | 94109944.655738 | 93916552.393443 | 35459.288689
2014-05-15 03:00 | 1110372.4173 | 56566721.704918 | 428494.9449 | 13296385.590164 | 0.000000 | 94141430.295082 | 97353478.229508 | 20307.549344
Visualization using Hunk
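To make the `timechart span=1h avg(number*) as num_*` stage concrete, here is a minimal Python sketch of what it computes: events are grouped into fixed one-hour buckets, every field matching the `number*` wildcard is averaged per bucket, and the wildcard part is carried into the `num_*` output name. It assumes events are dicts with an epoch-seconds `_time`; input names such as `numberHeartBeats` are hypothetical (inferred from the `num_*` columns). Splunk's own bucketing also handles time zones and empty buckets, which this sketch ignores.

```python
from collections import defaultdict


def timechart_avg(events, span_seconds, src_prefix="number", dst_prefix="num_"):
    """Average every `src_prefix*` field per fixed time bucket, renamed to `dst_prefix*`."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for event in events:
        # Floor the timestamp to the start of its span (epoch-aligned).
        bucket = event["_time"] - event["_time"] % span_seconds
        for name, value in event.items():
            if name.startswith(src_prefix):
                # Keep the wildcard-matched suffix: numberHeartBeats -> num_HeartBeats.
                out_name = dst_prefix + name[len(src_prefix):]
                sums[bucket][out_name] += value
                counts[bucket][out_name] += 1
    return {
        bucket: {name: sums[bucket][name] / counts[bucket][name] for name in sums[bucket]}
        for bucket in sorted(sums)
    }


events = [  # hypothetical events
    {"_time": 0, "numberHeartBeats": 10, "numberBlockReports": 2},
    {"_time": 1800, "numberHeartBeats": 30, "numberBlockReports": 4},
    {"_time": 3600, "numberHeartBeats": 50},
]
print(timechart_avg(events, span_seconds=3600))
# {0: {'num_HeartBeats': 20.0, 'num_BlockReports': 3.0}, 3600: {'num_HeartBeats': 50.0}}
```

Changing `span_seconds` to 300 gives the `span=5m` variant used in the next search.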
Search (Last 2 days; 2,753 events, 5/20/14 1:14:21 AM to 5/22/14 1:14:21 AM):
index="simon_blue_new_all" this_cluster="dilithiumblue*" (log_subtype="DFS" #hdfs=hdfs) | timechart span=5m avg(number*) as num_*
[Chart: 5-minute num_* HDFS operation averages over _time, Tue May 20 to Wed May 21, 2014]
_time | num_BlockReports | num_CopyBlockOperations | num_HeartBeats | num_ReadBlockOperations | num_ReadMetadataOperations | num_ReplaceBlockOperations | num_WriteBlockOperations | num_blockChecksumOp
2014-05-20 01:15:00 | 1056047.024000 | 34677652.000000 | 124121.264000 | 26242490.800000 | 0.000000 | 88112292.800000 | 126478486.400000 | 51405.346000
2014-05-20 01:20:00 | 1055517.924000 | 30920700.800000 | 1065390.086000 | 22756041.800000 | 0.000000 | 87745422.400000 | 92323387.200000 | 32070.482000
2014-05-20 01:25:00 | 1055457.2000 | 33068504.400000 | 27622.56200 | 11396610.700000 | 0.000000 | 88569211.200000 | 94593716.800000 | 28873.618000
Sample troubleshooting of 750 million events in Hunk
Search (Last 2 days; 8,463 events, 5/20/14 12:00:00 AM to 5/22/14 12:00:00 AM):
index="simon_blue_new_all" this_cluster="dilithiumblue*" (log_subtype="JVM" ProcessName="NameNode") | timechart span=5m avg(Threads*) as threads_*
[Chart: 5-minute NameNode JVM thread-state averages over _time, Tue May 20 to Wed May 21, 2014]
_time | threads_Blocked | threads_New | threads_Runnable | threads_Terminated | threads_TimedWaiting | threads_Waiting
2014-05-20 00:00:00 | 72.360000 | 10.638333 | 5.485833 | 0.000000 | 21.208333 | 78.555000
2014-05-20 00:05:00 | 70.177333 | 10.554667 | 5.277333 | 0.000000 | 20.744667 | 76.578000
2014-05-20 00:10:00 | 70.211333 | 9.998667 | 5.022000 | 0.000000 | 19.333333 | 73.766667
2014-05-20 00:15:00 | 70.300667 | 10.268000 | 5.156667 | 0.000000 | 17.488667 | 70.122000
2014-05-20 00:20:00 | 70.422667 | 10.376000 | 5.188000 | 0.000000 | 15.700000 | 66.611333
2014-05-20 00:25:00 | 70.444000 | 10.288000 | 5.144000 | 0.000000 | 14.089333 | 63.400667
Big picture plus granular details
Analyzing NameNode RPC calls (troubleshooting)
• Who is making which RPC calls (open, listStatus, create, etc.).
• How often they make these RPC calls.
• Which IP/host the calls come from.
• Search and visualize historical data from billions of events.
• Prevent NameNode abuse/misuse.
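The who/what/how-often/from-where questions above reduce to counting audit events per (user, command, client) triple. A minimal Python sketch, assuming audit lines in the common HDFS `FSNamesystem.audit` layout (tab-separated `key=value` pairs such as `ugi=`, `ip=`, `cmd=`); the sample lines and function names are hypothetical, and at Yahoo's scale this aggregation would run through Hunk over HDFS rather than in a single process.

```python
import re
from collections import Counter

# Assumed layout: tab-separated key=value pairs after the "FSNamesystem.audit:" marker.
AUDIT_MARKER = "FSNamesystem.audit:"
FIELD_RE = re.compile(r"(\w+)=([^\t]*)")


def parse_audit_line(line):
    """Return (user, cmd, client_ip) for one audit line."""
    _, _, payload = line.partition(AUDIT_MARKER)
    fields = dict(FIELD_RE.findall(payload))
    user = fields.get("ugi", "").split(" ")[0]   # drop a trailing "(auth:KERBEROS)"
    client_ip = fields.get("ip", "").lstrip("/")  # "/10.1.2.3" -> "10.1.2.3"
    return user, fields.get("cmd"), client_ip


def rpc_counts(lines):
    """Count calls per (user, cmd, client_ip); non-audit lines are skipped."""
    return Counter(parse_audit_line(line) for line in lines if AUDIT_MARKER in line)


# Hypothetical sample lines in the assumed layout.
TEMPLATE = ("2014-05-25 00:00:01,234 INFO FSNamesystem.audit: allowed=true"
            "\tugi={} (auth:KERBEROS)\tip=/{}\tcmd={}\tsrc=/data\tdst=null\tperm=null")
sample = [
    TEMPLATE.format("gmon", "10.1.2.3", "open"),
    TEMPLATE.format("gmon", "10.1.2.3", "open"),
    TEMPLATE.format("alice", "10.1.2.4", "listStatus"),
    "some unrelated log line",
]
for key, n in rpc_counts(sample).most_common():
    print(key, n)
```

Sorting the counter by frequency is what surfaces a single user or host hammering the NameNode.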
Visualizing 834 million discrete events …
… continued
Queue insights (capacity & provisioning)
• Each Hadoop job runs in a specific queue.
• We track every aspect of the YARN framework.
• Immediate queue performance and configuration profiling via the job history server.
• Historical views and trends that enable better capacity management.
• Improved queue utilization and allocation management.
Search (Last 7 days; 1,175,726 events, 5/20/14 8:00:00 PM to 5/27/14 8:26:26 PM):
index="jobsummary_logs_all_red" cluster="dilithium*"
| eval total_slot_seconds=(mapSlotSeconds + reduceSlotSeconds)
| eval gb_hours=((total_slot_seconds * 0.5) / 3600)
| eval gb_hours=round(gb_hours)
| timechart span=6h sum(gb_hours) as gb_hours by queue
[Chart: GB-hours per queue in 6-hour spans over _time, Wed May 21 to Mon May 26, 2014]
_time | OTHER | apg_dailyhigh_p3 | apg_dailymedium_p5 | apg_hourlyhigh_p1 | apg_hourlylow_p4 | apg_hourlymedium_p2 | apg_p7 | curveball_large | curveball_med | slingshot | slingstone
2014-05-20 18:00 | 4154 | 45512 | 7071 | 25643 | 12111 | 29664 | 3473 | 26547 | 14192 | 60875 | 45376
2014-05-21 00:00 | 19341 | 92661 | 18005 | 41008 | 22944 | 88115 | 10896 | 38648 | 8693 | 48186 | 87670
2014-05-21 06:00 | 21160 | 108137 | 38398 | 35627 | 14934 | 101925 | 24458 | 29269 | 14066 | 24344 | 47831
2014-05-21 12:00 | 24238 | 74849 | 22695 | 47431 | 17731 | 53673 | 17332 | 37079 | 14479 | 44873 | 96909
2014-05-21 18:00 | 5792 | 95449 | 2737 | 44214 | 20325 | 48339 | 10222 | 34390 | 4605 | 168593 | 24298
2014-05-22 00:00 | 10177 | 68048 | 12853 | 36921 | 23248 | 57740 | 16005 | 44138 | 9142 | 88121 | 34544
2014-05-22 06:00 | 12720 | 85048 | 21977 | 35870 | 15503 | 100364 | 7823 | 35179 | 8086 | 33973 | 19802
2014-05-22 12:00 | 5459 | 76489 | 13154 | 34703 | 11204 | 34877 | 20178 | 22631 | 40567 | 98 | 24250
2014-05-22 18:00 | 8169 | 38394 | 2211 | 49840 | 19977 | 52438 | 4050 | 38066 | 27973 | 49333 | 31312
2014-05-23 00:00 | 12898 | 117518 | 7354 | 36422 | 16426 | 52918 | 8179 | 28202 | 21798 | 79808 | 37078
2014-05-23 06:00 | 6572 | 105431 | 26941 | 48614 | 29159 | 120424 | 14317 | 26011 | 12433 | 16745 | 35928
Visualizing queues
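The queue search converts slot-seconds into GB-hours: map and reduce slot-seconds are summed, multiplied by 0.5 and divided by 3,600. The slides do not state the unit of the 0.5 factor; reading it as 0.5 GB of memory per slot is an assumption. For example, a job with 5,400 map and 1,800 reduce slot-seconds comes to (7,200 × 0.5) / 3,600 = 1 GB-hour. The Python sketch below mirrors those eval steps plus the `timechart span=6h ... by queue` grouping; the job dicts, epoch-seconds `_time`, and function names are hypothetical.

```python
from collections import defaultdict

GB_PER_SLOT = 0.5         # the query's 0.5 multiplier, read here as GB per slot (assumption)
SPAN_SECONDS = 6 * 3600   # timechart span=6h


def gb_hours(job):
    """Mirror the query's evals: slot-seconds -> GB-seconds -> GB-hours."""
    total_slot_seconds = job["mapSlotSeconds"] + job["reduceSlotSeconds"]
    return (total_slot_seconds * GB_PER_SLOT) / 3600


def gb_hours_by_queue(jobs):
    """Sum rounded GB-hours per (6h bucket, queue), like `timechart ... by queue`."""
    table = defaultdict(lambda: defaultdict(float))
    for job in jobs:
        bucket = job["_time"] - job["_time"] % SPAN_SECONDS  # epoch-aligned 6h bucket
        table[bucket][job["queue"]] += round(gb_hours(job))  # round per job, as the query does
    return {bucket: dict(queues) for bucket, queues in sorted(table.items())}


jobs = [  # hypothetical job-summary records
    {"_time": 100, "queue": "apg_p7", "mapSlotSeconds": 5400, "reduceSlotSeconds": 1800},
    {"_time": 200, "queue": "apg_p7", "mapSlotSeconds": 36000, "reduceSlotSeconds": 0},
    {"_time": 21600, "queue": "slingshot", "mapSlotSeconds": 7200, "reduceSlotSeconds": 7200},
]
print(gb_hours_by_queue(jobs))  # {0: {'apg_p7': 6.0}, 21600: {'slingshot': 2.0}}
```

Python's `round()` rounds exact halves to even, which may differ from Splunk's `round()` on such values.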
Self-service job reports
• Each job is unique, and so are its map and reduce elements.
• How do you start analyzing jobs?
• Historical job performance and profiling enable in-depth performance tuning.
• Long-term historical views and growth trends.
cluster | user | queue | jobName | jobId | status | gb-hours | run_mins
cobalt | gmon | grideng | PigLatin:findRemoteHDFSFromAudits.pig | job_1398982765383_315271 | SUCCEEDED | 108.00 | 33.07
cobalt | gmon | grideng | PigLatin:findRemoteHDFSFromAudits.pig | job_1398982765383_312700 | SUCCEEDED | 104.00 | 37.37
cobalt | gmon | grideng | PigLatin:findRemoteHDFSFromAudits.pig | job_1398982765383_309715 | SUCCEEDED | 88.00 | 29.83
cobalt | gmon | gridops | distcp: | job_1398982765383_309921 | SUCCEEDED | 36.00 | 68.49
cobalt | gmon | gridops | SPLK_spbl103n01.blue.ygrid.yahoo.com_1401125953.2076_0 | job_1398982765383_313570 | SUCCEEDED | 25.00 | 14.26
cobalt | gmon | gridops | nnaudit_DR_2014_05_25 | job_1398982765383_308938 | SUCCEEDED | 25.00 | 15.43
cob… | g… | grid… | nnaudit_DB_2014_05_25 | job_1398982765… | SUCCE… | 24.00 | 18.07
Search (Yesterday; 4,871 events, 5/26/14 12:00:00 AM to 5/27/14 12:00:00 AM):
index="jobsummary_logs_all_blue" cluster="*" user="gmon"
| eval total_slot_seconds=(mapSlotSeconds + reduceSlotSeconds)
| eval gb_hours=((total_slot_seconds * 0.5) / 3600)
| eval gb_hours=round(gb_hours,2)
| eval runtime=(finishTime-submitTime)/1000
| stats sum(gb_hours) as gb-hours avg(runtime) as run_mins by cluster user queue jobName jobId status
| eval run_mins=round(run_mins/60,2)
| sort -gb-hours
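Because the job-report search groups by `jobId`, each output row is one job: GB-hours from slot-seconds (same 0.5 factor as the queue search), runtime from `finishTime - submitTime` (the `/1000` implies epoch milliseconds), converted to minutes and sorted by GB-hours descending. Under the same formula, the top row's 108.00 GB-hours corresponds to 108 × 3,600 / 0.5 = 777,600 slot-seconds. A minimal Python sketch, assuming one summary record per job; the record dicts and the `job_report` name are hypothetical.

```python
def job_report(jobs):
    """One row per job: GB-hours (2 dp) and run minutes, largest GB-hours first."""
    rows = []
    for job in jobs:
        slot_seconds = job["mapSlotSeconds"] + job["reduceSlotSeconds"]
        runtime_s = (job["finishTime"] - job["submitTime"]) / 1000  # epoch ms -> seconds
        rows.append({
            "jobId": job["jobId"],
            "queue": job["queue"],
            "gb-hours": round(slot_seconds * 0.5 / 3600, 2),
            "run_mins": round(runtime_s / 60, 2),
        })
    rows.sort(key=lambda row: row["gb-hours"], reverse=True)  # like `sort -gb-hours`
    return rows


jobs = [  # hypothetical job-summary records
    {"jobId": "job_a", "queue": "grideng", "mapSlotSeconds": 7200, "reduceSlotSeconds": 0,
     "submitTime": 0, "finishTime": 1_800_000},
    {"jobId": "job_b", "queue": "gridops", "mapSlotSeconds": 72000, "reduceSlotSeconds": 0,
     "submitTime": 0, "finishTime": 600_000},
]
for row in job_report(jobs):
    print(row)
```

Filtering the input by `user` before calling `job_report` gives the per-user, self-service view shown above.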
More data to tap into with the metastore / Hive sources
• Using the metastore, we can set up virtual indexes on any Hive table(s) without defining the schema up front.
• Visualize very complex tables (250+ fields).
• Rapid prototyping of new jobs, with almost instant search results and no need to wait for the entire job/query to finish.
• Built-in aggregates and graphs/charts.
• Accelerates the development workflow through faster interaction with data.
... it’s not just logs we’re looking at
Meet Hunk!
Integrated Analytics Platform for Diverse Data Stores
• Full-featured, Integrated Product
• Fast Insights for Everyone
• Works with What You Have Today
[Diagram: Explore, Visualize, Dashboards, Share, Analyze; Hadoop Clusters; NoSQL and Other Data Stores; Hadoop Client Libraries; Streaming Resource Libraries]
Fast Deployment and Configuration
Just point at Hadoop
• Certified integrations with all major Hadoop distributions
• Choose 1st-gen MapReduce or YARN
• Create Virtual Indexes across one or more clusters
• From download to searching data in < 60 minutes
[Diagram: Connect to one or multiple Hadoop clusters; YARN certified]
Interactive Search and Results Preview
Rapidly interact with data
• Powerful Search Processing Language (SPL™)
• Ad hoc exploratory analytics across massive datasets
• Preview results
• No fixed schema
• No requirement to “understand” data upfront
[Screenshot callouts: Search interface; Preview results; Drill down to raw data; Pause or stop MapReduce jobs]
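Results preview means partial answers can be shown while MapReduce work is still in progress. The generator below is only a conceptual Python illustration of that idea, not Hunk's implementation: it treats each input split as a list of records (an assumption), folds each finished split into a running count, and yields a snapshot after every split, so the last snapshot equals the final answer. The `preview_counts` name and `key_of` extractor are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator


def preview_counts(splits: Iterable[list], key_of: Callable) -> Iterator[Counter]:
    """Yield a running count of key_of(record) after each split is processed."""
    running = Counter()
    for split in splits:
        running.update(key_of(record) for record in split)
        yield Counter(running)  # copy, so earlier previews are not mutated later


splits = [["open", "create"], ["open"], ["listStatus", "open"]]  # hypothetical records
for preview in preview_counts(splits, key_of=lambda cmd: cmd):
    print(dict(preview))
```

Stopping iteration early corresponds to pausing or stopping the job: the consumer keeps the last preview it received.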
Powerful Dashboards for Self-Service Analytics
Interactive Dashboards and Charts
• Easy-to-use dashboard editor
• Chart overlay
• Pan and zoom
• In-dashboard drill down
• Embed charts and dashboards in 3rd-party apps
• Reuse skills with Splunk Enterprise 6.1 and Hunk 6.1
Automate Access for Rapid Exploration
Supported File Formats
• Text files
• Sequence files
• RCFile
• ORC files
• Parquet
Role-based Security for Shared Clusters
Pass-through Authentication
• Provide role-based security for Hadoop clusters
• Access Hadoop resources under security and compliance
• Integrates with Kerberos for Hadoop security
[Diagram: Business Analyst → Queue: Biz Analytics; Marketing Analyst → Queue: Marketing; Sys Admin → Queue: Prod]
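The diagram pairs each role with its own Hadoop queue while jobs run under the end user's identity. A minimal Python sketch of that mapping, assuming a simple role-to-queue table; the role keys, the normalized queue names, the `run_as` field and the `submission_for` function are all hypothetical. Only `mapreduce.job.queuename` is a standard Hadoop 2 job property.

```python
# Hypothetical role -> queue table modeled on the slide's example; real role and
# queue names would come from the deployment's own security and scheduler config.
ROLE_QUEUES = {
    "business_analyst": "biz_analytics",
    "marketing_analyst": "marketing",
    "sys_admin": "prod",
}


def submission_for(user: str, role: str) -> dict:
    """Describe how a pass-through job for `user` would be submitted."""
    if role not in ROLE_QUEUES:
        raise PermissionError(f"role {role!r} is not mapped to any queue")
    return {
        "run_as": user,  # hypothetical field: the end user's own (e.g. Kerberos) identity
        "properties": {"mapreduce.job.queuename": ROLE_QUEUES[role]},  # standard Hadoop 2 property
    }


print(submission_for("alice", "marketing_analyst"))
```

Rejecting unmapped roles outright, rather than falling back to a default queue, keeps one team's ad hoc searches from landing in another team's capacity.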
Build Analytics-Rich Big Data Apps
Powerful Developer Environment
• Use a standards-based web framework and REST API
• Customize dashboards and UIs with Simple XML, JavaScript or Django
• Choose among SDKs
• One integration for both Splunk Enterprise and Hunk
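As an illustration of the REST API, the Python sketch below builds (but does not send) a request to the `/services/search/jobs` endpoint with `exec_mode=oneshot`, which runs the search and returns results in the response body. The host name is hypothetical, 8089 is Splunk's default management port, and authentication (for example a session key or basic-auth `Authorization` header) plus TLS certificate handling are left out; sending the request with `urllib.request.urlopen` would need both.

```python
from urllib.parse import urlencode
from urllib.request import Request


def oneshot_search_request(host: str, spl: str, port: int = 8089) -> Request:
    """Build, but do not send, a POST to Splunk's /services/search/jobs endpoint."""
    body = urlencode({
        "search": "search " + spl,  # prepend the `search` command the web UI adds implicitly
        "exec_mode": "oneshot",     # run synchronously and return results in the response
        "output_mode": "json",
    }).encode()
    return Request(f"https://{host}:{port}/services/search/jobs", data=body, method="POST")


req = oneshot_search_request("hunk.example.com", 'index="jobsummary_logs_all_blue" user="gmon"')
print(req.get_method(), req.full_url)
```

The same request works against Splunk Enterprise and Hunk, in line with the "one integration for both" point above.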
Full-Featured, Integrated Analytics Platform
• FULL-FEATURED ANALYTICS: Explore, analyze and visualize data in one integrated platform
• FAST TO DEPLOY AND DRIVE VALUE: Point Hunk at your storage clusters and explore data immediately
• INTERACTIVE SEARCH: Preview results as MapReduce jobs run and accelerate reports with no fixed schemas
• RICH DEVELOPER ENVIRONMENT: Build big data apps using standard web languages and frameworks
Questions/Comments?
Sagi Zelnick, Principal Architect (email: zelnicks@yahoo-inc.com)
Ledion Bitincka, Principal Architect (email: lbitincka@splunk.com)