Page 1
Automated Debugging in Data-Intensive Scalable Computing Systems
Muhammad Ali Gulzar1, Matteo Interlandi3, Xueyuan Han2, Mingda Li1, Tyson Condie1, and Miryung Kim1
1 University of California, Los Angeles
2 Harvard University
3 Microsoft
Page 2
Big Data Debugging in the Dark
Develop locally → Hope it works → Run in cloud → Bug! → Guesswork
[Figure: Map and Reduce stages, with steps numbered 1 to 5.]
Page 3
Motivating Example
• Alice writes a Spark program that identifies, for each state in the US, the delta between the minimum and the maximum snowfall reading for each day of any year and for any particular year.

ZipCode   Date         SnowFall
99504     01/01/1994   245mm
99504     01/01/1993   85mm
90031     02/01/1991   0mm
…         …            …
Page 4
Problem Definition
• Using a test function, a user can specify incorrect results.

def test(key: String, delta: Float): Boolean = {
  delta < 6000
}

Given a test function, the goal is to identify a minimum subset of the input that is able to reproduce the same test failure.

Data flow: TextFile → FlatMap → GroupByKey → Map → Output

TextFile (input records):
  99504,01/01/1992,1ft
  99504,03/01/1992,0.1ft
  99504,01/01/1993,70in
  99504,03/01/1993,145mm
  99504,01/01/1994,245mm
  99504,01/01/1993,85mm
  90031,02/01/1991,0mm
After FlatMap:
  AK,01/01,304.8
  AK,1992,304.8
  AK,03/01,30.5
  AK,1992,30.5
  AK,01/01,21336
  AK,1993,21336
  AK,03/01,145
  AK,1993,145
  AK,01/01,245
  AK,1994,245
  …
After GroupByKey:
  AK,01/01,[304.8,21336,245,85]
  AK,03/01,[30.5,145]
  AK,1992,[304.8,30.5]
  AK,1993,[21336,145,85]
  AK,1994,[245]
  CA,02/01,[0]
  CA,1991,[0]
After Map (Output):
  AK,01/01,21251
  AK,03/01,114.5
  AK,1992,274.3
  AK,1993,21251
  AK,1994,0
  CA,02/01,0
  CA,1991,0
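The data flow on this slide can be sketched with plain Scala collections standing in for Spark RDDs. This is only an illustration under stated assumptions: the slides do not show Alice's code, so the object name `SnowfallSketch`, the `zipToState` lookup, and the helpers `toMm`, `flatMapStep`, and `run` are all hypothetical. The unit conversion deliberately scales inches like feet, an assumed fault chosen because it reproduces the 21336 shown for the 70in record. The slide rounds some values (for example 30.5), so unrounded results may differ slightly in the last digit.

```scala
object SnowfallSketch {
  // Hypothetical lookup; the slide only implies 99504 -> AK and 90031 -> CA.
  val zipToState: Map[String, String] = Map("99504" -> "AK", "90031" -> "CA")

  // Input records from the slide (whitespace normalized).
  val sample: Seq[String] = Seq(
    "99504,01/01/1992,1ft", "99504,03/01/1992,0.1ft", "99504,01/01/1993,70in",
    "99504,03/01/1993,145mm", "99504,01/01/1994,245mm", "99504,01/01/1993,85mm",
    "90031,02/01/1991,0mm")

  // Convert a reading such as "1ft" or "245mm" to millimeters.
  // ASSUMPTION: inches are scaled like feet, which reproduces the slide's 21336.
  def toMm(reading: String): Double = {
    val number = reading.dropRight(2).toDouble
    reading.takeRight(2) match {
      case "mm"  => number
      case "ft"  => number * 304.8
      case "in"  => number * 304.8 // assumed fault: 25.4 would be correct
      case other => sys.error(s"unknown unit: $other")
    }
  }

  // FlatMap step: each record yields one pair keyed by day and one keyed by year.
  def flatMapStep(line: String): Seq[((String, String), Double)] = {
    val fields = line.split(",").map(_.trim)
    val state = zipToState(fields(0))
    val date = fields(1).split("/")
    val mm = toMm(fields(2))
    Seq(((state, date(0) + "/" + date(1)), mm), ((state, date(2)), mm))
  }

  // GroupByKey followed by Map: delta between the largest and smallest reading per key.
  def run(lines: Seq[String]): Map[(String, String), Double] =
    lines.flatMap(flatMapStep).groupBy(_._1).map { case (key, pairs) =>
      val values = pairs.map(_._2)
      key -> (values.max - values.min)
    }

  // Test function from the slide: an output passes when its delta is below 6000.
  def test(key: String, delta: Float): Boolean = delta < 6000

  def main(args: Array[String]): Unit =
    run(sample).foreach { case ((state, k), delta) =>
      println(s"$state,$k,$delta passes=${test(s"$state,$k", delta.toFloat)}")
    }
}
```

Under these assumptions, the outputs that fail the test are the AK keys 01/01 and 1993, the two groups that contain the 70in record.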
Page 5
Existing Approach 1: Data Provenance for Spark
[Figure: the TextFile → FlatMap → GroupByKey → Map → Output data flow and records from the Problem Definition slide.]
It over-approximates the scope of failure-inducing inputs, i.e., all records in a faulty key-group are marked as faulty.
Page 6
Existing Approach 2: Delta Debugging
• Delta Debugging performs a systematic, binary-search-like procedure on the input dataset using a test oracle function.
It does not prune input records known to be irrelevant, because it lacks a semantic understanding of data-flow operators.

Data flow: TextFile → FlatMap → GroupByKey → Map → Output

TextFile (input records):
  99504,01/01/1992,1ft
  99504,03/01/1992,0.1ft
  99504,01/01/1993,70in
  99504,03/01/1993,145mm
  99504,01/01/1994,245mm
  99504,01/01/1993,85mm
  90031,02/01/1991,0mm
After FlatMap:
  AK,01/01,304.8
  AK,1992,304.8
  AK,03/01,30.5
  AK,1992,30.5
  AK,01/01,21336
  AK,1993,21336
  AK,03/01,145
  AK,1993,145
  AK,01/01,245
  AK,1994,245
  …
After GroupByKey:
  AK,01/01,[304.8,21336,245,85]
  AK,03/01,[30.5,145]
  AK,1992,[304.8,30.5]
  AK,1993,[21336,145,85]
  AK,1994,[245]
  CA,02/01,[0]
  CA,1991,[0]
After Map (Output):
  AK,01/01,21251
  AK,03/01,114.5
  AK,1992,274.3
  AK,1993,21251
  AK,1994,0
  CA,02/01,0
  CA,1991,0
Page 7
Existing Approach 2: Delta Debugging (Run 2)

TextFile (records shown on the slide):
  99504,01/01/1992,1ft
  99504,03/01/1992,0.1ft
  99504,01/01/1993,70in
After FlatMap:
  AK,01/01,304.8
  AK,1992,304.8
  AK,03/01,30.5
  AK,1992,30.5
  AK,01/01,21336
  AK,1993,21336
After GroupByKey:
  AK,01/01,[304.8,21336]
  AK,03/01,[30.5]
  AK,1992,[304.8,30.5]
  AK,1993,[21336]
After Map (Output):
  AK,01/01,21031
  AK,03/01,0
  AK,1992,274.3
  AK,1993,0
Test outcome: fails, since the delta for AK,01/01 (21031) is not below 6000.
Page 8
Existing Approach 2: Delta Debugging (Run 3)

TextFile (records shown on the slide):
  99504,01/01/1992,1ft
  99504,03/01/1992,0.1ft
  99504,01/01/1993,70in
After FlatMap:
  AK,01/01,304.8
  AK,1992,304.8
  AK,03/01,30.5
  AK,1992,30.5
After GroupByKey:
  AK,01/01,[304.8]
  AK,03/01,[30.5]
  AK,1992,[304.8,30.5]
After Map (Output):
  AK,01/01,0
  AK,03/01,0
  AK,1992,274.3
Test outcome: passes, since every delta is below 6000.
Page 9
Existing Approach 2: Delta Debugging (Run 4)

TextFile (records shown on the slide):
  99504,01/01/1992,1ft
  99504,03/01/1992,0.1ft
  99504,01/01/1993,70in
After FlatMap:
  AK,01/01,21336
  AK,1993,21336
After GroupByKey:
  AK,01/01,[21336]
  AK,1993,[21336]
After Map (Output):
  AK,01/01,0
  AK,1993,0
Test outcome: passes, since every delta is below 6000.
Page 10
Existing Approach 2: Delta Debugging (Run 5)

TextFile (records shown on the slide):
  99504,01/01/1992,1ft
  99504,03/01/1992,0.1ft
  99504,01/01/1993,70in
After FlatMap:
  AK,01/01,304.8
  AK,1992,304.8
After GroupByKey:
  AK,01/01,[304.8]
  AK,1992,[304.8]
After Map (Output):
  AK,01/01,0
  AK,1992,0
Test outcome: passes, since every delta is below 6000.
Page 11
Existing Approach 2: Delta Debugging (Run 6)

TextFile (records shown on the slide):
  99504,01/01/1992,1ft
  99504,03/01/1992,0.1ft
  99504,01/01/1993,70in
After FlatMap:
  AK,03/01,30.5
  AK,1992,30.5
After GroupByKey:
  AK,03/01,[30.5]
  AK,1992,[30.5]
After Map (Output):
  AK,03/01,0
  AK,1992,0
Test outcome: passes, since every delta is below 6000.
Page 12
Existing Approach 2: Delta Debugging (Run 7)

TextFile (records shown on the slide):
  99504,01/01/1992,1ft
  99504,03/01/1992,0.1ft
  99504,01/01/1993,70in
After FlatMap:
  AK,01/01,21336
  AK,1993,21336
After GroupByKey:
  AK,01/01,[21336]
  AK,1993,[21336]
After Map (Output):
  AK,01/01,0
  AK,1993,0
Test outcome: passes, since every delta is below 6000.
Page 13
Existing Approach 2: Delta Debugging (Run 8)

TextFile (records shown on the slide):
  99504,01/01/1992,1ft
  99504,03/01/1992,0.1ft
  99504,01/01/1993,70in
After FlatMap:
  AK,03/01,30.5
  AK,1992,30.5
  AK,01/01,21336
  AK,1993,21336
After GroupByKey:
  AK,01/01,[21336]
  AK,03/01,[30.5]
  AK,1992,[30.5]
  AK,1993,[21336]
After Map (Output):
  AK,01/01,0
  AK,03/01,0
  AK,1992,0
  AK,1993,0
Test outcome: passes, since every delta is below 6000.
Page 14
Existing Approach 2: Delta Debugging (Run 9)

TextFile (records shown on the slide):
  99504,01/01/1992,1ft
  99504,03/01/1992,0.1ft
  99504,01/01/1993,70in
After FlatMap:
  AK,01/01,304.8
  AK,1992,304.8
  AK,01/01,21336
  AK,1993,21336
After GroupByKey:
  AK,01/01,[304.8,21336]
  AK,1992,[304.8]
  AK,1993,[21336]
After Map (Output):
  AK,01/01,21031
  AK,1992,0
  AK,1993,0
Test outcome: fails, since the delta for AK,01/01 (21031) is not below 6000.
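The search illustrated by these runs can be sketched as a simplified version of the general ddmin procedure from the delta debugging literature. This is a minimal sketch, not the implementation evaluated in the talk: the names `DeltaDebugSketch`, `Rec`, `split`, `ddmin`, and `fails` are hypothetical, and the oracle is hard-wired to the three-record example, using the millimeter values shown after FlatMap.

```scala
object DeltaDebugSketch {
  // One input record reduced to its day key, year key, and reading in mm.
  final case class Rec(day: String, year: String, mm: Double)

  // The three records of the example, with values as shown after FlatMap.
  val r1 = Rec("01/01", "1992", 304.8)
  val r2 = Rec("03/01", "1992", 30.5)
  val r3 = Rec("01/01", "1993", 21336.0)

  // Oracle: true when some day group or year group has a delta that is not below 6000.
  def fails(subset: Vector[Rec]): Boolean = {
    val groups = subset.groupBy(_.day).values ++ subset.groupBy(_.year).values
    groups.exists(g => g.map(_.mm).max - g.map(_.mm).min >= 6000)
  }

  // Split a non-empty vector into at most n contiguous chunks.
  def split[T](xs: Vector[T], n: Int): Vector[Vector[T]] =
    xs.grouped(math.ceil(xs.length.toDouble / n).toInt).toVector

  // Simplified ddmin: try chunks, then complements, then a finer split.
  def ddmin[T](input: Vector[T], n: Int = 2)(fails: Vector[T] => Boolean): Vector[T] =
    if (input.length <= 1) input
    else {
      val chunks = split(input, n)
      chunks.find(fails) match {
        case Some(chunk) => ddmin(chunk, 2)(fails) // reduce to a failing chunk
        case None =>
          val complements = chunks.indices.map(i => chunks.patch(i, Nil, 1).flatten)
          complements.find(fails) match {
            case Some(rest) => ddmin(rest, math.max(n - 1, 2))(fails) // reduce to a failing complement
            case None if n < input.length =>
              ddmin(input, math.min(input.length, 2 * n))(fails) // increase granularity
            case None => input // no smaller failing subset found
          }
      }
    }

  // Smallest failing subset found for the three-record example.
  val minimal: Vector[Rec] = ddmin(Vector(r1, r2, r3))(fails)

  def main(args: Array[String]): Unit = println(minimal)
}
```

In this sketch the search ends with the 1ft and 70in records, the same pair that fails in Run 9. With two chunks, each complement equals the other chunk, so the sketch tests some subsets more than once.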
Page 15
Automated Debugging in DISC with BigSift
Input: a Spark program, a test function
Output: minimum fault-inducing input records
Data Provenance + Delta Debugging
• Test Predicate Pushdown
• Prioritizing Backward Traces
• Bitmap-based Test Memoization
Page 16
Optimization 1: Test Predicate Pushdown
• Observation: During backward tracing, data provenance traces through all partitions even though only a few partitions contain faulty intermediate data.
If applicable, BigSift pushes down the test function to test the output of combiners in order to isolate the faulty partitions.
[Figure: placement of the Test function without test pushdown vs. with test pushdown.]
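As a rough illustration of pushing the test down to combiner outputs, the sketch below applies a test to per-partition partial results and keeps only the partitions whose partial result fails. It rests on assumptions not taken from the slides: the aggregate is a per-key maximum rather than the max-min delta, chosen because a failing partial maximum always implies a failing final maximum. The names `TestPushdownSketch`, `combine`, and `faultyPartitions` are hypothetical, and the slide does not spell out when BigSift considers pushdown applicable.

```scala
object TestPushdownSketch {
  type Pair = (String, Double)

  // Combiner: per-partition partial maximum for each key. Max is associative,
  // so partial results could later be merged into the final result.
  def combine(partition: Seq[Pair]): Map[String, Double] =
    partition.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).max }

  // Hypothetical test: a partial or final maximum passes when it is below 6000.
  def test(key: String, value: Double): Boolean = value < 6000

  // Indices of the partitions whose combiner output fails the test.
  def faultyPartitions(partitions: Seq[Seq[Pair]]): Seq[Int] =
    partitions.zipWithIndex.collect {
      case (p, i) if combine(p).exists { case (k, v) => !test(k, v) } => i
    }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq(("AK", 304.8), ("AK", 30.5)),
      Seq(("AK", 21336.0), ("AK", 145.0)),
      Seq(("CA", 0.0)))
    // Only the listed partitions would need backward tracing.
    println(faultyPartitions(partitions))
  }
}
```

Backward tracing can then be limited to the partitions returned by `faultyPartitions` instead of tracing through all of them.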
Page 17
Optimization 2: Prioritizing Backward Traces
• Observation: The same faulty input record may contribute to multiple faulty outputs due to operators such as Join or FlatMap.
In case of multiple faulty outputs, BigSift overlaps two backward traces to minimize the scope of fault-inducing input records.
[Figure: the TextFile → FlatMap → GroupByKey → Map → Output data flow and records from the Problem Definition slide.]
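The overlap idea can be sketched with sets of input offsets. In the running example, the two faulty outputs are AK,01/01 and AK,1993. Reading the GroupByKey values back to the seven input records (numbered 1 to 7 in slide order) gives the two traces used below. The names `TraceOverlapSketch` and `overlap` are hypothetical, and falling back to the smaller trace when nothing is shared is an assumption of this sketch rather than something stated on the slide.

```scala
object TraceOverlapSketch {
  // Backward traces: faulty output key -> 1-based offsets of contributing input records.
  val traces: Map[String, Set[Int]] = Map(
    "AK,01/01" -> Set(1, 3, 5, 6), // values [304.8, 21336, 245, 85]
    "AK,1993"  -> Set(3, 4, 6)     // values [21336, 145, 85]
  )

  // Overlap two traces. ASSUMED fallback: use the smaller trace if they share nothing.
  def overlap(a: Set[Int], b: Set[Int]): Set[Int] = {
    val common = a intersect b
    if (common.nonEmpty) common else if (a.size <= b.size) a else b
  }

  def main(args: Array[String]): Unit =
    println(overlap(traces("AK,01/01"), traces("AK,1993")))
}
```

Here the overlap contains records 3 and 6 (the 70in and 85mm readings from 01/01/1993), so a subsequent search would start from two records instead of four or three.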
Page 18
Optimization 3: Bitmap-Based Test Memoization
• Observation: Delta debugging may try running a program on the same subset of input redundantly.
We use a bitmap-based test memoization technique to avoid redundant testing of the same input dataset.
[Figure: input data mapped to a bitmap (0 1 0 1 0 0 0 0 1 1), with a recorded test outcome (✔ or ✗).]
• BigSift leverages a bitmap to compactly encode the offsets of the original input to refer to an input subset.
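A minimal sketch of the memoization idea, assuming the input subset is identified by the offsets of its records in the original input. It uses the Scala standard library's immutable `BitSet` as the bitmap and a mutable map as the cache. The names `BitmapMemoSketch`, `MemoizedTest`, and `executions` are hypothetical, and how BigSift actually stores its bitmaps is not shown on the slide.

```scala
import scala.collection.immutable.BitSet
import scala.collection.mutable

object BitmapMemoSketch {
  // Wraps a test so that each distinct input subset is executed at most once.
  // `run` receives the offsets of the selected records and returns true on a pass.
  final class MemoizedTest(run: BitSet => Boolean) {
    private val cache = mutable.Map.empty[BitSet, Boolean]
    var executions = 0 // number of times the underlying test really ran

    def apply(offsets: BitSet): Boolean =
      cache.getOrElseUpdate(offsets, { executions += 1; run(offsets) })
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical oracle: the run fails only when records 1 and 3 are both present.
    val test = new MemoizedTest(s => !(s.contains(1) && s.contains(3)))
    test(BitSet(3))    // executes the test
    test(BitSet(3))    // answered from the cache
    test(BitSet(1, 3)) // executes the test
    println(test.executions)
  }
}
```

Using the bitmap as the cache key works because two `BitSet` values holding the same offsets compare equal, regardless of how the subset was produced.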
Page 19
Evaluation Questions
• RQ1: How much improvement in debugging time does BigSift provide in comparison to delta debugging?
• RQ2: How long is the debugging time of BigSift in comparison to the original running time of a job?
• RQ3: How much improvement in the precision of fault-inducing input records does BigSift provide in comparison to data provenance?
Page 20
RQ1: Performance Improvement over Delta Debugging

                             Running Time (sec)   Debugging Time (sec)
Subject Program     Fault    Original Job         DD         BigSift    Improvement
Movie Histogram     Code     56.2                 232.8      17.3       13.5X
Inverted Index      Code     107.7                584.2      13.4       43.6X
Rating Histogram    Code     40.3                 263.4      16.6       15.9X
Sequence Count      Code     356.0                13772.1    208.8      66.0X
Rating Frequency    Code     77.5                 437.9      14.9       29.5X
College Student     Data     53.1                 235.3      31.8       7.4X
Weather Analysis    Data     238.5                999.1      89.9       11.1X
Transit Analysis    Code     45.5                 375.8      20.2       18.6X

BigSift provides up to a 66X speedup in isolating the precise fault-inducing input records, in comparison to the baseline DD.
Page 21
RQ2: Debugging Time vs. Original Job Time
[Table: same subject programs and timings as in the RQ1 table; the comparison here is between the Original Job and BigSift columns.]
On average, BigSift takes 62% less time to debug a single faulty output than the time taken for a single run on the entire data.
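The two headline numbers can be checked against the table with a few lines of arithmetic. The sketch below assumes that "62% less" refers to the mean, across the eight subject programs, of BigSift debugging time divided by original job time. That is one plausible reading, and the slides do not say how the average was computed. The names `EvaluationArithmetic`, `maxSpeedup`, and `meanRatio` are hypothetical.

```scala
object EvaluationArithmetic {
  // (program, original job, DD, BigSift) in seconds, copied from the table.
  val rows: Seq[(String, Double, Double, Double)] = Seq(
    ("Movie Histogram", 56.2, 232.8, 17.3),
    ("Inverted Index", 107.7, 584.2, 13.4),
    ("Rating Histogram", 40.3, 263.4, 16.6),
    ("Sequence Count", 356.0, 13772.1, 208.8),
    ("Rating Frequency", 77.5, 437.9, 14.9),
    ("College Student", 53.1, 235.3, 31.8),
    ("Weather Analysis", 238.5, 999.1, 89.9),
    ("Transit Analysis", 45.5, 375.8, 20.2))

  // Largest DD / BigSift ratio across the subject programs.
  val maxSpeedup: Double = rows.map { case (_, _, dd, bs) => dd / bs }.max

  // Mean of BigSift / Original Job across the subject programs.
  val meanRatio: Double =
    rows.map { case (_, orig, _, bs) => bs / orig }.sum / rows.size

  def main(args: Array[String]): Unit = {
    println(f"max speedup over DD: $maxSpeedup%.1fX")
    println(f"mean BigSift / original ratio: $meanRatio%.2f")
  }
}
```

Under this reading the mean ratio is about 0.38, which matches the 62% figure, and the largest speedup over DD is the 66X reported for Sequence Count.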
Page 22
RQ2: Debugging Time vs. Original Job Time
[Figure: Sequence Count. x-axis: Fault Localization Time (s), 0 to 14000. y-axis: # of fault-inducing input records, log scale, 1 to 1E+09. Series: Delta Debugging, BigSift, Test Driven Data Provenance, Data Provenance.]
Page 23
RQ3: Fault Localizability over Data Provenance
[Figure: bar chart. y-axis: # of fault-inducing input records, log scale, 1 to 100000000. Subject programs: Movie Histogram, Inverted Index, Rating Histogram, Sequence Count, Rating Frequency, College Students, Weather Analysis. Series: Data Provenance, Test Driven Data Provenance, BigSift & DD. Data labels as extracted (their mapping to programs and series is not recoverable): 143796, 6487290, 520904, 234115800, 15003060, 2554788, 350, 2, 1350, 15, 13, 1, 1, 1, 1, 1, 12.]
BigSift leverages DD after DP to continue fault isolation, achieving several orders of magnitude (10^3 to 10^7) better precision.
Page 24
Conclusion
• BigSift is the first piece of work in automated debugging of big data analytics in DISC.
• BigSift provides 10^3X to 10^7X more precision than data provenance in terms of fault localizability.
• It provides up to a 66X speedup in debugging time over baseline Delta Debugging.
• In our evaluation, we observed that, on average, BigSift finds the faulty input in 62% less time than the original job execution time.