CPR: Composable Performance Regression for Scalable Multiprocessor Models
Benjamin C. Lee, Computer Architecture Group, Microsoft Research
Jamison Collins, Hong Wang, Microarchitecture Research Lab, Intel Corporation
David Brooks, Engineering and Applied Sciences, Harvard University
International Symposium on Microarchitecture, 11 November 2008
Benjamin C. Lee, et al 1 :: MICRO :: 11 Nov 08
Statistical Inference
  Models relationships between data
  Requires initial data to train, formulate model
  Leverages correlation from initial data for prediction
Regression Models
  Low training costs (sample 300 from 4.3B designs)
  Accurate inference (1.5% median error)
  Efficient computation (100s of predictions per second)
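The sample-train-predict loop behind the regression models can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three-parameter design space, the `simulate` stand-in for a detailed simulator, its coefficients, and the plain least-squares fit on log performance. The slides do not specify the model form, so this is not the authors' regression.

```python
import math
import random
import statistics

def random_design(rng):
    """Draw one design from a toy three-parameter space (hypothetical)."""
    return (rng.choice((2, 4, 8)), rng.randint(12, 30), rng.randint(8, 12))

def simulate(design, rng):
    """Hypothetical stand-in for a detailed simulator; returns performance."""
    width, depth, l2_log2 = design
    log_perf = 0.5 + 0.3 * math.log2(width) - 0.02 * depth + 0.05 * l2_log2
    return math.exp(log_perf + rng.uniform(-0.01, 0.01))  # small synthetic noise

def features(design):
    """Predictors: intercept, log2(width), pipeline depth, log2(L2 size)."""
    width, depth, l2_log2 = design
    return [1.0, math.log2(width), float(depth), float(l2_log2)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit(designs, perfs):
    """Ordinary least squares on log(performance) via the normal equations."""
    xs = [features(d) for d in designs]
    ys = [math.log(p) for p in perfs]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(coeffs, design):
    """Predicted performance for a design that was never simulated."""
    return math.exp(sum(c * f for c, f in zip(coeffs, features(design))))

rng = random.Random(0)

# Train on 300 sampled designs, mirroring the sample size quoted on the slide.
train_designs = [random_design(rng) for _ in range(300)]
train_perfs = [simulate(d, rng) for d in train_designs]
coeffs = fit(train_designs, train_perfs)

# Validate on separately drawn designs and report the median percentage error.
validation_designs = [random_design(rng) for _ in range(100)]
errors = []
for d in validation_designs:
    actual = simulate(d, rng)
    errors.append(100.0 * abs(predict(coeffs, d) - actual) / actual)
median_error = statistics.median(errors)
print(f"median error on held-out designs: {median_error:.2f}%")
```

Once fitted, each prediction is a handful of multiplications, which is why a model of this kind can serve many predictions per second while the simulator is consulted only for the training sample.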
Evolutionary Approach
  Optimize ProcX
  Design ProcY, enhancing ProcX with µ-arch features
  Re-construct models, accounting for µ-arch features
  Optimize ProcY
Case Study
  Consecutive generations of x86 µ-arch
  Improve FE (e.g., branch prediction)
  Improve MEM (e.g., prefetching)
  Improve OOO (e.g., memory disambiguation)
Quad-Core Benchmarks
  Set  .1       .2         .3        .4
  1    dense    excel      flash     md2
  2    video    specjapp   specweb   tachyon
  3    excel    homeworld  audio     unreal
  4    outlook  encrypt    halflife  homeworld
  5    painter  mentalray  outlook   encrypt
Motivation
Uniprocessor
Multiprocessor
CPR
Model Evaluation
Scalability
Multiprocessor Model Accuracy
  Dual-core :: 6.6% median error
  Quad-core :: 4.8% median error
Scaling Trends
  Lower bound: CPR costs 0.33x of naïve costs
  Costs approach the lower bound as uniprocessor models are built
Conclusion
Inference in Industry
  Effective inference for x86 µ-arch
  1.5% median errors relative to simulation
  Evolutionary design for new features across generations
Composable Performance Regression
  Leverage core models to minimize CMP simulations
  Construct separate core, contention, penalty models
  4.8 to 6.6% median errors for dual-, quad-core
  0.33x training costs of prior approaches
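The slides name three separate models (core, contention, penalty) but do not show how they combine. The sketch below assumes one plausible arrangement: the core model gives each benchmark's uniprocessor performance, the contention model estimates shared-resource pressure from its co-runners, and the penalty model turns that pressure into a fractional slowdown. The `compose` function, the toy models, and every number are hypothetical; only the benchmark names come from quad-core set 1.

```python
from typing import Callable, List, Sequence

def compose(
    core: Callable[[str], float],
    contention: Callable[[str, Sequence[str]], float],
    penalty: Callable[[float], float],
    workload: Sequence[str],
) -> List[float]:
    """Predict per-core performance for a multiprogrammed workload.

    Assumed composition (not confirmed by the slides):
    multiprocessor performance = core performance * (1 - penalty(contention)).
    """
    predictions = []
    for i, bench in enumerate(workload):
        co_runners = list(workload[:i]) + list(workload[i + 1:])
        pressure = contention(bench, co_runners)
        predictions.append(core(bench) * (1.0 - penalty(pressure)))
    return predictions

# Toy stand-ins for trained models; all values are invented for illustration.
CORE_PERF = {"dense": 2.0, "excel": 1.5, "flash": 1.2, "md2": 1.8}
MEM_INTENSITY = {"dense": 0.8, "excel": 0.2, "flash": 0.4, "md2": 0.6}

def core_model(bench: str) -> float:
    """Uniprocessor performance, as a core model would predict it."""
    return CORE_PERF[bench]

def contention_model(bench: str, co_runners: Sequence[str]) -> float:
    """Shared-resource pressure exerted on `bench` by its co-runners."""
    return sum(MEM_INTENSITY[b] for b in co_runners)

def penalty_model(pressure: float) -> float:
    """Fractional slowdown caused by the given pressure, capped at 50%."""
    return min(0.5, 0.1 * pressure)

workload = ["dense", "excel", "flash", "md2"]
quad_core = compose(core_model, contention_model, penalty_model, workload)
for bench, perf in zip(workload, quad_core):
    print(f"{bench}: {perf:.3f}")
```

The point of the arrangement is reuse: the core model is trained on uniprocessor simulations only, so multiprocessor simulations are needed just to train the smaller contention and penalty models.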
Future Directions
Efficient Multiprogramming Analysis
  Evaluate combinations without modeling every combination
Multi-Threaded Workloads
  Extend for homogeneous, heterogeneous threads
  Account for synchronization events
Many-Core Architectures
  Construct models without many-core simulators
  Consider other shared resources (e.g., network)
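To see why modeling every multiprogrammed combination does not scale, the short sketch below counts the possible workloads for the 16 distinct benchmark names that appear in the quad-core sets. It assumes workloads are unordered and that a benchmark may run on more than one core, which the slides do not state, and the full benchmark suite may be larger than this list.

```python
import math
from itertools import combinations_with_replacement

# Distinct benchmark names taken from the quad-core sets; the full suite may differ.
BENCHMARKS = [
    "dense", "excel", "flash", "md2", "video", "specjapp", "specweb", "tachyon",
    "homeworld", "audio", "unreal", "outlook", "encrypt", "halflife", "painter",
    "mentalray",
]

def workload_count(num_benchmarks: int, cores: int) -> int:
    """Number of unordered workloads when benchmarks may repeat across cores."""
    return math.comb(num_benchmarks + cores - 1, cores)

dual = workload_count(len(BENCHMARKS), 2)
quad = workload_count(len(BENCHMARKS), 4)

# Cross-check the closed form by explicit enumeration.
enumerated = sum(1 for _ in combinations_with_replacement(BENCHMARKS, 4))

print(f"uniprocessor models needed: {len(BENCHMARKS)}")
print(f"dual-core workloads: {dual}")
print(f"quad-core workloads: {quad} (enumerated: {enumerated})")
```

Under these assumptions the number of per-benchmark models stays at 16, while the number of distinct workloads grows combinatorially with core count.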