Architectural Musings
Rethinking Computer Systems Architecture
Christopher Vick
June 3, 2012
Introduction
Vision Talk
• Mobile computing and current technologies fundamentally change key parameters and constraints for computer system architecture
• Vast new opportunities for research of great interest and relevance to industry
Outline
• Computer System Architecture
• Then (Circa 1970)
– Scarce Resources & Bottlenecks
– Optimizations
• Now (Mobile Computing Platforms)
– Scarce Resources & Bottlenecks
– Optimizations?
• Qualcomm Research
• Questions?
COMPUTER SYSTEM ARCHITECTURE
Computer System Architecture
• Hardware
– The 5 classic components (Patterson & Hennessy): Input, Output, Memory, Datapath, Control
• Software
– System Virtual Machine (Hypervisor, VM, or VMM)
– Operating System
– Compilers & Tools
• Definitions
– The way components fit together
– The arrangement of the various devices in a complete computer system or network
– The instruction set plus a model of the execution of the instruction set (Amdahl et al.)
Computer System Architecture: the selection and combination of hardware and software components to assemble an effective computer system
Combination
Effective
• An optimization problem
• Many variables
– Selection of hardware/software components
– Selection of interfaces/interconnects
• Many constraints
– Physical, sociological, technical & cost constraints
• Scarce Resources and Bottlenecks
– Maximize utilization of scarce resources
– Minimize impact of bottlenecks
THEN (CIRCA 1970)
Scarce Resources
• CPU Cycles
– CPUs expensive
– Slow clock rates
• Memory Locations
– Random Access Memory expensive
– Address/Data paths into CPU expensive
• Skilled Programmers
– Relatively new discipline
– Poor language and tools support
Bottlenecks
• Programmer Productivity
– Software development slow and expensive
– Low-level programming paradigms
• Memory Latency
– RAM latency gated overall speed (~2-3 MHz)
– Small RAM backed by vastly slower storage
• I/O Bandwidth
– Limited CPU connectivity
– Crude communication mechanisms
Optimizations
• Time Sharing
– Effective sharing of a limited resource
• Virtual Memory
– Effective sharing, and backing with a cheaper alternative
• Hardware Improvements
– Smaller features provide more resources and faster clocks
– Large Scale Integration
– Better signaling to improve bandwidth
• High Level Programming Languages
– Broaden the productive programmer community
– Abstract away some hardware complexity
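To make the virtual memory point concrete, here is a minimal sketch (not from the talk) of demand paging: a few physical frames, standing in for scarce RAM, back a larger virtual address space, pages are loaded from slower backing storage on first touch, and the least recently used page is evicted when RAM is full. The page size, the frame count, and the `simulate_demand_paging` name are illustrative assumptions.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page (illustrative assumption)


def simulate_demand_paging(addresses, num_frames):
    """Count page faults for a sequence of virtual byte addresses.

    Physical memory holds at most `num_frames` pages. A miss "loads" the page
    from backing storage and, if memory is full, evicts the least recently
    used page (LRU replacement).
    """
    frames = OrderedDict()  # virtual page number -> None, ordered by recency
    faults = 0
    for addr in addresses:
        vpn = addr // PAGE_SIZE  # virtual page number
        if vpn in frames:
            frames.move_to_end(vpn)  # hit: mark as most recently used
        else:
            faults += 1  # miss: page must come from slower storage
            if len(frames) >= num_frames:
                frames.popitem(last=False)  # evict the least recently used page
            frames[vpn] = None
    return faults


if __name__ == "__main__":
    # Touch pages 0, 1, 2, 0, 3, 0, 4 with only three frames of "RAM".
    trace = [p * PAGE_SIZE for p in (0, 1, 2, 0, 3, 0, 4)]
    print(simulate_demand_paging(trace, num_frames=3))  # prints 5
```

LRU is only one replacement policy; it is used here because it is short to express with `OrderedDict`, not because 1970s systems necessarily used it.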
Examples
• Digital PDP-11
– 16-bit address space
– Orthogonal instruction set
– Memory-mapped I/O
– Unix, DOS, many others
• IBM System/370
– 24-bit address space
– Virtual Memory
– MVS, VM/370, DOS/VS
– Backward compatibility with System/360
NOW (MOBILE COMPUTING)
Scarce Resources
• Energy
– Fixed energy budget for mobile devices
– Thermal issues at all scales
– Tradeoff between performance and energy
– Process shrinks no longer significantly reduce energy consumption
• Memory Bandwidth
– Providing bandwidth is expensive
– Memory interconnect consumes significant energy
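The performance/energy tradeoff above can be read through the standard first-order model of CMOS dynamic (switching) power, a textbook relation rather than something stated in the talk, with leakage ignored. Here α is the switching activity factor, C the switched capacitance, V_dd the supply voltage, and f the clock frequency:

```latex
P_{\text{dyn}} = \alpha \, C \, V_{dd}^{2} \, f
\qquad \Longrightarrow \qquad
E_{\text{cycle}} = \frac{P_{\text{dyn}}}{f} = \alpha \, C \, V_{dd}^{2}
```

Lowering V_dd cuts the energy per clock cycle quadratically, but the achievable clock frequency drops with it, so the same work takes longer. Under classical (Dennard) scaling each process shrink also lowered V_dd; once supply voltage stopped scaling, shrinks stopped delivering large energy savings, which is one way to read the last Energy bullet.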
Bottlenecks
• Memory Latency
– Increasing gap between CPU speed and DRAM latency
– Physical distance to DRAM devices is a factor
• Concurrency
– Shortage of programmers who can handle concurrency
– Inadequate language/tools support
• I/O Bandwidth/Latency
– Wireless bandwidth lower than wired
– Consumes large amounts of energy
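The CPU/DRAM latency gap can be made concrete with the standard average memory access time formula, AMAT = hit time + miss rate × miss penalty (a textbook relation, not from the slide). The sketch assumes a fixed DRAM latency in nanoseconds, so a faster clock turns the same latency into more lost cycles. The 60 ns latency, 1-cycle hit time, 5% miss rate, and the function names are illustrative assumptions, not measurements of any device.

```python
DRAM_LATENCY_NS = 60.0  # assumed fixed DRAM access latency (illustrative)
HIT_TIME_CYCLES = 1.0   # assumed single-cycle cache hit
MISS_RATE = 0.05        # assumed 5% cache miss rate


def amat_cycles(hit_time_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_cycles + miss_rate * miss_penalty_cycles


def dram_penalty_cycles(clock_ghz, dram_latency_ns=DRAM_LATENCY_NS):
    """A fixed DRAM latency costs more cycles as the clock gets faster.

    GHz is cycles per nanosecond, so cycles = ns * GHz.
    """
    return dram_latency_ns * clock_ghz


if __name__ == "__main__":
    for ghz in (0.5, 1.5, 3.0):
        penalty = dram_penalty_cycles(ghz)
        avg = amat_cycles(HIT_TIME_CYCLES, MISS_RATE, penalty)
        print(f"{ghz:.1f} GHz: miss penalty {penalty:.0f} cycles, AMAT {avg:.1f} cycles")
```

With these assumed numbers the average access cost grows from a few cycles to around ten as the clock rises, even though DRAM itself has not changed.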
Example: HTC One
• Processor: 1.5 GHz dual-core Qualcomm MSM8960
• OS: Android™ 4.0 (ICS)
• Memory (RAM): 1 GB DDR2
• Storage: 16 GB onboard
• Display: 4.7" HD Super LCD, 1280 x 720
• Network:
– LTE CAT3: DL 100 / UL 50 Mbps
– LTE: 700/AWS
– WCDMA: 2100/1900/AWS/850
– EDGE: 850/900/1800/1900
• Battery: 1800 mAh
• Camera (main): 8 MP, f/2.0, BSI, 1080p HD video
• Camera (front): 1.3 MP with 720p video
• Dimensions: 134.8 x 69.9 x 8.9 mm
This is a General Purpose Computer!
Optimizations?
• Multi-core
– Aggressive addition of cores and threads
– Hardware concurrency outstripping software
– New concurrent programming models/tools?
• Memory Subsystem
– Significant contributor to total energy consumption
– Adding bandwidth is expensive
– New technologies addressing some energy issues
• Wireless bandwidth enhancements (LTE Advanced, etc.)
Solutions from the desktop/server or embedded worlds may not directly apply in the mobile space!
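Why hardware concurrency can outstrip software is quantified by Amdahl's law (a standard result, not stated on the slide): if only a fraction p of a program's work can run in parallel, n cores give a speedup of at most 1 / ((1 - p) + p / n). The helper name `amdahl_speedup` and the 90% parallel fraction in the usage loop are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only `parallel_fraction` of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction  # this part runs on one core regardless
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Even code that is 90% parallel can never exceed 10x, however many cores.
    for n in (1, 2, 4, 8, 1000):
        print(f"{n:>4} cores: {amdahl_speedup(0.9, n):.2f}x")
```

The serial fraction dominates quickly, which is why adding cores without software that exposes more parallelism yields diminishing returns.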
Memory System Energy
• Retaining data (one second)
– DRAM: ~1-10 pJ/bit self-refresh
– SRAM: 1200+ pJ/bit, and rising over time [ITRS 2009]; 4 pJ/bit (45nm LP, standby) [Barasinski et al., ESSCIRC ’08]
– Flash, PCM, STT-RAM…: Zero!
• Moving data (32-bit value)
– Recompute: 60 pJ (Razor)
– Send 1 mm: 10 pJ
– Retain in cache for 1 ms: 38 pJ
– Retain in DRAM for 1 second: 32+ pJ
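The slide's figures are mutually consistent, and a quick calculation using only those numbers shows it: 32 bits × 1200 pJ/bit/s × 1 ms ≈ 38 pJ for SRAM, and 32 bits × ~1 pJ/bit/s × 1 s = 32 pJ for DRAM. The sketch below reproduces both and, under the same numbers, finds the retention time after which recomputing a value (60 pJ) becomes cheaper than keeping it in SRAM. The constant and function names are illustrative; DRAM uses the low end of the slide's "~1-10 pJ/bit" range.

```python
BITS = 32
SRAM_RETAIN_PJ_PER_BIT_S = 1200.0  # SRAM retention, slide's "1200+" pJ/bit per second
DRAM_RETAIN_PJ_PER_BIT_S = 1.0     # DRAM self-refresh, low end of "~1-10" pJ/bit per second
RECOMPUTE_PJ = 60.0                # recompute a 32-bit value (slide's Razor figure)


def retention_pj(bits, pj_per_bit_per_s, seconds):
    """Energy (pJ) to keep `bits` alive for `seconds` at a given retention cost."""
    return bits * pj_per_bit_per_s * seconds


def break_even_seconds(recompute_pj, bits, pj_per_bit_per_s):
    """Retention time beyond which recomputing is cheaper than keeping the value."""
    return recompute_pj / (bits * pj_per_bit_per_s)


if __name__ == "__main__":
    print(f"SRAM, 1 ms: {retention_pj(BITS, SRAM_RETAIN_PJ_PER_BIT_S, 1e-3):.1f} pJ")
    print(f"DRAM, 1 s:  {retention_pj(BITS, DRAM_RETAIN_PJ_PER_BIT_S, 1.0):.1f} pJ")
    t = break_even_seconds(RECOMPUTE_PJ, BITS, SRAM_RETAIN_PJ_PER_BIT_S)
    print(f"Keeping the value in SRAM beats recomputing only below ~{t * 1e3:.1f} ms")
```

With these figures the break-even point is roughly 1.6 ms: a 32-bit value idling in SRAM for longer than that costs more energy than recomputing it.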
Reducing Memory System Energy
• Move less!
– Caches physically close to CPU
– Locality, locality, locality (the first rule of chip real estate)
• Retain less!
– Power off unused cache lines [Kaxiras et al., ISCA ’01]
– “Drowsy” caches [Flautner et al., ISCA ’02]
– … with compiler analysis [Zhang et al., Trans. Emb. Comp. Sys. 4(3) 2005]
– Don’t refresh unused DRAM … e.g. with garbage collection [Chen et al., CODES+ISSS ’03]
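The "power off unused cache lines" idea can be sketched as a simplified accounting model loosely in the spirit of cache decay: a line is switched off after a fixed number of idle cycles, which saves leakage but loses the line's contents, so a later re-access must refetch. This is not the cited authors' mechanism (real designs use per-line counters and interact with tags and coherence), and drowsy caches instead lower voltage while retaining state. The function name, trace, and parameters are hypothetical.

```python
def cache_decay_leakage(accesses, num_lines, decay_interval, total_cycles):
    """Estimate powered-on line-cycles under a simple decay policy.

    accesses: iterable of (cycle, line) pairs in increasing cycle order.
    A line is powered during [access, access + decay_interval) and then
    switched off until its next access. Returns (powered_line_cycles,
    decay_misses), where decay_misses counts re-accesses that found the
    line already switched off (its contents were lost).
    """
    last_access = [None] * num_lines  # cycle of each line's most recent access
    powered = 0                       # total line-cycles spent powered on
    decay_misses = 0
    for cycle, line in accesses:
        prev = last_access[line]
        if prev is not None:
            idle = cycle - prev
            powered += min(idle, decay_interval)  # on for at most decay_interval
            if idle >= decay_interval:
                decay_misses += 1  # line had decayed; data must be refetched
        last_access[line] = cycle
    # Tail after each line's final access, capped at the end of the run.
    for prev in last_access:
        if prev is not None:
            powered += min(total_cycles - prev, decay_interval)
    return powered, decay_misses


if __name__ == "__main__":
    trace = [(0, 0), (10, 1), (50, 0), (400, 0)]
    on, misses = cache_decay_leakage(trace, num_lines=4, decay_interval=100, total_cycles=1000)
    print(f"powered line-cycles: {on} of {4 * 1000} (always-on); decay misses: {misses}")
```

The decay interval is the tuning knob: a short interval saves more leakage but causes more decay misses, each of which costs the (expensive) data movement the previous slide quantified.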
Extending the Memory Model
• Maintaining the illusion of a single flat memory address space is too expensive
– On-chip caches can be major consumers of area and energy
– Coherence protocols are expensive and difficult to scale
• Alternative: software-managed memory hierarchies
– Tightly-coupled memory (TCM), scratchpads
– Do not require tag memory, address comparison logic
– More area- and energy-efficient
– Help bridge gap between bandwidth and throughput
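The difference between a hardware-managed cache and a software-managed scratchpad can be sketched functionally: the cache needs a tag store and a tag comparison on every access to decide hit or miss, while the scratchpad is a plain memory region at a fixed address range that software has already filled. This models behavior only, not area or energy; the line size, line count, class names, and addresses are illustrative assumptions.

```python
LINE_BYTES = 32  # assumed cache line size
NUM_LINES = 64   # assumed number of lines (a 2 KiB direct-mapped cache)


class DirectMappedCache:
    """Hardware-managed: every access indexes a tag store and compares tags."""

    def __init__(self):
        self.tags = [None] * NUM_LINES  # extra tag memory a scratchpad omits
        self.tag_compares = 0

    def access(self, addr):
        index = (addr // LINE_BYTES) % NUM_LINES
        tag = addr // (LINE_BYTES * NUM_LINES)
        self.tag_compares += 1  # address comparison on every access
        if self.tags[index] == tag:
            return True  # hit
        self.tags[index] = tag  # miss: hardware refills the line implicitly
        return False


class Scratchpad:
    """Software-managed: a plain memory region at a fixed address range."""

    def __init__(self, base, size):
        self.base = base
        self.mem = bytearray(size)

    def _offset(self, addr):
        # A simple range decode; no per-access tag lookup and no hit/miss.
        offset = addr - self.base
        if not 0 <= offset < len(self.mem):
            raise IndexError("address outside scratchpad range")
        return offset

    def load(self, addr):
        return self.mem[self._offset(addr)]

    def store(self, addr, value):
        self.mem[self._offset(addr)] = value


if __name__ == "__main__":
    cache = DirectMappedCache()
    hits = [cache.access(a) for a in (0, 4, 2048, 0)]
    print(hits, cache.tag_compares)
    spm = Scratchpad(base=0x1000, size=256)
    spm.store(0x1010, 7)
    print(spm.load(0x1010))
```

In the cache, addresses 0 and 2048 map to the same line and evict each other without software noticing; with the scratchpad, nothing is evicted implicitly, which is exactly why software must orchestrate every transfer.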
New Challenges and Opportunities
• Different programming paradigm: software explicitly orchestrates all transfers between on-chip and off-chip memory areas
• Major implications for memory management
– Scratchpad allocation strategies
– Data partitioning strategies
– Dynamic relocation between scratchpad and DRAM to track the program’s locality characteristics
• Opportunities for compile-time and runtime optimization
Challenges in both Hardware and Software!
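One way to make "scratchpad allocation strategies" concrete is the commonly used static formulation as a 0/1 knapsack problem, solved at compile time from profile data: each data object has a size and an access count, and the goal is to maximize accesses served on chip without exceeding scratchpad capacity. This is one illustrative strategy, not necessarily the one the talk has in mind, and it ignores dynamic relocation. The function name, object names, sizes, and access counts are hypothetical.

```python
def allocate_scratchpad(objects, capacity):
    """Choose data objects for the scratchpad to maximize on-chip accesses.

    objects: list of (name, size_bytes, access_count), e.g. from profiling.
    Solves the 0/1 knapsack problem by dynamic programming over capacity.
    Returns (accesses_served_on_chip, sorted list of chosen names).
    """
    best = [0] * (capacity + 1)  # best[c]: max accesses using c bytes
    taken = [[False] * (capacity + 1) for _ in objects]
    for i, (_, size, accesses) in enumerate(objects):
        # Iterate capacities downward so each object is placed at most once.
        for c in range(capacity, size - 1, -1):
            if best[c - size] + accesses > best[c]:
                best[c] = best[c - size] + accesses
                taken[i][c] = True
    # Walk back through the decisions to recover the chosen objects.
    chosen, c = [], capacity
    for i in range(len(objects) - 1, -1, -1):
        if taken[i][c]:
            chosen.append(objects[i][0])
            c -= objects[i][1]
    return best[capacity], sorted(chosen)


if __name__ == "__main__":
    profile = [  # hypothetical (name, size in bytes, access count)
        ("lut", 1024, 9000),
        ("frame_buf", 4096, 12000),
        ("coeffs", 2048, 8000),
        ("stack", 1024, 5000),
    ]
    print(allocate_scratchpad(profile, capacity=4096))
```

Here the single most-accessed object (`frame_buf`) is not chosen: three smaller objects together serve more accesses in the same 4 KiB, which is the kind of tradeoff a greedy "hottest object first" policy can miss.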
Qualcomm Research: Excellence in Wireless
MAY | 2012 WWW.QUALCOMM.COM/RESEARCH
State-of-the-Art Capabilities Fostering Innovation
Complete Development Labs
• Prototype Development Facilities
• CPU Simulation Clusters
• Antenna Ranges
• Outdoor Field Systems
Human Resources
• 30% of engineers with PhD, 50% Masters
• Systems, HW, SW, Standards, Test Engineering
• Ventures, Bus Dev, Technical Marketing, Program Mgmt.
Global Research and Development Organization
UNITED STATES
• San Diego, CA
• Santa Clara, CA
• Bridgewater, NJ
EUROPE
• Cambridge, UK
• Nuremberg, Germany
• Vienna, Austria
ASIA
• Beijing, China
• Bangalore and Hyderabad, India
• Seoul, S. Korea
Qualcomm Research & University Relations: Academic Collaboration to Foster Advanced Research
• Ongoing relations with more than 30 US and 25 international universities
– Current funding includes MIT, UC Berkeley, Stanford, UCSD, UT Austin, ASU, UIUC, Univ. of Michigan, EPFL, IISc Bangalore, KAIST, Tsinghua
• Research collaboration spans a variety of technical areas
– Computer vision, multicore processing, context-aware computing, machine learning, low-power devices, wireless networks and signal processing, etc.
• Qualcomm Innovation Fellowship (QInF) invests in innovative ideas
– Close interactions between Qualcomm Research engineers, graduate students, and professors
Qualcomm Research For The Wireless Future
• 3G/4G: TAKE WWAN TO THE NEXT LEVEL (improving WWAN technology)
• Wireless Local Area: INNOVATE BEYOND WAN (excelling in all forms of wireless)
• Processors & Devices: BREAKTHROUGH PERFORMANCE (re-architecting next-gen mobile devices)
• Application Enablers: ENABLE SMART APPLICATIONS (transforming the mobile user experience)
Innovate Beyond WAN: WIRELESS LOCAL AREA
PEANUT
• Next-gen short-range ultra-low-power radio
WIFI ADVANCED
• Multi-Gbps WLAN using the 5 GHz and 60 GHz bands
• Next-gen low-power WiFi for the Internet of Things
LTE D2D (FLASHLINQ)
• Proximal wireless
• First-gen device-to-device wireless network
• Autonomous discovery
• Direct communications
INNAV
• Indoor positioning for indoor location-based applications
• Map tools for mobile devices
Enable Smart Applications: ELEVATE THE WIRELESS USER EXPERIENCE
AUGMENTED REALITY
• Mobile user interface
• Computer vision for mobile devices
LOOK
• Multiple-language text detection and recognition
• With mobile phone camera viewfinder
LISTEN
• Background audio processing
• Augmented user experience
DASH
• Efficient video delivery over HTTP for mobile devices
AWARE
• Build awareness in mobile devices
• For enhanced daily life situations
Breakthrough Device Performance: RE-ARCHITECTING NEXT-GEN DEVICES
ADVANCED RADIO TECHNOLOGIES, MANTICORE, GRYPHON
• New RF front-end and baseband technologies
• RF/antenna and systems/protocol techniques
• Concurrent multi-radio operation
• Advanced mobile device SW platforms
• Improved user experience
• Virtual machine design for SoC architecture
• Enabling higher power efficiency
Thank You