On Simulation
Jakob Engblom, PhD
Virtutech & Uppsala University
[email protected]
ESSES, 4 Sept 2003

Simulation: Modeling + Execution
• Build a model of the system
• Try various scenarios on this model
  – Experimental, not analytical approach
• Understand the real system by working with the model
  – More available
  – More inspectable
  – Less dangerous

Simulation or Analysis
• Simulation gets closer to real world
  – More details
  – Fewer assumptions
  – High computational workload
• Analytical models
  – Efficient predictors
  – Low computational workload
  – ... but more removed from world = less accurate

Sufficient Level of Detail
• Maintain sufficient details
  – To observe relevant aspects of reality
  – To avoid artifacts of experiment
• Abstract away unimportant aspects
  – Newtonian vs. quantum physics
  – Timing vs. function
• Danger: bad abstractions = bad simulation
• We need to decide the level of abstraction
• More detail = smaller scope
• Less detail = larger scope
  – Size of systems that can be investigated
  – Number of different systems
• Measure of scope: speed
  – As number of software instructions per second
What do we need to simulate?
Simulating Computer Systems
[Diagram: Program, Stimuli, Peripherals, Processor]
Detailed Hardware Models
• Transistor-level model
  – Very close to actual implementation
• Small scope
  – Small piece of HW
  – Small programs
  – Stimuli at bit level
  – Speed: 100s of instructions per second
  – With 25 MUSD hardware: 10-100 KIPS
• Necessary for hardware development
Instruction-Set Simulation
• Model computer at instruction set level
  – Stable & defined interface
  – The level where hardware & software meet
  – Stimuli at transaction level
• Abstractions to increase scope:
  – Keep functionality correct
  – Vary fidelity in timing
  – Simplify some behavior
• Speed: 10 KIPS to 100 MIPS (overlaid on the slide: 700 MIPS)
Key issue: there can be no software-visible difference (including to the OS)
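A minimal sketch of the idea in Python, assuming a made-up three-register toy instruction set (`li`, `add`, `subi`, `bnez`, `halt`) and a made-up per-opcode cycle table; none of these names come from the talk or from Simics. The functional result stays the same whichever timing table is plugged in, which is the "keep functionality correct, vary fidelity in timing" abstraction.

```python
def run(program, cycles_per_instr=None):
    """Execute a toy program; return (registers, instruction count, cycles)."""
    cost = cycles_per_instr or {}          # empty table = purely functional model
    regs = {"r0": 0, "r1": 0, "r2": 0}
    pc = instructions = cycles = 0
    while True:
        op, *args = program[pc]
        instructions += 1
        cycles += cost.get(op, 1)          # default: one cycle per instruction
        pc += 1
        if op == "li":                     # li rd, imm      -> rd = imm
            regs[args[0]] = args[1]
        elif op == "add":                  # add rd, ra, rb  -> rd = ra + rb
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "subi":                 # subi rd, imm    -> rd = rd - imm
            regs[args[0]] -= args[1]
        elif op == "bnez":                 # bnez ra, target -> branch if ra != 0
            if regs[args[0]] != 0:
                pc = args[1]
        elif op == "halt":
            return regs, instructions, cycles
        else:
            raise ValueError(f"unknown opcode {op!r}")


# Hypothetical workload: sum 5 + 4 + 3 + 2 + 1 into r0.
PROGRAM = [
    ("li", "r0", 0),
    ("li", "r1", 5),
    ("add", "r0", "r0", "r1"),
    ("subi", "r1", 1),
    ("bnez", "r1", 2),
    ("halt",),
]

fast = run(PROGRAM)                    # functional only
detailed = run(PROGRAM, {"bnez": 3})   # assumed: branches cost 3 cycles
```

Both runs end with the same register contents and instruction count; only the cycle estimate differs.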
Sufficient Detail of Model
• Complete from a software perspective
  – All readable values represented
  – All registers of CPU implemented
  – Software = OS, drivers, applications, middleware, ...
• Hardware considered as a set of devices
  – I/O-space or memory mapped
  – Behavior at level seen by device drivers
• No “abstract” networks, all concrete
• Next slide: example of detail required
Instruction-Set Simulation
• To run real workloads, you need
  – Hardware: CPU & devices
  – OS and other services
  – Stimuli to feed them
• Common methods to achieve this
  – Full-system simulation
  – Virtualization
  – User-level simulation
Full-System Simulation
One physical computer
Virtutech Simics
Virtual computer systems of many different types
Not Full-System Simulation: Virtualization
One physical computer
Virtualization system
Several virtual computers of the same type
Full-System Simulation
[Diagram: real OS & software (user program, middleware, DB, servers, operating system) on top of simulated hardware (CPU, RAM, network, disk, GPU, device controller)]
Not Full-System Simulation: User-level simulation
[Diagram: real user program on top of simulated OS, services, some HW. Boxes: user program, middleware, DB, servers, operating system, hardware (CPU, RAM)]
Speed
• Depends on level of timing detail in model
• Slowest: cycle-accurate simulation
  – Hardware timing modeled in great detail
• Fastest: emulation (user-level only)
• Sweet spot: somewhere in between
  – Simics tries to hit this spot
  – Configurable level of detail
Speed
[Chart: speed (10 KIPS to 1000 MIPS) against accuracy. Labels: detailed hardware sim; cycle-accurate simulator (>10,000x); fast full-system simulation (20-400x); emulator (5x); virtualization]
Going up in Scope
• Interesting systems are larger than single CPU
• Multiprocessors
  – Homogeneous like servers
  – Heterogeneous like mobile phones
• Distributed systems
  – Local-Area Networks
  – Embedded CAN buses
  – Networks-on-chips
• = Simulated shared memory, networks
Distributed Network Simulation
• Level of simulation
  – Entire packets, not physical layer
  – Simulate the network cards in nodes
• Spread simulation across multiple machines
  – Necessary increase of speed
• Still, maintain determinism
  – Synchronize simulated machines
  – One machine stops, all machines stop
  – Global checkpointing & restore
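One way to picture the determinism rules above is a lockstep barrier: every simulated machine advances by the same quantum of virtual time, packets are exchanged only at the barrier in a fixed order, and a stop on one machine stops them all. The sketch below runs in a single Python process for simplicity; `Node`, `simulate`, the quantum length, and the `stop_at` breakpoint are all hypothetical names and values chosen for illustration, not the mechanism of any particular simulator.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    stop_at: Optional[int] = None            # virtual time of a breakpoint, if any
    inbox: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def run_quantum(self, start, quantum):
        """Advance one quantum of virtual time; return (outgoing packets, stopped)."""
        for _sent_at, src, payload in self.inbox:
            self.log.append((start, f"recv {payload} from {src}"))
        self.inbox.clear()
        out = [(start, self.name, f"hello@{start}")]   # whole packets, no physical layer
        stopped = self.stop_at is not None and start <= self.stop_at < start + quantum
        return out, stopped


def simulate(nodes, quantum, end_time):
    now = 0
    while now < end_time:
        packets, stopped = [], False
        for node in nodes:                   # fixed order: no dependence on host timing
            out, hit = node.run_quantum(now, quantum)
            packets += out
            stopped = stopped or hit
        now += quantum                       # barrier: all nodes share one virtual time
        for sent_at, src, payload in sorted(packets):
            for node in nodes:
                if node.name != src:
                    node.inbox.append((sent_at, src, payload))
        if stopped:                          # one machine stops, all machines stop
            break
    return now


def build():
    return [Node("a"), Node("b", stop_at=25), Node("c")]


first, second = build(), build()
t1 = simulate(first, quantum=10, end_time=100)
t2 = simulate(second, quantum=10, end_time=100)
```

Running the same setup twice gives the same stop time and identical logs on every node, which is the property that makes global checkpointing and repeatable debugging possible.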
Network Simulation
Real network of physical machines
Simulated network of simulated machines
Interface to real network if needed
Simulation Advantages
Simulation Advantages
• Configurability
  – Simulate anything:
    • Independent of available hardware
    • Target architecture
    • System configuration
• Availability
  – Easy to copy setup, no manufacturing involved
• Determinism
  – Removes real-world indeterminism
  – Synchronization across machines and networks
Simulation Advantages
• Checkpoint & restart
  – Save state of machine to reload later
  – Parallelize & repeat runs
  – Distribute fixed starting points
• Non-intrusive inspection & tracing
  – Any events or state in the machine
  – Does not affect running system
  – IO events, hardware events
• Deep inspection of system state
  – Caches, TLBs, registers, device registers, buffers ...
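As a rough illustration of checkpointing and non-intrusive tracing, the sketch below uses a toy deterministic machine whose whole state is one dictionary. The `Machine` class and its state layout are invented for this example; a real simulator has far more state, but the principle is the same: a checkpoint is a copy of the state, and a tracer only reads it.

```python
import copy


class Machine:
    """Toy deterministic machine; the state layout is invented for this sketch."""

    def __init__(self):
        self.state = {"pc": 0, "acc": 0, "mem": [0] * 8}

    def step(self):
        s = self.state
        s["acc"] += s["pc"]
        s["mem"][s["pc"] % 8] = s["acc"]
        s["pc"] += 1

    def checkpoint(self):
        return copy.deepcopy(self.state)       # save state of machine

    def restore(self, snapshot):
        self.state = copy.deepcopy(snapshot)   # reload it later, any number of times


def run(machine, steps, trace=None):
    for _ in range(steps):
        machine.step()
        if trace is not None:
            # The tracer only reads state, so the simulated run is unaffected.
            trace.append((machine.state["pc"], machine.state["acc"]))


m = Machine()
run(m, 100)                 # get to an interesting point once
ckp = m.checkpoint()

run(m, 50)                  # first continuation, no tracing
plain = m.checkpoint()

m.restore(ckp)              # back to the fixed starting point
trace = []
run(m, 50, trace)           # second continuation, fully traced
traced = m.checkpoint()
```

The traced and untraced continuations end in exactly the same state, and the same checkpoint could be handed to several hosts to repeat or parallelize runs.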
Simulation Advantages
• Sandboxing
  – Completely walled-in
  – No hidden communications
  – Undo state changes
  – Dangerous experiments possible
  – Viruses, worms, buffer overflows, …
• Magic instructions:
  – Allows programs to communicate with outside
HW/SW Cosimulation
[Diagram: Program, Peripherals]
Integrated Systems
• Highly-integrated devices on the rise
• Develop HW & SW in parallel
• Simulate hardware and software together in development of entire system
[Diagram: integrated device with CPU, DSP, LCD driver, Bluetooth, GSM radio, code memory, data memory]
Big Systems & Small Details
• To achieve speed: reduce level of detail
• To capture important effects: increase level of detail
• Solution: model only parts in great detail
  – Finished hardware can be modeled simply
  – Model only what needs to be observed
  – Mostly, no need for RTL-level understanding
Transactions vs Pins
• Transaction-level modeling:
  – Model transactions as a unit
  – Level of model:
    • “Memory read” / “Network packet send” / ...
  – Only when something is activated
• Pin-level modeling:
  – Model detailed electronics of a transaction
  – Level of model:
    • Individual pins
    • Clocked pulsing of transmission pins
  – Every clock cycle
[Diagram: bit stream 0101011001011]
Clocking vs Blocking
• Traditional hardware modeling:
  – One (or two) steps per clock cycle
  – Clock to generate evolution of internal state
  – = All devices called each cycle
    • Large overhead for context switching
• Optimized hardware modeling (blocking):
  – Only call when events (reads, writes) occur
  – Evolve internal state several cycles at a time
    • Count the time since last activation
  – Lower context switch overhead
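The difference can be sketched with a free-running counter device modeled in both styles. `ClockedTimer`, `BlockingTimer`, the period of 360, and the read times are hypothetical choices for this example; the `calls` field simply counts how often the simulator has to enter the device model.

```python
class ClockedTimer:
    """Traditional style: the simulator calls tick() on every clock cycle."""

    def __init__(self, period):
        self.period, self.count, self.calls = period, 0, 0

    def tick(self):
        self.calls += 1
        self.count = (self.count + 1) % self.period

    def read(self):
        self.calls += 1
        return self.count


class BlockingTimer:
    """Blocking style: state is brought up to date only when an event occurs."""

    def __init__(self, period):
        self.period, self.count, self.calls = period, 0, 0
        self.last_cycle = 0

    def read(self, now):
        self.calls += 1
        elapsed = now - self.last_cycle          # count the time since last activation
        self.count = (self.count + elapsed) % self.period
        self.last_cycle = now
        return self.count


READ_AT = [1_000, 25_000, 100_000]               # cycles at which software reads the timer

clocked = ClockedTimer(period=360)
clocked_values = []
for cycle in range(1, READ_AT[-1] + 1):
    clocked.tick()                               # device called each cycle
    if cycle in READ_AT:
        clocked_values.append(clocked.read())

blocking = BlockingTimer(period=360)
blocking_values = [blocking.read(now) for now in READ_AT]
```

Software sees the same values from both models, but the clocked model is entered once per cycle plus once per read, while the blocking model is entered only for the three reads.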
• Pins:
  – Set address pins to 00011110
  – Drive clock pin to 1 and then 0
  – ... until data ready pin is 1
  – Then read 01000010 from data pins
[Diagram: CPU and Device, shown twice]
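The pin-level steps listed above can be contrasted with a transaction-level read of the same location. The sketch uses the address and data bit patterns from the slide; the `Device` class, its three-cycle latency, and the helper `pin_level_read` are assumptions made for illustration.

```python
class Device:
    """Toy device holding one byte per address; the latency is an assumed value."""

    def __init__(self, contents, latency_cycles=3):
        self.contents = contents
        self.latency = latency_cycles

    # Transaction level: one call models the whole "memory read".
    def read(self, address):
        return self.contents[address]

    # Pin level: the caller sets pins and clocks the device cycle by cycle.
    def start(self, address_pins):
        self._address = int(address_pins, 2)
        self._remaining = self.latency
        self.data_ready = 0
        self.data_pins = "00000000"

    def clock(self):
        """One clock pulse: pin driven to 1 and then 0."""
        self._remaining -= 1
        if self._remaining <= 0:
            self.data_ready = 1
            self.data_pins = format(self.contents[self._address], "08b")


def pin_level_read(device, address_pins):
    device.start(address_pins)            # set address pins
    pulses = 0
    while device.data_ready != 1:         # drive clock until data ready pin is 1
        device.clock()
        pulses += 1
    return device.data_pins, pulses       # then read the data pins


dev = Device({0b00011110: 0b01000010})
bits, pulses = pin_level_read(dev, "00011110")
value = dev.read(0b00011110)
```

Both paths deliver the same byte; the pin-level path needs one model invocation per clock pulse, the transaction-level path a single call.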
HW/SW Cosimulation: fast
[Diagram: Simics runs the application, (RT)OS, and drivers on a simulated CPU core, memory, and devices. A behavioral model is attached through an interface of transactions, events, maybe clock cycles.]
HW/SW Cosimulation: detailed
[Diagram: Simics runs the application, (RT)OS, and drivers on a simulated CPU core, memory, and devices. A VHDL/Verilog simulator (RTL-level simulation) is attached through an interface of pins and clock cycles.]
Device Modeling
• Large part of work for a platform
  – Processors: few and standardized
  – Devices: (very) many and varied. But simpler.
  – Still pretty fast, at transaction level
• Modeling devices:
  – C/C++/Python with simulator APIs
  – SystemC
  – VHDL/Verilog
  – Graphical languages (Magic-C)
Stimulating a Simulation
[Diagram: Stimuli]
Stimuli
• Without proper stimuli, model is useless
• Feed mechanism
  – How to get information into the simulation
• Data generation
  – What to supply to the simulation
  – Can get tricky
Regular Computers
• Fixed inputs
  – SPEC benchmarks: loaded from disk
• Network
  – Load generation on simulated machines
  – Interface to a real network
    • Interactive use
    • Load generators on real machines
• Keyboard & mouse
  – Map directly to real device
  – Easy for PC-on-PC-style
  – Interactive user
Non-traditional Computers
• Phones, navigation computers, PDAs, etc.
• Application development
  – Use GUI to provide interactive sessions with user
  – Keyboard, joystick, touch screen
  – Not radio data etc.
Physical World Interaction
• Special simulated devices
  – Sensors & actuators
• Data sources
  – Statistical models of real system behavior
  – Simulation models of physical reality
  – Hardware-in-the-loop simulation
Configuration as Stimuli
• Stimuli = hardware configuration
• Booting an operating system
  – Test of OS software vs hardware
  – Reconfigure hardware, alter devices
• High-level software development
  – Powerful debugger, with checkpointing
Hardware Replacement
• Embedded HW
  – Cheaper, more convenient, available, stable
  – Often 10000+ USD development platforms
• Virtual platform for early software dev
  – Boards under development
  – AMD64 (Hammer, Opteron, Athlon 64)
    • Saved months for the Linux/AMD64 ports
  – Next-gen UltraSparcs, ...
Hardware Development
• Model hardware in development
• Test components before physical prototype
• Stimulate HW with real workloads
  – Requires ability to run operating systems
• HW/SW cosimulation
  – At various levels of detail
• Shortens time to market dramatically
Parallelization of Development
[Diagram, without simulation: board design, board prototype, then software development. Handoff to the software team when “working” hardware exists.]
[Diagram, with simulation: board design & build simulator, then board prototype alongside software development. Handoff to the software team using a simulation of the hardware platform. Simulator = reference.]
Network Software
• Develop network stacks & protocols
  – Easy to instrument the network, trace traffic
  – Easy to inject packets
  – No interference from other traffic
  – Synchronous breaks at important events
• Try network configurations
  – Large networks
  – Pathological topologies
Performance Tuning
• Performance tuning of software
  – Trace & statistics on performance events
  – Cache misses, TLB misses, disk accesses
  – Memory access patterns
  – Get first-order estimates from event counts
• Absolute performance measurements
  – Requires very detailed models
  – Not a design goal of large-scale simulators
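A first-order estimate from event counts can be as simple as instructions times a base CPI plus each event count times an assumed penalty. The sketch below works this through; the penalty values, the event names, and the before/after counts are invented numbers for illustration, not measurements.

```python
def estimate_cycles(instructions, event_counts, penalties, base_cpi=1.0):
    """First-order estimate: base execution time plus a fixed stall per event."""
    stall = sum(count * penalties[name] for name, count in event_counts.items())
    return instructions * base_cpi + stall


# Assumed penalties in cycles per event (hypothetical values).
PENALTY_CYCLES = {"l1_miss": 10, "l2_miss": 100, "tlb_miss": 30}

# Hypothetical event counts traced before and after a data-layout change.
before = {"l1_miss": 40_000, "l2_miss": 5_000, "tlb_miss": 1_000}
after = {"l1_miss": 25_000, "l2_miss": 2_000, "tlb_miss": 1_000}

cycles_before = estimate_cycles(1_000_000, before, PENALTY_CYCLES)
cycles_after = estimate_cycles(1_000_000, after, PENALTY_CYCLES)
speedup = cycles_before / cycles_after
```

With these numbers the estimate drops from 1,930,000 to 1,480,000 cycles, a speedup of about 1.3. That is good enough to rank tuning options, but it ignores overlap between misses and execution, which is why absolute measurements need much more detailed models.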
Faults and Boundary Cases
• Fault injection
  – Repeatable, no physical damage necessary
  – Fault tolerant systems, safety critical systems
  – Examples: next slide
• Boundary case testing
  – Extremely small or large configurations
  – Intense bursts of interrupts
  – Communications latencies and intensity
Fault Injection Examples
[Diagram: system with CPU, RAM, network, sensor, bridge, and devices, annotated with the faults below]
• Transient errors in xmit
• Corrupt register values
• Permanent bit errors
• Corrupt measurements
• Kill entire subsystem
• Corrupt network packets; unplug
• Unplug a device
Fault Injection & Checkpointing
[Diagram labels: Boot system SW; Position workload; Take checkpoint (CKP); Check results; Run 1 and Run 2, each with Restore checkpoint and Check results; injected fault; Did the fault affect the result?]
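The workflow in the diagram can be sketched as follows: set up the system once, take a checkpoint, then restore it for every run and compare the results of faulty runs against a fault-free one. The `SensorSystem` workload (a controller that takes the median of three sensor readings), the readings, and the fault descriptions are all hypothetical, chosen so that one fault is masked and one is not.

```python
import copy


class SensorSystem:
    """Toy workload: the controller outputs the median of three sensor readings."""

    def __init__(self):
        self.state = {"readings": [], "outputs": []}

    def boot_and_position(self):
        # Stands in for booting system software and positioning the workload.
        self.state["readings"] = [[20, 20, 21], [22, 22, 22], [25, 24, 25]]

    def run(self, fault=None):
        for i, triple in enumerate(self.state["readings"]):
            triple = list(triple)
            if fault is not None and fault["step"] == i:
                triple[fault["sensor"]] ^= 1 << fault["bit"]   # corrupt one measurement
            self.state["outputs"].append(sorted(triple)[1])    # median vote
        return list(self.state["outputs"])


def run_from(checkpoint, fault=None):
    system = SensorSystem()
    system.state = copy.deepcopy(checkpoint)   # restore checkpoint
    return system.run(fault)


system = SensorSystem()
system.boot_and_position()
ckp = copy.deepcopy(system.state)              # take checkpoint

golden = run_from(ckp)                                          # no fault
run1 = run_from(ckp, {"step": 1, "sensor": 0, "bit": 6})        # large error, outvoted
run2 = run_from(ckp, {"step": 0, "sensor": 0, "bit": 0})        # small error, shifts median

affected = {"run1": run1 != golden, "run2": run2 != golden}
```

Because every run starts from the same checkpoint, any difference from the fault-free result is caused by the injected fault and nothing else. Here the first fault is masked by the median vote and the second one changes the output.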
Teaching
• Enable hands-on experience
• Computer architecture
• Embedded systems programming
• Operating systems
  – Debug half-finished systems
  – Same setup for all students, easy hand-ins
• System management
  – Easy to restore system state
  – No risk to real machines and networks
Simulate with Care
Obtaining Significant Results
• Computer architecture research
  – 90% or more done in simulation
  – Measure of success: effect of modification to a reference machine
  – What is a significant result? -5%? +10%?
• How real is the machine modified?
  – SimpleScalar is not a real processor
• = need for quite extensive modeling
Wisconsin Experiments
• Mark Hill et al, IEEE Computer Feb 2003
• Investigating potential pitfalls of simulation
• Detailed microarchitectural modeling
  – Pipeline, caches, reordering, the works
  – Randomized L2 miss time (80-89 cycles)
• Several runs with same workload
• Variable results!
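The sketch below is a toy model, not the Wisconsin setup: it only charges a random 80-89 cycle cost for each of a fixed number of L2 misses and repeats the run with different seeds. The miss count, the base cycle count, and the function name `run_once` are assumptions. It is meant only to show why a single run is not enough and why the spread should be reported next to the mean.

```python
import random
import statistics


def run_once(seed, l2_misses=2_000, base_cycles=1_000_000):
    """One simulated run: fixed work plus a randomized cost per L2 miss."""
    rng = random.Random(seed)                     # seeded, so each run is repeatable
    miss_cycles = sum(rng.randint(80, 89) for _ in range(l2_misses))
    return base_cycles + miss_cycles


runs = [run_once(seed) for seed in range(20)]     # same workload, 20 perturbations
mean = statistics.mean(runs)
stdev = statistics.stdev(runs)
spread = max(runs) - min(runs)
```

If a proposed modification changes the result by less than the run-to-run spread, a comparison based on one run per configuration cannot tell an improvement from noise.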