.NET Compact Framework 2.0: Optimizing For Performance
Roman Batoukov, Development Lead, .NET Compact Framework, Microsoft Corporation
FUN403
.NET Compact Framework: How are we different?
Memory constraints
  Storage: Flash/ROM
  Physical memory
  Virtual memory: 32 MB per process
Design
  28% of the surface area in 8% of the size of the full .NET Framework
  Portable JIT compiler
    Fast code generation, less optimized
    May pitch JIT-compiled code
    No NGen, install-time, or persisted code
  Interpreted virtual calls (no v-tables)
  Sparse loading of metadata
Micro-benchmarks versus Scenarios
Benchmarking tips
  Use Environment.TickCount to measure
  Measure times greater than 1 second
  Start from a known state
  Ensure nothing else is running
  Measure multiple times, take the average
  Run each test in its own AppDomain/Process
  Log results at the end
  Understand JIT-time versus run-time cost
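The tips above can be sketched as a small timing helper. This is a minimal sketch, not code from the talk: the names Bench, TestBody, IterationsPerSecond and AverageIterationsPerSecond are hypothetical. It warms up once so JIT time is not counted, runs for a caller-chosen minimum duration, and averages several runs.

```csharp
using System;

// Hypothetical delegate type for the code under test.
public delegate void TestBody();

public static class Bench
{
    // Runs body repeatedly for at least minMillis milliseconds (must be > 0)
    // and returns the measured throughput in iterations per second.
    public static double IterationsPerSecond(TestBody body, int minMillis)
    {
        if (minMillis <= 0) throw new ArgumentOutOfRangeException("minMillis");

        body(); // warm-up call so JIT compilation is not part of the measurement

        long iterations = 0;
        int start = Environment.TickCount;
        int elapsed;
        do
        {
            body();
            iterations++;
            // unchecked subtraction stays correct if TickCount wraps around
            elapsed = unchecked(Environment.TickCount - start);
        } while (elapsed < minMillis);

        return iterations * 1000.0 / elapsed;
    }

    // Repeats the measurement and returns the average over all runs.
    public static double AverageIterationsPerSecond(TestBody body, int minMillis, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");

        double total = 0;
        for (int r = 0; r < runs; r++)
            total += IterationsPerSecond(body, minMillis);
        return total / runs;
    }
}
```

A call such as Bench.AverageIterationsPerSecond(delegate { DoWork(); }, 1000, 5) follows the "greater than 1 second" and "measure multiple times" tips; DoWork stands for whatever is being measured. Results should be logged only after the last run.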
What does .stat tell you?
Working set and performance statistics
More counters added in v2
  Generics usage
  COM interop usage
  Number of boxed value types
  Threading and timers
  GUI objects
  Network activity (socket bytes sent/received)
[Diagram labels: Per Device CAB Install (SMS, etc); System.Reflection; Casing; D3DM]
9
Common Language Runtime: Execution engine
Call path
  Managed calls are more expensive than native
    Instance call: ~2-3X the cost of a native function call
    Virtual call: ~1.4X the cost of a managed instance call
    Platform invoke: ~5X the cost of a managed instance call (*Marshal int parameter)
  Properties are calls
JIT compilers
  All platforms have the same optimizing JIT compiler architecture in v2
  Optimizations
    Method inlining for simple methods
    Variable enregistration
String interning
10
Common Language Runtime: Call path (sample)
Version 1: virtual property
    public class Shape
    {
        protected int m_volume;
        public virtual int Volume
        {
            get { return m_volume; }
        }
    }
    public class Cube : Shape
    {
        public Cube(int vol)
        {
            m_volume = vol;
        }
    }
Version 2: non-virtual property
    public class Shape
    {
        protected int m_volume;
        public int Volume
        {
            get { return m_volume; }
        }
    }
    public class Cube : Shape
    {
        public Cube(int vol)
        {
            m_volume = vol;
        }
    }
11
Common Language Runtime: Call path (sample)
    public class MyCollection
    {
        private const int m_capacity = 10000;
        private Shape[] storage = new Shape[m_capacity];
        …
        public void Sort()
        {
            …
Common Language Runtime: Call path (sample), results
Virtual Volume property: 57 sec
Non-virtual Volume property: 39 sec
  No virtual call overhead
  Inlined (no call overhead at all)
  ~ Equal to accessing a field
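The comparison on this slide can be reproduced in spirit with a small harness. This is only a sketch: VirtualShape, PlainShape and CallPathDemo are hypothetical names, the loop sums volumes instead of running the Sort() from the talk (whose body is not in the transcript), and the timings will not match the 57 sec / 39 sec figures, which came from a device.

```csharp
using System;

// Shape with a virtual property: every read goes through a virtual call.
public class VirtualShape
{
    protected int m_volume;
    public VirtualShape(int vol) { m_volume = vol; }
    public virtual int Volume { get { return m_volume; } }
}

// Shape with a non-virtual property: a simple getter the JIT may inline.
public class PlainShape
{
    protected int m_volume;
    public PlainShape(int vol) { m_volume = vol; }
    public int Volume { get { return m_volume; } }
}

public static class CallPathDemo
{
    public static long SumVirtual(VirtualShape[] shapes, int passes)
    {
        long sum = 0;
        for (int p = 0; p < passes; p++)
            for (int i = 0; i < shapes.Length; i++)
                sum += shapes[i].Volume; // virtual property call
        return sum;
    }

    public static long SumPlain(PlainShape[] shapes, int passes)
    {
        long sum = 0;
        for (int p = 0; p < passes; p++)
            for (int i = 0; i < shapes.Length; i++)
                sum += shapes[i].Volume; // non-virtual property call
        return sum;
    }

    // Returns elapsed milliseconds: [0] virtual variant, [1] non-virtual variant.
    public static int[] Compare(int count, int passes)
    {
        VirtualShape[] v = new VirtualShape[count];
        PlainShape[] p = new PlainShape[count];
        for (int i = 0; i < count; i++)
        {
            v[i] = new VirtualShape(i);
            p[i] = new PlainShape(i);
        }

        int t0 = Environment.TickCount;
        SumVirtual(v, passes);
        int t1 = Environment.TickCount;
        SumPlain(p, passes);
        int t2 = Environment.TickCount;
        return new int[] { unchecked(t1 - t0), unchecked(t2 - t1) };
    }
}
```

Choose count and passes so that each variant runs for more than a second, in line with the benchmarking tips earlier in the deck.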
13
Common Language Runtime: ‘The Memory Bill’
Shared by all .NET applications running
  .NET Compact Framework CLR DLLs
  .NET assemblies (memory mapped)
Dynamic, per-process memory costs
  Objects allocated
  Thread stacks
  Number of classes and methods
    Runtime representation of metadata
    JIT-compiled code
Unmanaged allocations (not under control of the CLR)
  Operating system
  Native DLLs called by the application via P/Invoke
14
Common Language Runtime: Memory heaps
Five memory heaps to reduce fragmentation
  App-domain: CLR dynamic representation of metadata for the assembly loader
Common Language Runtime: Garbage Collector (GC)
Managed allocations are FAST
  7.5 MB per second (allocating 8-byte objects)
GC manages its own heap
  Allocates 64 KB blocks, 1 MB cache
  Uses VirtualAlloc to enable release of virtual and physical memory back to the system
Compacts the heap when fragmentation occurs
18
Common Language Runtime: Garbage Collector
What triggers a GC?
  Memory allocation failure
  1M of GC objects allocated (v2)
  Application going to background
  GC.Collect() (Avoid “helping” the GC!)
In general, if you don’t allocate objects, GC won’t occur
  Beware of side effects of calls that may allocate objects
What happens at GC time?
  Freezes all threads at a safe point
  Finds all live objects and marks them
    An object is live if it is reachable from a root location
  Unmarked objects are freed or, if they have a finalizer, added to the finalizer queue
    Finalizers are run on a separate thread
  GC pools are compacted if required
  Free memory is returned to the operating system
19
Common Language Runtime: Garbage Collector
[Chart: GC latency per collection. X axis: number of live objects (0 to 500000). Y axis: GC latency in ms (0 to 90).]
20
Common Language Runtime: Garbage Collector
[Chart: Allocation rate. X axis: object size in bytes (400 to 80000). Y axis: allocation rate in iter/sec (0 to 160000).]
21
Common Language Runtime: Where does garbage come from?
Unnecessary string allocations
  Strings are immutable
  String manipulations (Concat(), etc.) cause copies
  Use StringBuilder
  http://weblogs.asp.net/ricom/archive/2003/12/02/40778.aspx
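A short illustration of the string point, with hypothetical names (StringDemo, JoinWithConcat, JoinWithBuilder). Because strings are immutable, each += in the first method allocates a new string and copies everything accumulated so far; the second method appends into one pre-sized StringBuilder buffer.

```csharp
using System;
using System.Text;

public static class StringDemo
{
    // Each += creates a new string object and copies all previous characters,
    // leaving the old intermediate strings behind as garbage.
    public static string JoinWithConcat(string[] parts)
    {
        string result = "";
        for (int i = 0; i < parts.Length; i++)
            result += parts[i];
        return result;
    }

    // Appends into a single buffer sized up front, so no intermediate strings
    // are created and the buffer does not need to grow.
    public static string JoinWithBuilder(string[] parts)
    {
        int total = 0;
        for (int i = 0; i < parts.Length; i++)
            total += parts[i].Length;

        StringBuilder sb = new StringBuilder(total);
        for (int i = 0; i < parts.Length; i++)
            sb.Append(parts[i]);
        return sb.ToString();
    }
}
```

Both methods return the same text; the difference shows up in the number of allocations and in the GC counters.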
Common Language Runtime: Where does garbage come from?
Unnecessary boxing
  Value types are allocated on the stack (fast to allocate)
  Boxing causes a heap allocation and a copy
  Use strongly typed arrays and collections
  (Framework collections are NOT strongly typed)
    class Hashtable {
        public Object this[Object key] { get; set; }
    }
25
Common Language Runtime: Sample Code: Value Types and boxing
    public struct AccountId {
        public int m_number;
        public override int GetHashCode() { return m_number; }
    }
    public struct AccountData {
        private int m_balance;
        public int Balance {
            get { return m_balance; }
            set { m_balance = value; }
        }
    }
Version 1: untyped storage and indexer (boxing)
    public class Accounts {
        public const int num_accounts = 10000;
        Object[] accounts = new Object[num_accounts];
        public Object this[Object id] {
            get { return accounts[id.GetHashCode()]; }
            set { accounts[id.GetHashCode()] = value; }
        }
    }
Version 2: strongly typed storage and indexer
    public class Accounts {
        public const int num_accounts = 10000;
        AccountData[] accounts = new AccountData[num_accounts];
        public AccountData this[AccountId id] {
            get { return accounts[id.GetHashCode()]; }
            set { accounts[id.GetHashCode()] = value; }
        }
    }
26
Common Language Runtime: Sample Code: Value Types and boxing
    Accounts ac = new Accounts(); int i;
    for (i = 0; i < Accounts.num_accounts; i++) {
        …
        for (i = 0; i < Accounts.num_accounts; i++) {
            AccountId id; id.m_number = i;
            AccountData rec = (AccountData)ac[ id ];
            rec.Balance -= 10;
            ac[ id ] = rec;
        }
        iterations += i;
    } while (Environment.TickCount - start < 1000);
27
Common Language Runtime: Sample Code: Value Types and boxing (results)
Untyped Accounts (Object[] storage, Object indexer): 0.15M iter/sec
  Boxed value types: 4138460
  Garbage Collections (GC): 4
  Bytes Collected By GC: 4138460
  GC Latency Time: 132 ms
Strongly typed Accounts (AccountData indexer keyed by AccountId): 2.5M iter/sec
  Boxed value types: 2
  Garbage Collections (GC): 0
  Bytes Collected By GC: 0
  GC Latency Time: 0 ms
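The benchmark loop appears only in fragments in the transcript, so the following is a reconstruction under stated assumptions rather than the original code. The class names UntypedAccounts, TypedAccounts and BoxingBenchmark are hypothetical (the slides call both classes Accounts), and the initialization loop body is assumed. Throughput on a desktop runtime will differ from the device figures above.

```csharp
using System;

public struct AccountId
{
    public int m_number;
    public override int GetHashCode() { return m_number; }
}

public struct AccountData
{
    private int m_balance;
    public int Balance
    {
        get { return m_balance; }
        set { m_balance = value; }
    }
}

// Untyped variant: every indexer access boxes the id, and stores box the record.
public class UntypedAccounts
{
    public const int num_accounts = 10000;
    private object[] accounts = new object[num_accounts];
    public object this[object id]
    {
        get { return accounts[id.GetHashCode()]; }
        set { accounts[id.GetHashCode()] = value; }
    }
}

// Strongly typed variant: structs are passed and stored by value, no boxing.
public class TypedAccounts
{
    public const int num_accounts = 10000;
    private AccountData[] accounts = new AccountData[num_accounts];
    public AccountData this[AccountId id]
    {
        get { return accounts[id.GetHashCode()]; }
        set { accounts[id.GetHashCode()] = value; }
    }
}

public static class BoxingBenchmark
{
    // Returns the number of account updates completed in about millis ms.
    public static long RunUntyped(int millis)
    {
        UntypedAccounts ac = new UntypedAccounts();
        AccountId id = new AccountId();
        int i;
        // Assumed initialization: fill every slot so the unboxing cast succeeds.
        for (i = 0; i < UntypedAccounts.num_accounts; i++)
        {
            id.m_number = i;
            ac[id] = new AccountData();
        }
        long iterations = 0;
        int start = Environment.TickCount;
        do
        {
            for (i = 0; i < UntypedAccounts.num_accounts; i++)
            {
                id.m_number = i;
                AccountData rec = (AccountData)ac[id]; // boxes id, unboxes record
                rec.Balance -= 10;
                ac[id] = rec;                          // boxes id and record
            }
            iterations += i;
        } while (unchecked(Environment.TickCount - start) < millis);
        return iterations;
    }

    public static long RunTyped(int millis)
    {
        TypedAccounts ac = new TypedAccounts();
        AccountId id = new AccountId();
        int i;
        long iterations = 0;
        int start = Environment.TickCount;
        do
        {
            for (i = 0; i < TypedAccounts.num_accounts; i++)
            {
                id.m_number = i;
                AccountData rec = ac[id]; // plain struct copy
                rec.Balance -= 10;
                ac[id] = rec;
            }
            iterations += i;
        } while (unchecked(Environment.TickCount - start) < millis);
        return iterations;
    }
}
```

Calling both methods with 1000 and comparing the returned counts gives iterations per second for each variant, mirroring the "while (Environment.TickCount - start < 1000)" loop on the slide.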
28
Common Language Runtime: Sample Code: Generics
    public class Accounts<U, V>
    {
        public const int num_accounts = 10000;
        private U[] accounts = new U[num_accounts];
        public U this[V id] {
            …
    Accounts<AccountData, AccountId> ac = new Accounts<AccountData, AccountId>();
    int i;
    for (i = 0; i < Accounts<AccountData, AccountId>.num_accounts; i++) {
        …
        }
        iterations += i;
    } while (Environment.TickCount - start < 1000);
29
Common Language Runtime: Sample Code: Generics (results)
    public class Accounts<U, V>
    {
        public const int num_accounts = 10000;
        private U[] accounts = new U[num_accounts];
        public U this[V id] {
            get { return accounts[id.GetHashCode()]; }
            set { accounts[id.GetHashCode()] = value; }
        }
    }
    Accounts<AccountData, AccountId> ac = new Accounts<AccountData, AccountId>();
    int i;
    for (i = 0; i < Accounts<AccountData, AccountId>.num_accounts; i++) {
        …
Boxed value types: 2
Garbage Collections (GC): 0
Bytes Collected By GC: 0
GC Latency Time: 0 ms
Closed Types Loaded: 1
Closed Types per definition: mean=1 max=1
31
Common Language Runtime: Generics
Strong typing without code duplication
Fully specialized implementation in .NET Compact Framework v2
Pros
  No unnecessary boxing and type casts
  Specialized code is more efficient than shared
Cons
  Internal execution engine data structures and JIT-compiled code aren’t shared
Common Language Runtime: Finalization and Dispose
Cost of finalizers
  Non-deterministic cleanup
  Extends lifetime of object
In general, rely on GC for automatic memory cleanup
The exceptions to the rule…
  If your object contains an unmanaged resource that the GC is unaware of, you need to implement a finalizer
    Also implement the Dispose pattern to release the unmanaged resource in a deterministic manner
    The Dispose method should suppress finalization (FxCop rule)
  If the object you are using implements Dispose, call it when you are done with the object
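The guidance above corresponds to the usual Dispose pattern, sketched here for a class that owns one unmanaged allocation. NativeBuffer and IsDisposed are hypothetical names, and Marshal.AllocHGlobal merely stands in for whatever unmanaged resource the object holds.

```csharp
using System;
using System.Runtime.InteropServices;

public class NativeBuffer : IDisposable
{
    private IntPtr m_handle;   // unmanaged memory the GC knows nothing about
    private bool m_disposed;

    public NativeBuffer(int bytes)
    {
        m_handle = Marshal.AllocHGlobal(bytes);
    }

    public bool IsDisposed
    {
        get { return m_disposed; }
    }

    // Deterministic cleanup: callers invoke this when done with the object.
    public void Dispose()
    {
        Dispose(true);
        // The resource is already released, so the finalizer is not needed
        // and the object does not have to survive an extra collection.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (m_disposed) return;
        // Only unmanaged state is touched here; that is safe on both the
        // Dispose path (disposing == true) and the finalizer path (false).
        if (m_handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(m_handle);
            m_handle = IntPtr.Zero;
        }
        m_disposed = true;
    }

    // Safety net for callers that forget to call Dispose.
    ~NativeBuffer()
    {
        Dispose(false);
    }
}
```

On the consumer side, a using block (using (NativeBuffer b = new NativeBuffer(256)) { … }) calls Dispose even when an exception is thrown.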
Common Language Runtime: Exceptions
Exceptions are cheap…until you throw
Throw exceptions in exceptional circumstances
Do not use exceptions for normal flow control
Use performance counters to track the number of exceptions thrown
Replace “On Error/Goto” with “Try/Catch/Finally” in Microsoft Visual Basic .NET
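As an illustration of "do not use exceptions for normal flow control", here is a sketch with hypothetical names (LookupDemo and its two methods). A missing key is an expected case in this scenario, so the second method tests for it instead of paying for a throw and catch.

```csharp
using System;
using System.Collections.Generic;

public static class LookupDemo
{
    // Anti-pattern: an expected "not found" case is signalled by throwing
    // and catching an exception on every miss.
    public static int GetOrDefaultWithException(Dictionary<string, int> map, string key, int fallback)
    {
        try
        {
            return map[key];
        }
        catch (KeyNotFoundException)
        {
            return fallback;
        }
    }

    // Preferred: TryGetValue reports a miss through its return value,
    // so nothing is thrown on the normal path.
    public static int GetOrDefault(Dictionary<string, int> map, string key, int fallback)
    {
        int value;
        return map.TryGetValue(key, out value) ? value : fallback;
    }
}
```

Both methods return the same results; only the cost of a miss differs.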
34
Common Language Runtime: Reflection
Reflection can be expensive
Working set cost
  Type and member enumerations (for example: Assembly.GetTypes(), Type.GetMethods())
  Runtime data structures
    Think ~100 bytes per loaded type, ~80 bytes per loaded method
Be aware of APIs that use reflection as a side effect
  GetHashCode() and Equals() (for value types)
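One way to act on the last point is to give a struct its own Equals() and GetHashCode(), so the inherited System.ValueType versions (the ones the slide flags as using reflection) are never invoked. This is a sketch; Point3 and its hash formula are illustrative choices, not something from the talk.

```csharp
using System;

public struct Point3
{
    public int X, Y, Z;

    public Point3(int x, int y, int z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Strongly typed comparison: no boxing and no reflection.
    public bool Equals(Point3 other)
    {
        return X == other.X && Y == other.Y && Z == other.Z;
    }

    // Replaces the inherited ValueType.Equals implementation.
    public override bool Equals(object obj)
    {
        return obj is Point3 && Equals((Point3)obj);
    }

    // Replaces the inherited ValueType.GetHashCode implementation.
    public override int GetHashCode()
    {
        unchecked
        {
            return (X * 397 ^ Y) * 397 ^ Z;
        }
    }
}
```

Structs used as keys in hash-based collections benefit most, since both methods are called on every lookup.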
35
Common Language Runtime: Building a Cost Model for Managed Math
Math performance
  32-bit integers: similar to native math
  64-bit integers: ~5-10X the cost of native math
  Floating point: similar to native math
    ARM processors do not have an FPU
Avoid unnecessary boxing and type casts: use generic collections
  Full support for all generic collections in the .NET Compact Framework v2!
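The collection advice can be seen in a small comparison, sketched here with hypothetical names (CollectionDemo, SumArrayList, SumGenericList). ArrayList stores object references, so each int is boxed on Add and unboxed with a cast on read; List<int> stores the ints directly. Both lists are created with their final capacity so they do not have to grow while being filled.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CollectionDemo
{
    public static int SumArrayList(int count)
    {
        ArrayList list = new ArrayList(count);   // pre-sized
        for (int i = 0; i < count; i++)
            list.Add(i);                         // boxes each int on the heap

        int sum = 0;
        for (int i = 0; i < list.Count; i++)
            sum += (int)list[i];                 // type cast plus unboxing
        return sum;
    }

    public static int SumGenericList(int count)
    {
        List<int> list = new List<int>(count);   // pre-sized
        for (int i = 0; i < count; i++)
            list.Add(i);                         // stored as an int, no boxing

        int sum = 0;
        for (int i = 0; i < list.Count; i++)
            sum += list[i];                      // no cast needed
        return sum;
    }
}
```

Running both under the .stat counters should show the difference in the "boxed value types" figure.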
38
Windows Forms: Best Practices
Load and cache Forms in the background
Populate data separately from Form.Show()
  Pre-populate data, or
  Load data asynchronously to Form.Show()
Use BeginUpdate/EndUpdate when available
  e.g. ListView, TreeView
Use SuspendLayout/ResumeLayout when repositioning controls
Keep event handling code tight
  Process bigger operations asynchronously
  Blocking in event handlers will affect UI responsiveness
Form load performance
  Reduce the number of method calls during initialization
39
Graphics And Games: Best Practices
Compose to off-screen buffers to minimize direct-to-screen blitting
  Approximately 50% faster
Avoid transparent blitting in areas that require performance
  Approximately 1/3 the speed of normal blitting
Consider using pre-rendered images vs. using System.Drawing rendering primitives
  Need to measure on a case-by-case basis
40
XML: Best Practices for Managing Large XML Data Files
Use XmlTextReader/XmlTextWriter
  Smaller memory footprint than using XmlDocument
  XmlTextReader is a pull-model parser which only reads a “window” of the data
  XmlDocument builds a generic, untyped object model using a tree
    Type stored as string
    OK to use with smaller documents (64K XML: ~0.25s)
Optimize the structure of the XML document
  Use elements to group (allows use of Skip() in XmlReader)
  Use attributes to reduce size: processing attribute-centric documents is faster
  Keep it short! (attribute and element names)
  Avoid gratuitous use of white space
Use XmlReader/XmlWriter factory classes to create an optimized reader or writer
  Applying proper XmlReaderSettings can improve performance
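The reader-based approach, the Skip() tip and the settings tip can be combined in a short sketch. The document layout (orders, item and archive elements) and the names XmlDemo and CountOrderItems are made up for illustration; which settings actually help depends on the data.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlDemo
{
    // Counts <item> elements while skipping every <archive> subtree unread.
    public static int CountOrderItems(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;  // do not surface insignificant whitespace nodes
        settings.IgnoreComments = true;    // do not surface comment nodes

        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "archive")
                {
                    // Skip() moves past the matching end tag and leaves the
                    // reader on the following node, so no extra Read() here.
                    reader.Skip();
                    continue;
                }
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                    count++;
                reader.Read();
            }
        }
        return count;
    }
}
```

Grouping the unwanted data under one element is what makes the single Skip() call possible, which is the point of the "use elements to group" tip.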
[Diagram: data access architecture. Transports: ActiveSync, HTTP, Sockets, Replication or RDA. Other labels: Remote system, XmlDocument, Business logic, DataAdapters, Web services, DataReaders, Binary or text file, XML file, SQL Server Mobile, SQL DB, Other Data Sources, MSMQ, Other DB.]
44
Web Services: Where is the bottleneck?
Are you network bound or CPU bound?
  Use perf counters: socket bytes sent/received
  Do you come close to the network capacity?
If you are network bound, work on reducing the size of the message
  Create a “canned” message, send it over HTTP
  Compare performance with the web service
If you are CPU bound, optimize the serialization scheme for speed
Working set improvements
More speed
46
Summary
Make performance a requirement and measure
Understand the APIs
Avoid unnecessary object allocation and copies due to
  String manipulations
  Boxing
  Collections that are not pre-sized
Understand data access performance bottlenecks
47
Community Resources
At PDC
  ILL03 Intelligent Data Synchronization in a Semi-Connected Environment
  ILL04 Write Once, Display Anywhere: UI for Windows Mobile Devices
  TLN316 Windows Mobile: New Emulation Technology for Building Mobile Applications with Visual Studio 2005
After PDC
  MSDN dev center: http://msdn.microsoft.com/mobility/
  .NET Compact Framework Team Blog: http://blogs.msdn.com/netcfteam/
  .NET Compact Framework Performance FAQ: http://blogs.msdn.com/netcfteam/archive/2005/05/04/414820.aspx