
Writing faster managed code Claudio Caldato Program Manager CLR Performance Team.

Jan 04, 2016

Transcript
Page 1: Writing faster managed code Claudio Caldato Program Manager CLR Performance Team.

Writing faster managed code

Claudio Caldato
Program Manager
CLR Performance Team

Page 2:

Outline
• Performance engineering
• Managed code performance
  – Contrast with native code
  – Garbage collection
  – Features’ cost and pitfalls
• Perf problems diagnosis

Page 3:

Performance engineering
• Set goals
• Measure, measure, and then measure
• Know your platform
• Process:
  – Budget, plan, verify
  – Continuous improvement
    • Measure, track, refine
    • Automated tests
• Build a performance culture
  – User expectations
  – Developer attitudes: perf is my feature!

Page 4:

Moving to managed code
• Why? Productivity and quality
  – Do more with less code
  – Fewer bugs
  – Clean, modern libraries targeting modern requirements
  – Better software, sooner
• But is performance a problem?
  – Easier to write programs fast
  – Easier to write fast programs?

Page 5:

Know Your Garbage Collector
• Why?
• GC basics
  – Pause threads; trace reachable objects from roots; compact live objects; recycle dead memory
• Self tuning
• The CLR GC is a generational mark-and-sweep collector

Page 6:

Garbage Collection in Action
[Diagram: managed heap divided into Generation 1 and Generation 0]
• New objects allocated in generation 0
• GC: accessible references keep objects alive
• GC: preserves / compacts referenced objects
• GC: objects left merged into older generation
• Once again, new objects allocated in generation 0

Page 7:

Know Your Garbage Collector
• GC basics
  – Pause threads; trace reachable objects from roots; compact live objects; recycle dead memory
• Self tuning
• Generational GC heaps
  – Gen0: new objects; cache conscious; fast GC
  – Gen1: objects that survived a GC of gen0
  – Gen2: long-lived objects that survived a GC of gen1 or gen2
  – Large object heap
• Server GC
  – Optimized for throughput and multiprocessor scalability

Page 8:

Garbage Collection Pitfalls
• Object lifetimes still matter!
• Use an efficient “allocation profile”
  – Short-lived objects are cheap (but not free)
  – Don’t have a “midlife crisis” (avoid gen2 churn)
  – Review with perfmon counters, CLRProfiler
• Common pitfalls
  – Keeping refs to “dead” object graphs
    • Null out object references (where appropriate)
  – Implicit boxing
  – Pinning young objects
  – GC.Collect considered harmful
  – Finalization ...
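
The "implicit boxing" pitfall can be illustrated with a small sketch. The class and method names below (BoxingDemo, SumBoxed, SumUnboxed) are hypothetical and not from the talk. The first method stores ints in a non-generic ArrayList, so each Add allocates a small heap object; the second uses List<int>, which stores the values inline. Both return the same sum; the difference is the allocation profile, which is worth confirming with a profiler rather than taking on faith.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical illustration of implicit boxing; names are not from the talk.
public static class BoxingDemo
{
    // ArrayList stores object references, so every int added is boxed:
    // one short-lived heap allocation per element.
    public static long SumBoxed(int count)
    {
        ArrayList list = new ArrayList(count);
        for (int i = 0; i < count; i++)
        {
            list.Add(i);        // implicit boxing: int -> object
        }

        long sum = 0;
        foreach (object item in list)
        {
            sum += (int)item;   // explicit unboxing
        }
        return sum;
    }

    // List<int> stores the ints directly in its backing array: no boxing.
    public static long SumUnboxed(int count)
    {
        List<int> list = new List<int>(count);
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
        }

        long sum = 0;
        foreach (int value in list)
        {
            sum += value;
        }
        return sum;
    }
}
```

Compiling this and running ildasm on the result should show a box instruction in SumBoxed and none in SumUnboxed.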

Page 9:

Garbage Collection Pitfalls (2): Finalization and the Dispose Pattern
• ~C(): non-deterministic cleanup
  – Object unref’d
  – Promote to the next generation
  – Queue finalizer
  – Costs: retains object graph, finalizer thread
• Use the Dispose pattern
  – Implement IDisposable
  – Call GC.SuppressFinalize
  – Hold few obj fields
  – Dispose early; when possible use ‘using’ (C#)
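
The Dispose pattern bullets can be sketched roughly as follows. This is a simplified variant for a sealed class, not the talk's own code. NativeBuffer and DisposeDemo are hypothetical names, and the unmanaged allocation stands in for any resource that needs a finalizer as a safety net.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical type: wraps one unmanaged allocation and holds no other
// object fields, so a pending finalizer does not keep a large graph alive.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr _memory;
    private bool _disposed;

    public NativeBuffer(int byteCount)
    {
        _memory = Marshal.AllocHGlobal(byteCount);
    }

    public bool IsDisposed
    {
        get { return _disposed; }
    }

    // Deterministic cleanup: release now and tell the GC that the
    // finalizer no longer needs to run for this instance.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    // Safety net only: runs on the finalizer thread if Dispose was never called.
    ~NativeBuffer()
    {
        Release();
    }

    private void Release()
    {
        if (_disposed)
        {
            return;
        }
        Marshal.FreeHGlobal(_memory);
        _memory = IntPtr.Zero;
        _disposed = true;
    }
}

public static class DisposeDemo
{
    // 'using' guarantees Dispose runs when the block exits, even on exceptions.
    public static bool Run()
    {
        NativeBuffer buffer = new NativeBuffer(1024);
        using (buffer)
        {
            // ... work with the buffer here ...
        }
        return buffer.IsDisposed;
    }
}
```

Because Dispose calls GC.SuppressFinalize, a disposed instance is not queued for finalization and is not promoted to the next generation for that reason.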

Page 10:

Pitfall: Indiscriminate Code Reuse
• Your choices determine your perf
  – Your architecture, algorithms, ...
  – Your uses of .NET FX types and methods
• No specific advice holds everywhere, so you have to do your homework
  – Measure, inspect the time and space costs of your platform(s), in your setting

Page 11:

Data Cost
• Data locality
  – Remote, disk, RAM, cache
  – GC: objects allocated together in time stay together in space
• Data representation
  – Complex data structures with a lot of pointers add GC cost

Page 12:

Reflection Cost
• Fast and light APIs:
  – typeof, object.GetType, get_Module, get_MemberType, new Token/Handle resolution APIs
• Costly APIs:
  – MemberInfo, MethodInfo, FieldInfo, GetCustomAttribute, InvokeMember, Invoke, get_Name
• Only request what you need
  – Minimize the use of GetMembers, GetConstructors, …
• Consider using the new Token/Handle resolution APIs

Page 13:

Reflection Cost (2)
• Cache members after having retrieved them
  – For instance, cache the member’s handle
• Avoid using Type.InvokeMember
• Avoid doing case-insensitive member lookups
• Use BindingFlags.ExactBinding whenever possible
• Use FxCop
• Insidious
  – .NET FX code that uses reflection
  – Late-bound code in VB.NET, JScript.NET
• Enforce early binding
  – Option Explicit On, Option Strict On
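
One way to apply "cache members after having retrieved them" and "avoid Type.InvokeMember" is sketched below. The class name CachedReflection and the choice of string.ToUpperInvariant as the target are assumptions made for illustration. The sketch caches a MethodInfo and a delegate bound to it rather than a raw member handle, which is a related but different technique from the handle caching the slide mentions.

```csharp
using System;
using System.Reflection;

// Hypothetical illustration; not the talk's own code.
public static class CachedReflection
{
    // Looked up once, with an exact, case-sensitive name and an explicit
    // (empty) parameter list so only one overload can match.
    private static readonly MethodInfo ToUpperMethod =
        typeof(string).GetMethod("ToUpperInvariant", Type.EmptyTypes);

    // Bound once to an open-instance delegate: the first delegate argument
    // becomes 'this'. Later calls are ordinary early-bound delegate calls.
    private static readonly Func<string, string> ToUpper =
        (Func<string, string>)Delegate.CreateDelegate(
            typeof(Func<string, string>), ToUpperMethod);

    // Late-bound path: member lookup and argument packaging on every call.
    public static string Slow(string text)
    {
        return (string)typeof(string).InvokeMember(
            "ToUpperInvariant",
            BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Instance,
            null,
            text,
            new object[0]);
    }

    // Cached path: no reflection work after type initialization.
    public static string Fast(string text)
    {
        return ToUpper(text);
    }
}
```

Both methods return the same result; how large the difference in cost is depends on the runtime version and call frequency, so it is worth timing both in your own setting.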

Page 14:

P/Invoke, COM Interop Cost
• Efficient, but frequent calls add up
• Costs also depend on marshaling
  – Primitive types and arrays of same are cheap
  – Unicode to ANSI string conversions are not
• Diagnosis
  – Perfmon: .NET CLR Interop counters (# of marshalling)
  – Time-based profiling
• Mitigate interop call costs by batching calls or moving the boundary

Page 15:

Deployment considerations
• Assemblies
  – Performance-wise: the fewer, the better!
• Use GAC
  – Avoids repetitive SN signature verification
• Use NGEN
  – Caches pre-JIT’d DLL; code may run slower
  – Generally reduces startup time, improves code shareability
  – Try it and measure for yourself

Page 16:

Xml: It is not always the answer
• System.Xml.dll is 2MB
  – Load it only when you need it
• Don’t use XML classes for trivial tasks
  – MyApp.Config.Xml:

    <MyApp>
      <MainWindow>
        <Top>512</Top>
        <Left>340</Left>
      </MainWindow>
    </MyApp>
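
As a rough illustration of the "trivial tasks" point, two window coordinates could be kept in a plain key=value text file and read without loading System.Xml.dll at all. The file format, the key names (MainWindow.Top, MainWindow.Left), and the SimpleConfig class are all assumptions made for this sketch; the talk does not prescribe a replacement format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical reader for lines such as "MainWindow.Top=512".
public static class SimpleConfig
{
    public static Dictionary<string, int> Parse(TextReader reader)
    {
        Dictionary<string, int> values =
            new Dictionary<string, int>(StringComparer.Ordinal);

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0 || line[0] == '#')
            {
                continue;               // skip blank lines and comments
            }

            int separator = line.IndexOf('=');
            if (separator <= 0)
            {
                continue;               // skip lines without a key
            }

            string key = line.Substring(0, separator).Trim();
            string text = line.Substring(separator + 1).Trim();

            int value;
            if (int.TryParse(text, NumberStyles.Integer,
                             CultureInfo.InvariantCulture, out value))
            {
                values[key] = value;    // later duplicates overwrite earlier ones
            }
        }
        return values;
    }
}
```

Whether this actually pays off depends on whether anything else in the process loads System.Xml.dll during startup; a module-load trace in a debugger will show that.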

Page 17:

Analyzing Performance Problems: Code Inspection
• Ildasm – findstr “box”
• Debuggers – module loads, rebasing
• FxCop – static analyzer

Page 18:

Analyzing Performance Problems: Measure It, With Tools
• High-level diagnostics
  – Taskmgr, perfmon, vadump, Event Tracing for Windows (ETW)
• Space
  – CLR Profiler, code profilers, ETW
• Time
  – Code profilers, timing loops, ETW
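
A timing loop, in its simplest form, might look like the sketch below. TimingLoop and AverageMicroseconds are hypothetical names. The single warm-up call is there so that JIT compilation of the measured code is not counted; the iteration count is left to the caller.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; a minimal sketch, not a full benchmarking harness.
public static class TimingLoop
{
    // Returns the average wall-clock time per call, in microseconds.
    public static double AverageMicroseconds(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException("action");
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        action();   // warm-up call: pays the JIT cost outside the timed region

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds * 1000.0 / iterations;
    }
}
```

A call might look like TimingLoop.AverageMicroseconds(() => DoWork(), 100000), where DoWork is whatever code is under test. Results from a loop like this are noisy (GC pauses, other processes, timer resolution), so repeated runs and a profiler are useful cross-checks.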

Page 19:

Improve startup time
• Cold startup is typically dominated by disk accesses and warm startup by CPU usage.
• Reduce DLLs loaded at startup if possible.
• Ngen your assemblies. Jitting consumes CPU at startup.
• Place strong-named assemblies in the GAC.
• The application would be doing its own computations. Is the bottleneck here?

Page 20:

Resources
• Patterns & Practices: Improving .NET Application Performance and Scalability
  – [http://msdn.microsoft.com/perf]
• .NET Framework developer center
  – Programming Information/performance
• Usergroup:
  – microsoft.public.dotnet.framework.performance
• Blogs
  – RicoM, MaoniS
• Claudio’s Quick list

Page 21:

Q&A

Page 22:

Questions
• In the product cycle, when do you start working on performance?
• What are the top issues you have to deal with?
• Are there good tools to do performance analysis? What is missing?
• What type of resources do you use to find answers to performance issues?
• Comment on the following statement: “In managed code it is easier to find and solve performance issues”