Devnology Lecture: Software Reengineering
Martin Pinzger, Software Engineering Research Group, Delft University of Technology
May 17, 2015
Greenfield software development
Non-greenfield software development
How often did you encounter greenfield and non-greenfield software engineering?
Why non-greenfield engineering?
Because existing software, often called legacy software, is valuable:
Often business-critical
A huge amount of money has already been invested in it
Has been tested and runs
Does (mainly) what it should do
Would you replace such a system?
Why do we (often) start from a mess?
Lehman’s Laws of software evolution
Continuing change: A program that is used in a real-world environment must change, or become progressively less useful in that environment.
Increasing complexity: As a program evolves, it becomes more complex, and extra resources are needed to preserve and simplify its structure.
For more information, read Lehman and Belady, 1985
Evolution of Mozilla source code
Lehman’s Laws in practice
Existing software is often modified in an ad-hoc manner (quick fixes)
Lack of time, resources, money, etc.
The initially good design is not maintained: spaghetti code, copy/paste programming, newly introduced dependencies, no tests, etc.
Documentation (if there is any), such as architecture and design documents, is not updated
Original developers leave, and their knowledge leaves with them
Typical result of such practices
Implications of the results
Software maintenance costs continuously increase
Between 50% and 75% of global software development costs are spent on maintenance!
Up to 60% of the maintenance effort is spent on understanding the existing software
What is your decision?
* duplicated code
* complex conditionals
* abusive inheritance
* large classes/methods
According to Lehman: “there will always be changes”
Hack it?
Take a loan on your software; pay it back via reengineering
* first reengineer
* then implement changes
An investment for the future; paid back during maintenance
Lecture Outline
Introduction
Reengineering Process
Model Capture/Reverse Engineering
Problem Detection
Summary and Conclusions
Let’s reengineer
Definition:
“Reengineering is the examination and alteration of a subject system to reconstitute it in a new form and the subsequent implementation of the new form.”
[Demeyer, Ducasse, Nierstrasz] http://scg.unibe.ch/download/oorp/
Reengineering Life-Cycle
(1) requirement analysis
(2) model capture
(3) problem detection
(4) problem resolution
(Figure levels: New Requirements, Designs, Code)
Goals of reengineering
Testability
Understandability
Modifiability
Extensibility
Maintainability
…
Goals of reengineering (2)
Unbundling: split a monolithic system into parts that can be separately marketed
Performance: “First do it, then do it right, then do it fast”
Design extraction: to improve maintainability, portability, etc.
Exploitation of new technology: e.g., new language features, standards, libraries, etc.
In this course, you will learn
Best practices to analyze and understand software systems (i.e., reverse engineering)
Heuristics and tools to detect shortcomings in the design and implementation of software systems
Model Capture/Reverse Engineering: Setting Direction, First Contact, Initial Understanding, Detailed Model Capture
Setting direction patterns
Set direction: Agree on Maxims
Maintain direction: Appoint a Navigator
Coordinate direction: Speak to the Round Table
Where to start? Most Valuable First
What to do? Fix Problems, Not Symptoms
What not to do? If It Ain't Broke, Don't Fix It
How to do it? Keep it Simple
Principles & guidelines for software project management are especially relevant for reengineering projects
Pattern: Most Valuable First
Problem: Which problems should you address first?
Most valuable first (2)
Solution: Work on aspects that are most valuable to your customer
Maximize commitment
Deliver results early
Build confidence
Most valuable first (3)
Most valuable first (4)
How do you tell what is valuable?
Identify your customer
Understand the customer’s business model
Determine measurable goals
Consult change logs for high activity
Play the Planning Game
Fix Problems, not Symptoms
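The advice to consult change logs for high activity can be made concrete in a few lines of code. The following is a minimal sketch. It assumes the change log has already been parsed into one list of changed file paths per commit, for example from the output of `git log --name-only`; that parsing step is not shown. The class `ChangeActivity` and its method are hypothetical names used for illustration:

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical helper: ranks files by how many commits touched them. */
final class ChangeActivity {

    /**
     * @param commits one inner list per commit, holding the paths that commit changed
     * @return (path, commit count) pairs, most frequently changed first
     */
    static List<Map.Entry<String, Integer>> rankByActivity(List<List<String>> commits) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> changedFiles : commits) {
            // A file listed twice in one commit still counts once for that commit.
            for (String path : new HashSet<>(changedFiles)) {
                counts.merge(path, 1, Integer::sum);
            }
        }
        // Sort by descending count; break ties alphabetically for a stable order.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> log = List.of(
                List.of("Billing.java", "Invoice.java"),
                List.of("Billing.java"),
                List.of("Billing.java", "Report.java"));
        for (Map.Entry<String, Integer> e : rankByActivity(log)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

With the sample log, `Billing.java: 3` is printed first. High activity only tells you where to look. It does not tell you whether a file is valuable to the customer or merely unstable.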
Most valuable first (5)
Planning Game
Model Capture/Reverse Engineering: Setting Direction, First Contact, Initial Understanding, Detailed Model Capture
What is Reverse Engineering and why?
Reverse engineering is the process of analyzing a subject system to identify the system’s components and their interrelationships and create representations of the system in another form or at a higher level of abstraction [Chikofsky & Cross, ’90]
Motivation: understanding other people’s code, design, and architecture in order to maintain and evolve a software system
First contact patterns
Feasibility assessment (one week time)
Talk about it (system experts):
Talk with developers: Chat with the Maintainers
Talk with end users: Interview during Demo
Read about it (software system):
Read it: Read All the Code in One Hour, Skim the Documentation
Compile it: Do a Mock Installation
Verify what you hear
Pattern: Read all the code in one hour
Problem: Yes, but… the system is so big! Where to start?
Read all the code in one hour (2)
Solution: Read the code in one hour
Focus on:
Functional tests and unit tests
Abstract classes and methods and classes high in the hierarchy
Surprisingly large structures
Comments
Check classes with high fan-out
Study the build process
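The advice to check classes with high fan-out can be illustrated with a minimal sketch. It assumes a dependency map, from each class name to the set of classes it references, has already been extracted by some parser or tool; that step is not shown. `FanOut`, its methods, and the threshold value are hypothetical:

```java
import java.util.*;

/** Hypothetical sketch: reports classes whose fan-out exceeds a threshold. */
final class FanOut {

    /** Fan-out = number of distinct other classes that cls references. */
    static int fanOut(Map<String, Set<String>> deps, String cls) {
        Set<String> targets = new HashSet<>(deps.getOrDefault(cls, Set.of()));
        targets.remove(cls); // self-references do not count
        return targets.size();
    }

    /** Classes with fan-out above threshold, highest fan-out first. */
    static List<String> highFanOut(Map<String, Set<String>> deps, int threshold) {
        List<String> result = new ArrayList<>();
        for (String cls : deps.keySet()) {
            if (fanOut(deps, cls) > threshold) {
                result.add(cls);
            }
        }
        // Sort by descending fan-out; break ties by name for a stable order.
        result.sort(Comparator.comparingInt((String c) -> fanOut(deps, c)).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = Map.of(
                "OrderService", Set.of("Customer", "Invoice", "Mailer", "Repository"),
                "Customer", Set.of("Address"),
                "Invoice", Set.of("Customer", "Invoice"));
        System.out.println(highFanOut(deps, 2)); // prints [OrderService]
    }
}
```

Classes that reference many others often coordinate large parts of the system, so they are good entry points when you only have an hour to read.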
In Java programs, focus on:
public abstract class Example { ... }

public interface IExample { ... }

import org.junit.Test;

public class ExampleTest {
    ...
    @Test
    public void testExample() { ... }
}

/**
 * Block comment
 */
public class Example {
    public void foo() {
        int x = 1;
        for (int i = 1; i < 100; i++) {
            // line comment: do something
        }
    }
}
First project plan
Project scope (1/2 page): description, context, goals, verification criteria
Opportunities: identify factors that help achieve the project goals
Skilled maintainers, readable source code, documentation, etc.
Risks: identify risks that may cause problems
Absent test-suites, missing libraries, etc.
Record likelihood & impact for each risk
Go/no-go decision, activities (fish-eye view)
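The plan asks you to record likelihood and impact for each risk. A common convention is to rank risks by exposure = likelihood * impact. This convention is an assumption swapped in here, not something taken from the lecture. In this hypothetical `RiskRegister` sketch, likelihood is a probability in [0, 1] and impact uses an invented 1 to 5 scale:

```java
import java.util.*;

/** Hypothetical sketch: ranks project risks by exposure = likelihood * impact. */
final class RiskRegister {

    /** likelihood in [0, 1]; impact on an assumed 1..5 scale. */
    record Risk(String description, double likelihood, int impact) {
        double exposure() {
            return likelihood * impact;
        }
    }

    /** Returns a copy of the risks, highest exposure first. */
    static List<Risk> rank(List<Risk> risks) {
        List<Risk> sorted = new ArrayList<>(risks);
        sorted.sort(Comparator.comparingDouble(Risk::exposure).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Risk> risks = List.of(
                new Risk("Absent test suite", 0.9, 4),
                new Risk("Missing libraries", 0.3, 5),
                new Risk("Original developers unavailable", 0.6, 3));
        for (Risk r : rank(risks)) {
            System.out.printf("%.1f  %s%n", r.exposure(), r.description());
        }
    }
}
```

With these invented numbers, the absent test suite comes first (3.6). Any numeric scale is only a support for the go/no-go discussion, not a substitute for it.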
Model Capture/Reverse Engineering: Setting Direction, First Contact, Initial Understanding, Detailed Model Capture
Initial understanding patterns
Goal: understand ⇒ higher-level model (top down and bottom up, in iterations)
Recover design: Speculate about Design
Recover database: Analyze the Persistent Data
Identify problems: Study the Exceptional Entities
Study the exceptional entities
Problem: How can you quickly identify design problems?
Solution: Measure software entities and study the anomalous ones
Visualize metrics to get an overview
Use simple metrics: lines of code, number of methods, ...
Use simple metrics and layout algorithms
Visualize up to 5 metrics per node: position (x, y), width, height, and colour
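The pattern recommends measuring entities and studying the anomalous ones. The lecture does this visually. As a non-visual complement, here is a minimal sketch that flags entities whose metric value lies more than k standard deviations from the mean. The z-score rule, the value of k, the class name `ExceptionalEntities`, and the sample numbers are all assumptions made for illustration:

```java
import java.util.*;

/** Hypothetical sketch: flags entities with |value - mean| / stdDev > k. */
final class ExceptionalEntities {

    static List<String> outliers(Map<String, Integer> metric, double k) {
        double mean = metric.values().stream().mapToInt(Integer::intValue).average().orElse(0);
        double variance = metric.values().stream()
                .mapToDouble(v -> (v - mean) * (v - mean))
                .average().orElse(0);
        double stdDev = Math.sqrt(variance); // population standard deviation
        List<String> result = new ArrayList<>();
        if (stdDev == 0) {
            return result; // all values equal: nothing stands out
        }
        for (Map.Entry<String, Integer> e : metric.entrySet()) {
            if (Math.abs(e.getValue() - mean) / stdDev > k) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Lines of code per class (invented numbers).
        Map<String, Integer> loc = Map.of(
                "Address", 120, "Customer", 90, "Invoice", 150, "Mailer", 110,
                "Order", 80, "Report", 130, "Repository", 100, "Parser", 2000);
        System.out.println(outliers(loc, 2.0)); // prints [Parser]
    }
}
```

One caveat applies. With the population standard deviation, a single outlier among n entities can reach a z-score of at most sqrt(n - 1). On very small inputs, k = 2 may therefore flag nothing, which is one more reason to also look at the picture.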
Example: Exceptional entities
Use simple metrics and layout algorithms
Model Capture/Reverse Engineering: Setting Direction, First Contact, Initial Understanding, Detailed Model Capture
Detailed model capture patterns
Goal: expose the design & make sure it stays exposed
Keep track of your understanding: Tie Code and Questions
Expose design: Refactor to Understand, Write Tests to Understand
Expose collaborations: Step through the Execution
Expose contracts: Look for the Contracts (Use Your Tools, Look for Key Methods, Look for Constructor Calls, Look for Template/Hook Methods, Look for Super Calls)
Expose evolution: Learn from the Past
Refactor to understand
Problem: How do you decipher cryptic code?
Solution: Refactor it until it makes sense
The goal (for now) is to understand, not to reengineer
Hints: work with a copy of the code
Refactoring requires an adequate test base; if this is missing, “Write Tests to Understand”
Refactor to understand (cont.)
Guidelines:
Rename attributes to convey roles
Rename methods and classes to reveal intent
Remove duplicated code
Replace condition branches by methods
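The guidelines can be illustrated with a small before/after example. The cryptic method is invented, and so is the domain meaning assigned to its parameters (a loyalty discount). In a real system you would confirm such a reading with tests (“Write Tests to Understand”) before relying on it:

```java
/** Invented before/after example of "Refactor to Understand". */
final class RefactorToUnderstand {

    // Before: cryptic names; the purpose of the condition is hidden.
    static double calc(double a, int t, boolean f) {
        if (t > 24 && !f) {
            return a * 0.9;
        }
        return a;
    }

    // After: parameters renamed to convey their (assumed) roles, and the
    // condition branch extracted into methods whose names reveal intent.
    static double priceAfterDiscount(double price, int monthsAsCustomer, boolean accountFlagged) {
        if (qualifiesForLoyaltyDiscount(monthsAsCustomer, accountFlagged)) {
            return applyLoyaltyDiscount(price);
        }
        return price;
    }

    static boolean qualifiesForLoyaltyDiscount(int monthsAsCustomer, boolean accountFlagged) {
        return monthsAsCustomer > 24 && !accountFlagged;
    }

    static double applyLoyaltyDiscount(double price) {
        return price * 0.9; // 10% off
    }

    public static void main(String[] args) {
        System.out.println(calc(100.0, 30, false) == priceAfterDiscount(100.0, 30, false)); // prints true
    }
}
```

Both versions compute the same result. A test base should confirm exactly this after each renaming or extraction step.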
Learn from the past
Problem: How did the system get the way it is? Which parts are stable and which aren’t?
Solution: Compare versions to discover where code was removed
Removed functionality is a sign of design evolution
Use or develop appropriate tools
Look for signs of:
Unstable design: repeated growth and refactoring
Mature design: growth, refactoring, and stability
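Comparing versions to find removed code boils down to a set difference between the entities of an older version and a newer one. The following is a minimal sketch. It assumes entity names (here, method signatures) have already been extracted from each version by some tool. `VersionComparison` and the sample names are hypothetical:

```java
import java.util.*;

/** Hypothetical sketch: entities present in an older version but missing from a newer one. */
final class VersionComparison {

    static SortedSet<String> removed(Set<String> olderVersion, Set<String> newerVersion) {
        SortedSet<String> gone = new TreeSet<>(olderVersion);
        gone.removeAll(newerVersion); // set difference: older minus newer
        return gone;
    }

    public static void main(String[] args) {
        Set<String> v1 = Set.of("Order.total()", "Order.print()", "Order.toXml()", "Report.render()");
        Set<String> v2 = Set.of("Order.total()", "Report.render()", "OrderPrinter.print()");
        System.out.println(removed(v1, v2)); // prints [Order.print(), Order.toXml()]
    }
}
```

A plain set difference reports a moved or renamed method as a removal. In the sample, `print()` moves to `OrderPrinter` and still shows up as removed. The result is therefore a list of places to inspect, not a verdict.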
Examples: Unstable design
Pulsar: Repeated Modifications make it grow and shrink. System Hotspot: Every System Version requires changes.
Reverse engineering tools/prototypes
X-Ray: http://xray.inf.usi.ch/xray.php
DA4Java: http://swerl.tudelft.nl/bin/view/MartinPinzger/MartinPinzgerDA4Java
CodeCity: http://www.inf.usi.ch/phd/wettel/codecity.html
X-Ray
DA4Java
CodeCity
Summary Model Capture
Setting direction patterns to:
Set the goals
Find the Go/No-Go decision
Increase commitment of clients and developers
First contact patterns to:
Obtain an overview and grasp the main issues
Assess the feasibility of the project
Initial Understanding & Detailed Model Capture:
Plan the work … and work the plan
Frequent and short iterations
Problem Detection: In the Source Code, In the Evolution
Design problems
The most common design problems result from code that is:
Unclear & complicated
Duplicated (code clones)
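In its simplest form, duplicated code can be found by comparing runs of normalized source lines. The sketch below is a simplified line-based variant chosen for illustration. Real clone detectors also handle renamed identifiers and merge overlapping matches. `CloneFinder`, the normalization rules, and the window size are assumptions:

```java
import java.util.*;

/** Hypothetical sketch of simple line-based clone detection over one file's lines. */
final class CloneFinder {

    /** Trims each line, collapses runs of whitespace, and drops blank lines. */
    static List<String> normalize(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String n = line.trim().replaceAll("\\s+", " ");
            if (!n.isEmpty()) {
                result.add(n);
            }
        }
        return result;
    }

    /**
     * Maps every run of `window` (>= 1) consecutive normalized lines that occurs more
     * than once to the start indices (in the normalized list) of its occurrences.
     */
    static Map<String, List<Integer>> duplicates(List<String> lines, int window) {
        List<String> norm = normalize(lines);
        Map<String, List<Integer>> occurrences = new LinkedHashMap<>();
        for (int i = 0; i + window <= norm.size(); i++) {
            String key = String.join("\n", norm.subList(i, i + window));
            occurrences.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        occurrences.values().removeIf(starts -> starts.size() < 2); // keep only repeats
        return occurrences;
    }

    public static void main(String[] args) {
        List<String> code = List.of(
                "total = 0;", "for (Item i : items) total += i.price();",
                "log(total);",
                "total  =  0;", "for (Item i : items)   total += i.price();");
        System.out.println(duplicates(code, 2).values()); // prints [[0, 3]]
    }
}
```

A long clone produces several overlapping windows. A usable tool would merge them into one reported region.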
Code Smells (if it stinks, change it)
Duplicated Code, Long Method, Large Class, Long Parameter List, Divergent Change, Shotgun Surgery, Feature Envy, ...
A code smell is a hint that something has gone wrong somewhere in your code.
How to detect?
Measure and visualize quality aspects of the current implementation of a system
Source code metrics and structures
Measure and visualize quality aspects of the evolution of a system
Evolution metrics and structures
Use Polymetric Views
Polymetric Views
A combination of metrics and software visualization
Visualize software using colored rectangles for the entities and edges for the relationships
Render up to five metrics on one node:
Size: width and height (metrics 1 and 2)
Color: color tone (metric 3)
Position: X and Y coordinates (metrics 4 and 5)
Entities are drawn as nodes; relationships as edges between them
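The mapping from metrics to a node can be sketched as a small data transformation. The choice below is one hypothetical assignment of the five slots: width = NOA, height = NOM, grey level = LOC, and x and y taken from a naive left-to-right layout. The record and method names are illustrative, and rendering is not shown:

```java
import java.util.*;

/** Hypothetical sketch of polymetric-view nodes: up to five metrics per rectangle. */
final class PolymetricView {

    record ClassMetrics(String name, int nom, int noa, int loc) {}

    /** Geometry of one node; grey is a grey level from 0 (black) to 255 (white). */
    record Node(String name, int x, int y, int width, int height, int grey) {}

    /** Assumed mapping: width = NOA, height = NOM, darker grey = more LOC. */
    static Node toNode(ClassMetrics m, int x, int y, int maxLoc) {
        int grey = maxLoc == 0 ? 255 : 255 - (int) Math.round(255.0 * m.loc() / maxLoc);
        // Keep every node at least 1 unit wide and high so empty classes stay visible.
        return new Node(m.name(), x, y, Math.max(1, m.noa()), Math.max(1, m.nom()), grey);
    }

    /** Naive layout: nodes placed left to right on one row, separated by a gap. */
    static List<Node> layout(List<ClassMetrics> classes, int gap) {
        int maxLoc = classes.stream().mapToInt(ClassMetrics::loc).max().orElse(0);
        List<Node> nodes = new ArrayList<>();
        int x = 0;
        for (ClassMetrics m : classes) {
            Node n = toNode(m, x, 0, maxLoc);
            nodes.add(n);
            x += n.width() + gap;
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<ClassMetrics> classes = List.of(
                new ClassMetrics("Order", 12, 4, 300),
                new ClassMetrics("Customer", 5, 6, 90));
        layout(classes, 2).forEach(System.out::println);
    }
}
```

In a full view, x and y could carry two further metrics instead of a layout position, which gives the five metrics per node listed above.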
Smell 1: Long Method
The longer a method is, the more difficult it is to understand it.
When is a method too long? Heuristic: > 10 LOC (?)
How to detect? Visualize the LOC metric values of methods
“Method Length Distribution View”
Method Length Distribution
Metrics: boxes = methods; width = LOC; position-Y = LOC; sort = LOC
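The data behind the Method Length Distribution view can be sketched together with the tentative > 10 LOC heuristic. The sketch assumes LOC per method has already been measured. `LongMethodDetector` and `MethodInfo` are hypothetical names, and the ascending sort direction is an assumption:

```java
import java.util.*;

/** Hypothetical sketch: data for a method length distribution plus a Long Method check. */
final class LongMethodDetector {

    record MethodInfo(String signature, int loc) {}

    /** Methods sorted by LOC in ascending order (the view's "sort = LOC"). */
    static List<MethodInfo> distribution(List<MethodInfo> methods) {
        List<MethodInfo> sorted = new ArrayList<>(methods);
        sorted.sort(Comparator.comparingInt(MethodInfo::loc));
        return sorted;
    }

    /** Signatures of methods longer than maxLoc, shortest first. */
    static List<String> longMethods(List<MethodInfo> methods, int maxLoc) {
        List<String> result = new ArrayList<>();
        for (MethodInfo m : distribution(methods)) {
            if (m.loc() > maxLoc) {
                result.add(m.signature());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<MethodInfo> methods = List.of(
                new MethodInfo("Parser.parse()", 180),
                new MethodInfo("Parser.peek()", 4),
                new MethodInfo("Lexer.next()", 35));
        // 10 is the slide's tentative "> 10 LOC" heuristic.
        System.out.println(longMethods(methods, 10)); // prints [Lexer.next(), Parser.parse()]
    }
}
```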
Smell 2: Switch Statement
The problem is similar to code duplication: the same switch statement is scattered across different places
How to detect? Visualize the McCabe Cyclomatic Complexity (MCC) metric to detect complex methods
“Method Complexity Distribution View”
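McCabe's cyclomatic complexity of a single method is commonly approximated as the number of decision points plus one. The sketch below applies that approximation to raw Java source text with a regular expression. It is a rough, hypothetical helper (`CyclomaticComplexity`), not a parser. It does not skip comments or string literals, and it would miscount a wildcard `?` in generics such as `List<?>`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rough, hypothetical approximation of McCabe's cyclomatic complexity for one method body. */
final class CyclomaticComplexity {

    // Decision points: branching keywords, case labels, catch clauses,
    // the short-circuit operators && and ||, and the ternary operator ?.
    private static final Pattern DECISION =
            Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

    static int approximate(String methodBody) {
        Matcher m = DECISION.matcher(methodBody);
        int decisions = 0;
        while (m.find()) {
            decisions++;
        }
        return decisions + 1; // one path through the method plus one per decision
    }

    public static void main(String[] args) {
        String body = "switch (kind) { case CIRCLE: return area(r); "
                + "case SQUARE: return s * s; default: throw new IllegalStateException(); }";
        System.out.println(approximate(body)); // prints 3
    }
}
```

Each `case` label adds one, so a switch that is copied into several methods raises the complexity of every copy. Those methods then stand out in the Method Complexity view.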
Method Complexity
Metrics: boxes = methods; position-X = LOC; position-Y = MCC; sort = none
More info on Detection Strategies
Object-Oriented Metrics in Practice. Michele Lanza and Radu Marinescu, Springer, 2006. http://www.springer.com/computer/swe/book/978-3-540-24429-5
Tools for Smell Detection
inCode: http://www.intooitus.com/inCode.html
jDeodorant: http://java.uom.gr/~jdeodorant/
Problem Detection: In the Source Code, In the Evolution
Understanding Evolution
Changes can point to design problems (“evolutionary smells”)
But: overwhelming complexity
How can we detect and understand changes?
Solutions: the Evolution Matrix
Kiviat Graphs
Visualizing Class Evolution
Visualize classes as rectangles, using the following metrics for width and height:
NOM (number of methods)
NOA (number of attributes)
Classes can be categorized according to their “personal evolution” and their “system evolution”
-> Evolution Patterns
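(Figure: classes such as Foo and Bar plotted over TIME (versions), from the first version through growth and a major leap to stabilisation; added classes are marked.)
The Evolution Matrix
(Figure: matrix up to the last version; removed classes are marked.)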
Evolution Patterns & Smells
Day-fly (Dead Code)
Persistent
Pulsar (Change Prone Entity)
Supernova
White Dwarf (Dead Code)
Red Giant (Large/God Class)
Idle (Dead Code)
Persistent / Day-fly
Persistent: Has the same lifespan as the whole system. Part of the original design. Perhaps holy dead code that no one dares to remove.
Day-flies: Exist during only one or two versions. Perhaps an idea that was tried out and then dropped.
Pulsar / Supernova
Pulsar: Repeated Modifications make it grow and shrink. System Hotspot: Every System Version requires changes.
Supernova: Sudden increase in size. Possible reasons:
• Massive shift of functionality towards a class.
• Data holder class for which it is easy to grow.
• Sleeper: developers knew exactly what to fill in.
White Dwarf / Red Giant / Idle
White Dwarf: Lost the functionality it had and now trundles along without real meaning. Possibly dead code -> Lazy Class.
Red Giant: A permanent god (large) class which is always very large.
Idle: Keeps size over several versions. Possibly dead code, possibly good code.
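The patterns above can be approximated from a class's size history. This hypothetical classifier assumes the class's NOM is known for each consecutive system version in which the class exists. Every threshold is an invented value for illustration: what counts as “very large”, as a “sudden” jump, or as “lost” functionality. A class may receive several labels at once:

```java
import java.util.*;

/**
 * Hypothetical classifier for evolution patterns. A class is described by its size
 * (here: NOM) in each consecutive system version in which it exists.
 * All thresholds are illustrative assumptions, not values taken from the lecture.
 */
final class EvolutionPatterns {

    static final int GIANT_SIZE = 50;     // assumed NOM of a "very large" class
    static final int SUPERNOVA_JUMP = 10; // assumed minimum growth in one version

    /**
     * @param sizes          non-empty size history of one class
     * @param systemVersions number of versions of the whole system
     */
    static Set<String> classify(List<Integer> sizes, int systemVersions) {
        Set<String> labels = new TreeSet<>();
        int lifespan = sizes.size();
        int max = Collections.max(sizes);
        int last = sizes.get(lifespan - 1);

        if (lifespan <= 2 && systemVersions > 2) labels.add("Day-fly");
        if (lifespan == systemVersions) labels.add("Persistent");
        if (lifespan >= 3 && new HashSet<>(sizes).size() == 1) labels.add("Idle");
        if (lifespan >= 3 && sizes.stream().allMatch(s -> s >= GIANT_SIZE)) labels.add("Red Giant");
        if (lifespan >= 2 && max > 0 && last * 4 <= max) labels.add("White Dwarf"); // lost >= 75%

        int previousDirection = 0;
        int directionChanges = 0;
        for (int i = 1; i < lifespan; i++) {
            int before = sizes.get(i - 1);
            int after = sizes.get(i);
            // Supernova: grows by at least SUPERNOVA_JUMP and at least doubles in one step.
            if (after - before >= SUPERNOVA_JUMP && after >= 2 * before) labels.add("Supernova");
            int direction = Integer.signum(after - before);
            if (direction != 0) {
                if (previousDirection != 0 && direction != previousDirection) directionChanges++;
                previousDirection = direction;
            }
        }
        if (directionChanges >= 2) labels.add("Pulsar"); // grows and shrinks repeatedly
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of(5, 12, 4, 11, 6), 5)); // [Persistent, Pulsar]
        System.out.println(classify(List.of(3, 40), 8));           // [Day-fly, Supernova]
        System.out.println(classify(List.of(40, 35, 8), 3));       // [Persistent, White Dwarf]
    }
}
```

Whether a Day-fly, White Dwarf, or Idle class is actually dead code still has to be checked in the source. The labels only point at candidates.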
Summary Problem Detection
Design problems: result from duplicated, unclear, complicated source code -> code smells
Detection heuristics and Polymetric Views to detect code and evolution smells
Conclusions
Object-Oriented Reengineering Patterns: a set of best practices for reengineering software systems
Model Capture and Reverse Engineering: understand the design and implementation of software systems
Problem Detection: heuristics to detect bad smells in the source code and evolution of software systems
Next step: add tests and refactor the detected problems
Reading material
Object-Oriented Reengineering Patterns. Serge Demeyer, Stephane Ducasse, and Oscar Nierstrasz. Free copy from: http://scg.unibe.ch/download/oorp/
Working Effectively with Legacy Code. Michael Feathers. Prentice Hall, 1st edition, 2004.
Refactoring to Patterns. Joshua Kerievsky. Addison-Wesley Professional, 2004.
Additional reading
Agile Software Development: Principles, Patterns, and Practices. Robert C. Martin. Prentice Hall.
Object-Oriented Design Heuristics. Arthur J. Riel. Prentice Hall, 1st edition, 1996.
Refactoring: Improving the Design of Existing Code. Martin Fowler. Addison-Wesley Professional, 1999.