C++ Code Analysis: an Open Architecture for the Verification of Coding Rules
Paolo Tonella
ITC-irst, Centro per la Ricerca Scientifica e Tecnologica
ITC/CERN collaboration
The collaboration aims at improving the quality of the code developed at CERN, by means of:
Automatic check of coding rules. Recovery of the design from the code. Refactoring of the design.
All objectives share a common C++ code analysis functionality.
Outline of the talk
C++ analysis model
Tool architecture
Preprocessing
Language issues
Implementation of coding rules
State of development
C++ analysis model
The model of the C++ language enjoys the following properties:
Generality. Extensibility. Abstraction.
Tool Architecture
Packages syntax and entities collaborate to generate a network of objects according to the C++ model. Package rules contains the coding conventions to be checked.
Tool architecture
Adopting this architecture provides remarkable flexibility.
All rules relying on properties of entities in the C++ model can be directly encoded.
The C++ model can be extended if additional properties need to be collected.
Adding a new application package is simple.
Preprocessing
C++ macros are expanded in the code by the preprocessor.
Macros do not necessarily comply with the C++ syntax.
#define BEGIN {
#define END }
void f() BEGIN
  int x = 0;
  ...
END
Strip filter
The C++ preprocessor prepends all directly and indirectly included files. The strip filter removes those that are not user defined.
Moreover, the C++ preprocessor inserts some flags that are useful for the subsequent compilation step. Examples are: __extension__, __const__.
The output of the strip filter is a legal C++ module that can be analyzed by the parser.
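The behaviour described above can be sketched as follows. This is only an illustration under stated assumptions, not the filter used in the tool: it assumes GCC-style linemarkers of the form # <line> "<file>" <flags> in the preprocessor output, it treats any file under /usr/ as not user defined (a hypothetical policy), and it deletes only the __extension__ flag. Other flags, such as __const__, would need their own handling. The names isUserFile, eraseAll and stripFilter are invented for this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical policy: files under /usr/ are system files, all others are user defined.
bool isUserFile(const std::string& path) {
    return path.rfind("/usr/", 0) != 0;
}

// Delete every occurrence of token from line.
std::string eraseAll(std::string line, const std::string& token) {
    for (std::string::size_type pos = line.find(token);
         pos != std::string::npos; pos = line.find(token, pos)) {
        line.erase(pos, token.size());
    }
    return line;
}

// Keep only the text that originates from user-defined files and
// drop the __extension__ flag from the lines that are kept.
std::string stripFilter(const std::string& preprocessed) {
    std::istringstream in(preprocessed);
    std::ostringstream out;
    std::string line;
    bool keep = true;  // text before the first linemarker is kept
    while (std::getline(in, line)) {
        const bool isLinemarker =
            line.size() > 2 && line[0] == '#' && line[1] == ' ' &&
            std::isdigit(static_cast<unsigned char>(line[2]));
        if (isLinemarker) {
            // The file name sits between the first and the last double quote.
            const std::string::size_type open = line.find('"');
            const std::string::size_type close = line.rfind('"');
            if (open != std::string::npos && close > open) {
                keep = isUserFile(line.substr(open + 1, close - open - 1));
            }
            continue;  // the marker itself is never emitted
        }
        if (keep) {
            out << eraseAll(line, "__extension__") << '\n';
        }
    }
    return out.str();
}
```

Calling stripFilter on the preprocessor output of a module returns the text that would then be handed to the parser.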
C++ language
C++ was conceived as an object-oriented evolution of C. A strong requirement in its design was total backward compatibility with C.
C++ also had a controversial evolution in its more advanced features, such as exception handling and generic classes.
Is it a function declaration or a global object creation?
A x();
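Standard C++ resolves this case as a function declaration. The short sketch below illustrates that reading; the struct A and its member value are invented for the example.

```cpp
#include <cassert>

struct A {
    int value = 7;
};

// Parsed as the declaration of a function x that takes no arguments and
// returns an A. It does not create a global object named x.
A x();

// Because x is a function, it can be defined later and then called.
A x() {
    return A{};
}

// An unambiguous global object creation omits the empty parentheses.
A y;
```

Here x().value and y.value are both 7, but only y is an object; x has to be called to obtain one.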
Language issues
To deal with the complexity of C++, it is important to distinguish between the compilation perspective and the analysis perspective.
The analyzer can assume that the input program is compilable with no errors.
The compiler needs to capture the statement-level semantics.
The performance expected from a compiler is substantially higher.
All these considerations led to the choice of a javacc-based C++ grammar.
Compatibility with C
Structures and unions are reinterpreted as classes.
Although methods are available from classes, functions are still usable.
Functions may operate on class objects, and classes may invoke functions.
Global variables violate encapsulation, but are allowed.
Types other than classes can be defined with typedef.
Language issues (cont.)
The language model contains C as a subset.
Type equivalence affects the association between declaration and definition.
Additional difficulties:
Body of methods within class definition.
Constructors, destructors, conversion functions and operators.
Encapsulation violation via friend construct.
Generic classes (template).
Exception throwing and catching.
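The type equivalence issue can be illustrated with a small example. The names Length and scale are invented for the sketch. The declaration and the definition spell the parameter type differently, yet they denote the same function, so an analyzer that compares type names textually would fail to associate them.

```cpp
#include <cassert>

typedef int Length;  // Length is only another name for int

// The declaration uses the typedef name.
int scale(Length value);

// The definition uses the underlying type; it still defines the function
// declared above, because Length and int are the same type.
int scale(int value) {
    return 2 * value;
}
```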
Coding rules
Adding a new coding rule involves the following steps:
A new class is defined which extends the general class Rule.
Its constructor passes the rule name and description to the superclass constructor.
A method check must be defined to implement the interface of the superclass.
The body of the method check can use the access functions of the analysis package.
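The steps above can be sketched as follows. The sketch is written in C++ only to match the other examples; the tool itself relies on a javacc grammar and is therefore presumably written in Java. Everything except the names Rule and check is an assumption: Module is a minimal stand-in for the analysis package, check returns its messages instead of printing them, and MaxNameLengthRule with the identifier DEMO1 is a made-up rule that does not belong to the ALICE conventions.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the analysis package: a module only exposes class names.
struct Module {
    std::vector<std::string> classNames;
};

// General rule class: stores name and description, declares the check interface.
class Rule {
public:
    Rule(std::string name, std::string description)
        : name_(std::move(name)), description_(std::move(description)) {}
    virtual ~Rule() = default;

    // Returns one message per violation found in the module.
    virtual std::vector<std::string> check(const Module& module) const = 0;

    const std::string& name() const { return name_; }
    const std::string& description() const { return description_; }

private:
    std::string name_;
    std::string description_;
};

// Made-up rule used only for illustration: class names have at most 30 characters.
class MaxNameLengthRule : public Rule {
public:
    // Step 2: pass the rule name and description to the superclass constructor.
    MaxNameLengthRule()
        : Rule("DEMO1", "Class names have at most 30 characters.") {}

    // Steps 3 and 4: implement check using the access functions of the model.
    std::vector<std::string> check(const Module& module) const override {
        std::vector<std::string> violations;
        for (const std::string& className : module.classNames) {
            if (className.size() > 30) {
                violations.push_back(name() + ": " + className);
            }
        }
        return violations;
    }
};
```

A driver would hold a collection of Rule objects and call check on each of them for every analyzed module.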
Coding rule example
The following coding rule is taken from the Naming Rules enforced within the CERN experiment ALICE:
RN3 No special characters in names are allowed (_, #, &, @, -, %).
check() {
  classes = Module.getClasses();
  foreach (c in classes) {
    if (c.getName().hasChar(_, #, &, @, -, %))
      printViolationMessage(...);
    methods = c.getMethods();
    foreach (m in methods) {
      if (m.getName().hasChar(_, #, &, @, -, %))
        printViolationMessage(...);
      locals = m.getLocals();
      foreach (l in locals) ...
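The character test at the core of RN3 can be sketched in C++ as shown below. The helper names hasSpecialChar and violatingNames are assumptions made for this example; they stand in for the hasChar call of the pseudocode and are not part of the tool.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Characters forbidden by rule RN3, as listed in the rule text.
const std::string kSpecialChars = "_#&@-%";

// Hypothetical counterpart of hasChar: true if the name contains a forbidden character.
bool hasSpecialChar(const std::string& name) {
    return name.find_first_of(kSpecialChars) != std::string::npos;
}

// Collect the names that violate RN3, preserving their input order.
std::vector<std::string> violatingNames(const std::vector<std::string>& names) {
    std::vector<std::string> violations;
    for (const std::string& name : names) {
        if (hasSpecialChar(name)) {
            violations.push_back(name);
        }
    }
    return violations;
}
```

The same test applies unchanged to class, method and local variable names, which is why the pseudocode repeats it at each nesting level.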
Adding new coding rules
The only constraint is that the rule admits a formal description from which a checking procedure can be written.
It may be necessary to augment the set of entities extracted by the CPPParser.
When entities are available, rule introduction is simple.
There is a clear and sharp separation between the responsibilities of packages rules and analysis.
Current limitations
Known limitations are related to the difficulties of covering the whole range of C++.
Genericity is not handled.
Exception throwing and catching is not detected.
Type equivalence is implemented only in a simplified form.
These limitations did not substantially restrict the analysis of ALICE code, which does not exploit genericity or exceptions.
State of development
Conventions         Tot.  Impl.  To be impl.  Non impl.  Excl.
Naming rules          21     17            0          3      1
Coding rules          14      8            2          2      2
Style rules            5      1            4          0      0
Naming guidelines      2      0            0          2      0
Coding guidelines      4      0            1          3      0
Style guidelines       0      0            0          0      0
See: http://AliSoft.cern.ch/offline/codingconv.html
State of development (cont.)
Coverage of the coding conventions for which an automatic check is feasible:
Conventions      Abs.  Perc.
Total              46
Implementable      33  72% (of total)
Implemented        26  79% (of implementable)
Analyzed code
The RuleChecker tool was successfully executed with no errors on all the code in the current release of the ALICE experiment software.
Lines of code 85730
Subsystems 18
Classes 215
Modules (.cxx) 136
A violation report was generated for each module under analysis.
Conclusion
To make analysis independent from the applications using its outcomes:
a C++ language model was defined,
a simple query protocol was used to access code entities.
Executed on ALICE code, the tool RuleChecker:
collected information about 85730 lines of code,
reported no parse errors,
produced a violation report for each input module.