The use of the code analysis library OpenC++: modifications, improvements, error corrections
Author: Andrey Karpov
Date: 12.01.2008
Abstract
This article may interest developers who use or plan to use the OpenC++ library (OpenCxx). The author describes his experience of improving the OpenC++ library and adapting it to solve specialized tasks.
Introduction
One often hears on forums that there are plenty of C++ syntax analyzers ("parsers"), many of them free, or that one can take YACC, for example, and easily implement one's own analyzer. Don't believe it: it is not that easy [1, 2]. This becomes especially clear once you remember that parsing the syntax is not even half of the task. You also need structures for storing the program tree and semantic tables holding information about the various objects and their scopes. This is especially important when developing specialized applications for processing and statically analyzing C++ code, because such applications need to keep the whole program tree, and few libraries provide that. One of them is the open-source OpenC++ library (OpenCxx) [3], which is the subject of this article.
We would like to help developers master the OpenC++ library and share our experience of modernizing it and correcting some of its defects. The article is a collection of tips, each devoted to fixing a particular defect or implementing an improvement.
The article is based on recollections of the changes made in the VivaCore library [4], which is built on OpenC++. Of course, only a small part of those changes is discussed here; it would be hard to remember and describe them all. A description of how C language support was added to OpenC++, for example, would take up too much space. But you can always refer to the source code of the VivaCore library, which contains a lot of interesting information.
It should also be said that the OpenC++ library is, unfortunately, out of date and needs serious improvement to support the modern C++ language standard. So if you are going to implement, for example, a modern compiler, you would do better to look at GCC or at commercial libraries [5, 6]. Still, OpenC++ remains a good and convenient tool for many developers of systems for specialized processing and modification of program code. Many interesting solutions have been built with OpenC++, for example the OpenTS execution environment [7] for the T++ programming language (developed by the Program Systems Institute of RAS), the Viva64 static code analyzer [8], and the Synopsis tool for generating documentation from source code [9].
The purpose of this article is to show by example how the OpenC++ library code can be modified and improved. The article describes 15 library modifications that fix errors or add new functionality. Each of them not only makes the OpenC++ library better but also gives an opportunity to study its working principles more deeply. Let's get acquainted with them.
1. Skipping development environment keywords that do not affect program processing
While developing a code analyzer for a specific development environment, you are likely to come across its specific language constructions. These constructions are often directives for a particular compiler and may be of no interest to you. However, OpenC++ cannot process them, because they are not part of the C++ language. In this case one of the simplest ways to ignore them is to add them to the rw_table table with the Ignore token. For example:
static rw_table table[] = {
    ...
    { "__ptr32", Ignore },
    { "__ptr64", Ignore },
    { "__unaligned", Ignore },
    ...
};
When adding entries, keep in mind that the words in the rw_table table must be arranged in alphabetical order. Be careful.
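The ordering requirement suggests that the lexer looks keywords up with a binary search, which silently fails for an entry placed out of order. The self-contained sketch below is not OpenC++ code: `KeywordEntry`, `LookupKeyword`, `KeywordTableIsSorted` and the token values are hypothetical. It shows such a lookup together with a sortedness check that can catch a misplaced entry after the table has been edited.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <iterator>

// Hypothetical token codes; the real ones live in OpenC++'s token-name.h.
enum TokenCode { Identifier = 258, Ignore = 300, Unsigned = 301, W64 = 346 };

struct KeywordEntry {
    const char* name;
    TokenCode token;
};

// Must be kept in strcmp order, just like rw_table.
static const KeywordEntry kKeywords[] = {
    { "__ptr32", Ignore },
    { "__ptr64", Ignore },
    { "__unaligned", Ignore },
    { "__w64", W64 },
    { "unsigned", Unsigned },
};

static bool EntryLess(const KeywordEntry& a, const KeywordEntry& b) {
    return std::strcmp(a.name, b.name) < 0;
}

// True if the table is in the order the binary search relies on.
bool KeywordTableIsSorted() {
    return std::is_sorted(std::begin(kKeywords), std::end(kKeywords), EntryLess);
}

// Binary search; a word that is not in the table is an ordinary identifier.
TokenCode LookupKeyword(const char* word) {
    KeywordEntry key = { word, Identifier };
    const KeywordEntry* it = std::lower_bound(std::begin(kKeywords),
                                              std::end(kKeywords), key, EntryLess);
    if (it != std::end(kKeywords) && std::strcmp(it->name, word) == 0)
        return it->token;
    return Identifier;
}
```

Running a check like KeywordTableIsSorted() once at start-up (for example inside an assert) is a cheap safeguard. Note that in strcmp order the underscore sorts after uppercase letters but before lowercase ones, so "__w64" precedes "unsigned".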
2. Adding a new lexeme
If you want to add a keyword that should be processed, you need to create a new lexeme ("token"). Let's look at an example of adding a new keyword, "__w64". First, create an identifier for the new lexeme (see the token-name.h file), for example like this:
enum {
    Identifier = 258,
    Constant = 262,
    ...
    W64 = 346, // New token name
    ...
};
Then update the table named "table" in the lex.cc file:
static rw_table table[] = {
    ...
    { "__w64", W64 },
    ...
};
The next step is to create a class for the new lexeme, which we'll call LeafW64.
namespace Opencxx
{
class LeafW64 : public LeafReserved {
public:
    LeafW64(Token& t) : LeafReserved(t) {}
    LeafW64(char* str, ptrdiff_t len) :
        LeafReserved(str, len) {}
    ptrdiff_t What() { return W64; }
};
}
To create an object of this class, we need to modify the optIntegralTypeOrClassSpec() function:
...
case UNSIGNED :
    flag = 'U';
    kw = new (GC) LeafUNSIGNED(tk);
    break;
case W64 : // NEW!
    flag = 'W';
    kw = new (GC) LeafW64(tk);
    break;
...
Note that since we have decided to treat "__w64" as a data type, we need the 'W' character to encode this type. You can learn more about the type encoding mechanism in the Encoding.cc file.
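To see how the lexeme class, its What() code and the encoding flag fit together without building OpenC++, here is a heavily simplified, self-contained model. The classes only imitate the shape of LeafReserved and LeafW64 (they store a std::string instead of a Token and use std::unique_ptr instead of `new (GC)`), `MakeIntegralKeyword` is a hypothetical stand-in for the relevant branch of optIntegralTypeOrClassSpec(), and the UNSIGNED value is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Illustrative token codes (the real ones are defined in token-name.h).
enum { Identifier = 258, UNSIGNED = 300, W64 = 346 };

// Simplified stand-in for a reserved-word leaf of the program tree.
class LeafReserved {
public:
    explicit LeafReserved(std::string text) : text_(std::move(text)) {}
    virtual ~LeafReserved() = default;
    // Each concrete leaf reports its own token code, as What() does in OpenC++.
    virtual std::ptrdiff_t What() const = 0;
    const std::string& Text() const { return text_; }
private:
    std::string text_;
};

class LeafUNSIGNED : public LeafReserved {
public:
    using LeafReserved::LeafReserved;
    std::ptrdiff_t What() const override { return UNSIGNED; }
};

class LeafW64 : public LeafReserved {
public:
    using LeafReserved::LeafReserved;
    std::ptrdiff_t What() const override { return W64; }
};

// A recognized keyword: its tree node plus the one-character type-encoding flag.
struct KeywordNode {
    char flag;
    std::unique_ptr<LeafReserved> leaf;
};

// Hypothetical counterpart of the switch in optIntegralTypeOrClassSpec().
KeywordNode MakeIntegralKeyword(int token, const std::string& text) {
    switch (token) {
    case UNSIGNED:
        return { 'U', std::make_unique<LeafUNSIGNED>(text) };
    case W64:  // the newly added branch
        return { 'W', std::make_unique<LeafW64>(text) };
    default:   // not an integral-type keyword
        return { '\0', nullptr };
    }
}
```

The point of the design is that the parser never needs to know the concrete leaf class later: code that walks the tree asks What() and the encoder only sees the flag character.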
When introducing a new type, remember that functions such as Parser::isTypeSpecifier() also need to be updated.
The last important point is modifying the Encoding::MakePtree function:
6. Correcting the string literal processing function
We suggest modifying the Lex::ReadStrConst() function as shown below. This fixes two errors related to the processing of separated string literals. The first error occurs while processing strings of
9. Updating the rTemplateDecl2 function
Without going into details, we suggest replacing the rTemplateDecl2() function with the variant given below. This eliminates some errors when working with template classes.
bool Parser::rTemplateDecl2(Ptree*& decl,
                            TemplateDeclKind &kind)
{
    Token tk;
    Ptree *args = 0;
    if(lex->GetToken(tk) != TEMPLATE)
        return false;
    if(lex->LookAhead(0) != '<') {
        if (lex->LookAhead(0) == CLASS) {
            // template instantiation
            decl = 0;
            kind = tdk_instantiation;
            return true; // ignore TEMPLATE
        }
        decl = new (GC)
            PtreeTemplateDecl(new (GC) LeafReserved(tk));
    } else {
        decl = new (GC)
            PtreeTemplateDecl(new (GC) LeafReserved(tk));
        if(lex->GetToken(tk) != '<')
            return false;
        decl = PtreeUtil::Snoc(decl, new (GC) Leaf(tk));
        if(!rTempArgList(args))
            return false;
        if(lex->GetToken(tk) != '>')
            return false;
    }
    decl =
        PtreeUtil::Nconc(decl,
            PtreeUtil::List(args, new (GC) Leaf(tk)));
    // ignore nested TEMPLATE
    while (lex->LookAhead(0) == TEMPLATE) {
        lex->GetToken(tk);
        if(lex->LookAhead(0) != '<')
            break;
        lex->GetToken(tk);
        if(!rTempArgList(args))
            return false;
        if(lex->GetToken(tk) != '>')
            return false;
    }
    if (args == 0)
        // template < > declaration
        kind = tdk_specialization;
    else
        // template < ... > declaration
        kind = tdk_decl;
    return true;
}
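The decision logic of this rTemplateDecl2() variant can be studied separately from the Ptree construction. The sketch below is a simplification that works on a vector of token strings instead of OpenC++'s lexer; `TemplateKind`, `ClassifyTemplateDecl` and `ReadArgs` are hypothetical names, and argument lists are assumed to contain no nested angle brackets (rTempArgList handles the general case). It follows the same branches: `template class ...` is an instantiation, an empty argument list means a specialization, and anything else is an ordinary template declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class TemplateKind { Error, Instantiation, Specialization, Declaration };

namespace {

// Token at index i, or an empty string past the end.
std::string At(const std::vector<std::string>& t, std::size_t i) {
    return i < t.size() ? t[i] : std::string();
}

// pos points just past '<'; collects tokens up to '>' and consumes it.
// Simplification: no nested '<' '>' inside the argument list.
bool ReadArgs(const std::vector<std::string>& t, std::size_t& pos,
              std::vector<std::string>& args) {
    args.clear();
    while (pos < t.size() && t[pos] != ">")
        args.push_back(t[pos++]);
    if (pos == t.size())
        return false;  // missing '>'
    ++pos;
    return true;
}

}  // namespace

TemplateKind ClassifyTemplateDecl(const std::vector<std::string>& t) {
    std::size_t pos = 0;
    if (At(t, pos++) != "template")
        return TemplateKind::Error;
    std::vector<std::string> args;
    if (At(t, pos) != "<") {
        if (At(t, pos) == "class")
            return TemplateKind::Instantiation;  // template class X<int>;
        // Otherwise args stays empty, exactly as in the code above.
    } else {
        ++pos;  // consume '<'
        if (!ReadArgs(t, pos, args))
            return TemplateKind::Error;
    }
    // Skip nested "template <...>" headers; as above, the last list read wins.
    while (At(t, pos) == "template") {
        ++pos;
        if (At(t, pos) != "<")
            break;
        ++pos;
        if (!ReadArgs(t, pos, args))
            return TemplateKind::Error;
    }
    return args.empty() ? TemplateKind::Specialization
                        : TemplateKind::Declaration;
}
```

Following the listing faithfully, a declaration that starts with `template` but neither `<` nor `class` also ends with an empty argument list and is therefore classified as a specialization.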
10. Determining the position of a Ptree in the program text
In some cases it is necessary to know where in the program text the code from which a particular Ptree object was built is located.
The function given below returns the addresses of the beginning and the end of the memory area holding the program text from which the given Ptree object was created.
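As an independent illustration of the idea, and not the author's function, the sketch below uses a hypothetical minimal tree (`Node`, `CollectSpan` and `SourceSpan` are invented names, not OpenC++ API) in which every leaf records where its text starts in the source buffer and how long it is. The extent of a subtree is then the earliest start and the latest end among its leaves.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal syntax tree: leaves point into the source buffer,
// inner nodes only hold children.
struct Node {
    const char* pos = nullptr;  // start of the leaf text (leaves only)
    std::size_t length = 0;     // length of the leaf text (leaves only)
    std::vector<std::unique_ptr<Node>> children;
    bool IsLeaf() const { return pos != nullptr; }
};

// Widens [begin, end) so that it covers every leaf below `node`.
void CollectSpan(const Node& node, const char*& begin, const char*& end) {
    if (node.IsLeaf()) {
        if (begin == nullptr || node.pos < begin)
            begin = node.pos;
        if (end == nullptr || node.pos + node.length > end)
            end = node.pos + node.length;
        return;
    }
    for (const auto& child : node.children)
        CollectSpan(*child, begin, end);
}

// Returns false for a subtree without leaves (there is no text to point to).
bool SourceSpan(const Node& node, const char*& begin, const char*& end) {
    begin = end = nullptr;
    CollectSpan(node, begin, end);
    return begin != nullptr;
}
```

Taking the minimum and maximum rather than simply the first and last leaf keeps the result correct even if a transformation has reordered children.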
11. Supporting definitions of the "const A (a)" form
The OpenC++ library does not support variable definitions of the form "const A (a)". To correct this defect, a part of the code inside the Parser::rOtherDeclaration function should be changed:
if(!rDeclarators(decl, type_encode, false))
    return false;
It should be replaced with the following code:
if(!rDeclarators(decl, type_encode, false)) {
    // Support: const A (a);
    Lex::TokenIndex after_rDeclarators = lex->Save();
    lex->Restore(before_rDeclarators);
    if (lex->CanLookAhead(3) && lex->CanLookAhead(-2) ) {
This code uses some auxiliary functions that are not discussed in this article, but you can find them in the VivaCore library.
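The fragment above relies on saving and restoring the lexer position: when the usual declarator parse fails, the parser rewinds and tries another interpretation. The sketch below shows that backtracking pattern on a hypothetical token buffer. `TokenStream`, its `Save`/`Restore`/`Get` methods and the `Parse...` functions are illustrative names loosely modelled on the `lex->Save()` and `lex->Restore()` calls, not the OpenC++ or VivaCore API, and the "plain" parser is deliberately crude.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token buffer with Save/Restore support.
class TokenStream {
public:
    using Index = std::size_t;
    explicit TokenStream(std::vector<std::string> tokens)
        : tokens_(std::move(tokens)) {}
    Index Save() const { return pos_; }
    void Restore(Index saved) { pos_ = saved; }
    // Returns the current token and advances; empty string at the end.
    std::string Get() {
        return pos_ < tokens_.size() ? tokens_[pos_++] : std::string();
    }
private:
    std::vector<std::string> tokens_;
    Index pos_ = 0;
};

// Crude stand-in for the normal path: accepts only "name ;".
bool ParsePlainDeclarator(TokenStream& lex, std::string& name) {
    name = lex.Get();
    return !name.empty() && name != "(" && lex.Get() == ";";
}

// Fallback for "( name ) ;", the declarator part of "const A (a);".
bool ParseParenthesizedDeclarator(TokenStream& lex, std::string& name) {
    if (lex.Get() != "(")
        return false;
    name = lex.Get();
    return !name.empty() && lex.Get() == ")" && lex.Get() == ";";
}

// Tries the usual form first; on failure rewinds and tries the other one.
bool ParseDeclarator(TokenStream& lex, std::string& name) {
    TokenStream::Index before = lex.Save();
    if (ParsePlainDeclarator(lex, name))
        return true;
    lex.Restore(before);  // undo whatever the failed attempt consumed
    return ParseParenthesizedDeclarator(lex, name);
}
```

For reference, in standard C++ `const A (a);` simply declares a variable `a` of type `const A`; the parentheses around a declarator are redundant but legal, which is why a parser has to accept them.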
12. Supporting class member functions of the "T (min)() { }" form
Sometimes programmers have to resort to workarounds to get the desired result. For example, the widely known "max" macro often causes trouble when a class defines a method such as "T max() {return m;}". In this case one resorts to a trick and defines the method as "T (max)() {return m;}". Unfortunately, OpenC++ does not understand such definitions inside classes. To correct this defect, the Parser::isConstructorDecl() function should be changed in the following way:
bool Parser::isConstructorDecl()
{
    if(lex->LookAhead(0) != '(')
        return false;
    else{
        // Support: T (min)() { }
        if (lex->LookAhead(1) == Identifier &&
            lex->LookAhead(2) == ')' &&
            lex->LookAhead(3) == '(')
            return false;
        ptrdiff_t t = lex->LookAhead(1);
        if(t == '*' || t == '&' || t == '(')
            return false; // declarator
        else if(t == CONST || t == VOLATILE)
            return true; // constructor or declarator
        else if(isPtrToMember(1))
            return false; // declarator (::*)
        else
            return true; // maybe constructor
    }
}
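Why the `T (max)()` form works at all is easy to check with a standard compiler: a function-like macro is expanded only when its name is immediately followed by `(`, so wrapping the name in parentheses keeps the preprocessor away from it. The snippet below defines its own `max` macro as a stand-in for the one some platform headers provide (an assumption for illustration; any function-like macro behaves the same), and `Range` and `Width` are hypothetical names.

```cpp
#include <cassert>

// Stand-in for a function-like max macro such as the one some system
// headers define.
#define max(a, b) (((a) > (b)) ? (a) : (b))

class Range {
public:
    Range(int lo, int hi) : lo_(lo), hi_(hi) {}
    // "int max() const" would be broken by the macro; the parentheses
    // around the name prevent the expansion.
    int (max)() const { return hi_; }
    int (min)() const { return lo_; }
private:
    int lo_;
    int hi_;
};

// Calls need the same protection: (r.max)() rather than r.max().
int Width(const Range& r) { return (r.max)() - (r.min)(); }

// Keep the macro local to this example.
#undef max
```

This is exactly the token sequence `( Identifier ) (` that the added check in isConstructorDecl() recognizes, so the parser no longer mistakes the member function for a constructor.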
13. Processing "using" and "namespace" constructions inside functions
The OpenC++ library does not know that "using" and "namespace" constructions may be used inside functions. But this is easy to correct by modifying the Parser::rStatement() function:
bool Parser::rStatement(Ptree*& st)
{
    ...
    case USING :
        return rUsing(st);
    case NAMESPACE :
        if (lex->LookAhead(2) == '=')
            return rNamespaceAlias(st);
        return rExprStatement(st);
    ...
}
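For reference, both constructs are legal inside a function body in standard C++, and the snippet below is the kind of input the modified rStatement() is meant to accept. It also illustrates the LookAhead(2) test, assuming, as the switch suggests, that LookAhead(0) is still the `namespace` keyword: in `namespace gd = geometry::detail;` offset 1 is the alias name and offset 2 is `=`. The namespace and function names are made up for the example.

```cpp
#include <cassert>
#include <string>

namespace geometry {
namespace detail {
inline int Square(int x) { return x * x; }
}  // namespace detail
}  // namespace geometry

int SumOfSquares(int a, int b) {
    namespace gd = geometry::detail;  // namespace alias inside a function
    using gd::Square;                 // using-declaration inside a function
    return Square(a) + Square(b);
}

std::string Greeting() {
    using namespace std;              // using-directive inside a function
    string s = "hello";
    return s;
}
```

Namespace definitions themselves cannot appear inside a function, so the alias form is the only `namespace` construct the parser has to expect there.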
14. Making "this" a pointer
As is well known, "this" is a pointer, but in OpenC++ it is not. That's why we should correct the Walker::TypeofThis() function to fix the type identification error.
Replace the code
void Walker::TypeofThis(Ptree*, TypeInfo& t)
{
    t.Set(env->LookupThis());
}
with
void Walker::TypeofThis(Ptree*, TypeInfo& t)
{
    t.Set(env->LookupThis());
    t.Reference();
}
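The language rule being restored can be confirmed with a standard compiler: inside a non-static member function, `this` is a prvalue of pointer type, `X*` in an ordinary member function and `const X*` in a const one. The compile-time checks below (on a made-up `Widget` class) only test that rule, not OpenC++ itself.

```cpp
#include <cassert>
#include <type_traits>

struct Widget {
    int value = 0;

    void Touch() {
        // In a non-const member function, this has type Widget*.
        static_assert(std::is_same<decltype(this), Widget*>::value,
                      "this is a pointer to the class");
        this->value += 1;  // members are reached with ->, as through any pointer
    }

    int Get() const {
        // In a const member function, it has type const Widget*.
        static_assert(std::is_same<decltype(this), const Widget*>::value,
                      "this is a pointer to const");
        return this->value;
    }
};
```

An analyzer that typed `this` as the class itself would, for instance, reject `this->value` or misjudge pointer arithmetic involving `this`.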
15. Optimizing the LineNumber() function
We have already mentioned the Program::LineNumber() function when discussing the fact that it returns file names in different formats, and we offered the FixFileName() function to correct this. But LineNumber() has one more drawback: it is slow. That's why we offer an optimized variant of the LineNumber() function.
/*
  LineNumber() returns the line number of the line
  pointed to by PTR.
*/
size_t Program::LineNumber(const char* ptr,
                           const char*& filename,
                           ptrdiff_t& filename_length,
                           const char *&beginLinePtr) const
{
    beginLinePtr = NULL;
    ptrdiff_t n;
    size_t len;
    size_t name;
    ptrdiff_t nline = 0;
    size_t pos = ptr - buf;
    size_t startPos = pos;
    if(pos > size){
        // error?
        assert(false);
        filename = defaultname.c_str();
        filename_length = defaultname.length();
        beginLinePtr = buf;
        return 0;
    }
    ptrdiff_t line_number = -1;
    filename_length = 0;
    while(pos > 0){
        if (pos == oldLineNumberPos) {
            line_number = oldLineNumber + nline;
            assert(!oldFileName.empty());
            filename = oldFileName.c_str();
            filename_length = oldFileName.length();
            assert(oldBeginLinePtr != NULL);
            if (beginLinePtr == NULL)
                beginLinePtr = oldBeginLinePtr;
            oldBeginLinePtr = beginLinePtr;
            oldLineNumber = line_number;
            oldLineNumberPos = startPos;
            return line_number;
        }
        switch(buf[--pos]) {
        case '\n' :
            if (beginLinePtr == NULL)
                beginLinePtr = &(buf[pos]) + 1;
            ++nline;
            break;
        case '#' :
            len = 0;
            n = ReadLineDirective(pos, -1, name, len);
            if(n >= 0){ // unless #pragma
                if(line_number < 0) {
                    line_number = n + nline;
                }
                if(len > 0 && filename_length == 0){
                    filename = (char*)Read(name);
                    filename_length = len;
                }
            }
            if(line_number >= 0 && filename_length > 0) {
                oldLineNumberPos = pos;
                oldBeginLinePtr = beginLinePtr;
                oldLineNumber = line_number;
                oldFileName = std::string(filename,
                                          filename_length);
                return line_number;
            }
            break;
        }
    }
    if(filename_length == 0){
        filename = defaultname.c_str();
        filename_length = defaultname.length();
        oldFileName = std::string(filename,
                                  filename_length);
    }
    if (line_number < 0) {
        line_number = nline + 1;
        if (beginLinePtr == NULL)
            beginLinePtr = buf;
        oldBeginLinePtr = beginLinePtr;
        oldLineNumber = line_number;
        oldLineNumberPos = startPos;
    }
    return line_number;
}
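The speed-up in this variant comes from remembering the previous answer: the backward scan stops as soon as it reaches the cached position and adds the newlines it counted on the way. The class below is a simplified, self-contained sketch of that caching idea only; it ignores #line directives and file names, and `LineIndex`, `LineNumber` and `Scanned` are hypothetical names rather than OpenC++ members.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Maps character offsets in a buffer to 1-based line numbers, reusing the
// result of the previous query when the new position lies after it.
class LineIndex {
public:
    explicit LineIndex(std::string text) : text_(std::move(text)) {}

    std::size_t LineNumber(std::size_t pos) {
        if (pos > text_.size())
            pos = text_.size();  // clamp instead of asserting
        std::size_t newlines = 0;
        std::size_t i = pos;
        // Walk backwards until the buffer start or the cached position.
        while (i > 0 && !(hasCache_ && i == cachedPos_)) {
            if (text_[--i] == '\n')
                ++newlines;
            ++scanned_;
        }
        const bool hit = hasCache_ && i == cachedPos_;
        const std::size_t line = (hit ? cachedLine_ : 1) + newlines;
        hasCache_ = true;
        cachedPos_ = pos;
        cachedLine_ = line;
        return line;
    }

    // Total number of characters examined so far (shows the saving).
    std::size_t Scanned() const { return scanned_; }

private:
    std::string text_;
    bool hasCache_ = false;
    std::size_t cachedPos_ = 0;
    std::size_t cachedLine_ = 1;
    std::size_t scanned_ = 0;
};
```

When positions are queried in increasing order, as is typical when walking a program tree, each call scans roughly the distance from the previous position instead of the distance from the start of the buffer; a query before the cached position simply falls back to a full scan.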
16. Correcting the error in "#line" directive analysis
In some cases the Program::ReadLineDirective() function misbehaves, taking irrelevant text for a "#line" directive. The corrected variant of the function looks as follows:
ptrdiff_t Program::ReadLineDirective(size_t i,
                                     ptrdiff_t line_number,
                                     size_t& filename, size_t& filename_length) const
{
    char c;
    do{
        c = Ref(++i);
    } while(is_blank(c));
#if defined(_MSC_VER) || defined(IRIX_CC)
    if(i + 5 <= GetSize() &&
       strncmp(Read(i), "line ", 5) == 0) {
        i += 4;
        do{
            c = Ref(++i);
        }while(is_blank(c));
    } else {
        return -1;
    }
#endif
    if(is_digit(c)){ /* # <line> <file> */
        unsigned num = c - '0';
        for(;;){
            c = Ref(++i);
            if(is_digit(c))
                num = num * 10 + c - '0';
            else
                break;
        }
        /* line_number'll be incremented soon */
        line_number = num - 1;
        if(is_blank(c)){
            do{
                c = Ref(++i);
            }while(is_blank(c));
            if(c == '"'){
                size_t fname_start = i;
                do{
                    c = Ref(++i);
                } while(c != '"');
                if(i > fname_start + 2){
                    filename = fname_start;
                    filename_length = i - fname_start + 1;
                }
            }
        }
    }
    return line_number;
}
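To experiment with the directive formats outside the library, the standalone function below accepts both `#line 42 "file.c"` and the GCC-style line marker `# 42 "file.c"`. It is a simplified sketch under several assumptions: it parses a std::string holding one directive instead of Program's buffer, always treats the `line` keyword as optional instead of choosing by compiler, returns the line number as written (not decremented by one as above), and its name and `LineDirective` result type are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Result of parsing one line-marker directive.
struct LineDirective {
    long line = -1;    // -1 when the text is not a line directive
    std::string file;  // empty when no file name is present
};

// Parses "#line N \"file\"" or "# N \"file\"", starting at the '#'.
LineDirective ParseLineDirective(const std::string& s) {
    LineDirective result;
    auto blank = [&s](std::size_t k) {
        return k < s.size() && (s[k] == ' ' || s[k] == '\t');
    };
    std::size_t i = 0;
    if (s.empty() || s[0] != '#')
        return result;
    ++i;
    while (blank(i)) ++i;
    if (s.compare(i, 5, "line ") == 0) {  // optional "line" keyword
        i += 4;
        while (blank(i)) ++i;
    }
    if (i >= s.size() || !std::isdigit(static_cast<unsigned char>(s[i])))
        return result;                    // e.g. #pragma or #include
    long num = 0;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
        num = num * 10 + (s[i++] - '0');
    result.line = num;
    while (blank(i)) ++i;
    if (i < s.size() && s[i] == '"') {
        std::size_t close = s.find('"', i + 1);
        if (close != std::string::npos)   // an unterminated name is ignored
            result.file = s.substr(i + 1, close - i - 1);
    }
    return result;
}
```

Here the search for the closing quote stops at the end of the string, and anything after the file name (such as GCC's trailing flag numbers) is ignored.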
Conclusion
Of course, this article covers only a small part of the possible improvements. But we hope they will be useful for developers working with the OpenC++ library and will serve as examples of how the library can be adapted to one's own tasks.
We would like to remind you once more that the improvements shown in this article, along with many other corrections, can be found in the VivaCore library code. For many tasks VivaCore may be more convenient than OpenC++.
If you have questions or would like to add or comment on something, our Viva64.com [10] team is always glad to hear from you. We are ready to discuss your questions, give recommendations and help you use the OpenC++ or VivaCore library. Write to us!
References
1. Zuev E.A. The rare occupation. PC Magazine/Russian Edition. N 5(75), 1997. http://www.viva64.com/go.php?url=43.
2. Margaret A. Ellis, Bjarne Stroustrup. The Annotated C++ Reference Manual. Addison Wesley,