BTYACC is a backtracking, predicated parser designed to facilitate
the parsing of ambiguous languages such as C++. It is available
from Siber Systems
and works on most *nix platforms (including CYGWIN) and WIN32.
keystone currently uses BTYACC version 3.0 which can
be built using the following steps.
make
Treecc is a syntax tree generator designed to allow safe
and easy manipulation of syntax trees. It is available
from
Souther Storm Software, Pty Ltd and works on most any platform.
keystone currently uses Treecc version 0.1.4 which
can be built using the following steps.
./configure make all make check make install
The final product of the keystone build should be a shared library which contains the necessary data and functions to create an ASG and symbol table for a program written in the ISO C++ standard. Additionally, a test program is also created which uses the routines to generate output about an input file. Both the test program and the library will build during the make.
make home
The first step in making keystone is to set up
keystone's makeinclude file. This is accomplished
by typing make home
from within the build directory.
make
keystone can then be made by typing make
at the command line. Optionally, an argument can be
given to make in the form of CONFIG=arg. This argument
will be passed to all subsequent g++ calls. Three common
arguments to config are -DYYDEBUG, which will turn
on BTYACC's internal debug statemtns; -D_PARSE_DEBUG,
which will print upon entry into each of the functions
corresponding to a grammar rule; and -D_THG_DEBUG,
which will print upon entry to a majority of the functions
that make up the keystone system.
make test
This command runs the perlscripts/gather_stats.pl perl
script on the testsuite directory. A report file
is generated with the results. A list of currently
failing tests will appear later in this section.
make check
This command will check each file in the directory against
those listed in the MANIFEST file. If the file is not
listed in the MANIFEST file, it will print it out.
These files are extraneous and can be deleted. It will
also issue an error if a file is listed in the MANIFEST
but does not appear in the distribution. This is a problem
in the distribution and should be reported as a bug (see
section on reporting bugs below).
make distclean
This command will eliminate all extraneous files as
reported by make check.
By default keystone reads from STDIN. Simply type
run
on the command line for a prompt. After typing
in the code snippet you wish to run, type CTRL-D to have
keystone process it. If the parse was successful,
you should get normal output; otherwise, you will get
an error message.
If run
is passed an argument, that argument is assumed
to be the name of a file to run. The file is NOT preprocessed,
nor changed in any way before being analyzed by the keystone
system. The output will be the same as if read from STDIN.
If the parse was successful, then keystone will print three output sections. The first section will consist of all the tokens that underwent name lookup and the results of that lookup. The second section is a listing of all of the scopes in the system and their contents. The third section contains statistical information and the last section is a text representation of the ASG.
3.1. Name Lookup Output
A typical statement that occurs from Name Lookup appears below.
UNL: did not find decl: use of T (2,6) ElabClassThe first three letters detail what kind of lookup was performed. The options are
X::Y
. It forces
Y
to be looked up in X
's scope.y
to be looked
up in the scope of x
. Currently not used.3.2. Symbol Table Output
The symbol table output describes each one of the symbol tables
that were created during the program parse. Each symbol table
describes its contained in scope as well as the members that
are declared inside it. An example symbol table output is
shown below.
NAMESPACE: _GlobalNamespace (0,0): namespace _GlobalNamespace (no refs) i (1,2) instance of: int (no refs) C (2,2): class C (no refs) END OF NAMESPACE SCOPE: _GlobalNamespace CLASS: C (2,2): class C (no refs) in _GlobalNamespace j (2,7) instance of: int Public (no refs) END OF CLASS SCOPE: C_GlobalNamespace is the namespace that represents file scope. The (0,0) represents the starting line and token number, respectively. After the second :, the NameDeclaration object is printed followed by any obvious references to that declaration. On the next line, and indented is each NameDeclaration object that was declared inside the scope. In this case, i was an integer variable(instance) that was never used (no refs) and C was a class type. A similar scope printout for C's class scope appears next.
3.3 Statistics
The statistics printed consists of the following information.
3.4. The ASG output
The ASG is printed using the ASGPlainTextVisitor class
found in ASG/visitors/ASGPlainTextVisitor.{h,cpp}. It
is described in detail in the section on the ASG implementation.
An error can take one of three separate forms. It can be an internal assertion error; an invariant validation error, or a parser error.
4.1. Internal assertion errors
Internal assertion errors are caused by invalid preconditions
or postconditions from various functions throughout the system.
For a complete list, please see the section on Assertions.
The error message will look like:
Failed assertion: cannot add NULL reference. in file: NameDeclaration.cpp on line: 140If you receive this error message, please submit a bug report.
4.2. Invariant validation errors
Invariant validation errors occur at the end of the parse, but
before the ASG is printed. These errors indicate a breakage
of a class invariant. These errors are implemented as C assertions
and as such have system dependent output. Normally, they will
cause a core dump. If one of these errors occurs, please submit
a bug report.
4.3. Parser errors
Parser errors are the least informational and simply say Error parsing!
.
You should check your code against a separate compiler( such as GCC) to
ensure that it is standard conformant. If it appears to be so, submit
a bug report.
4.4. Submitting a bug report
To submit a bug report, please send an email message to
Tanton Gibbs or
Brian Malloy consisting
of your environment and the smallest possible program which can
reproduce the undesired behavior.
There are a number of outstanding issues.
There are four auxillary perl scripts which come bundled with keystone. While they do not directly affect keystone they simplify some of the common tasks done when testing keystone. Three of the scripts will be discussed here, the fourth will be discussed in conjunction with the ASG.
This script runs keystone on a directory of files and ensures
that the output of each file is error free. It requires GCC and bash or tcsh.
It summarizes the results of its run in a file called report.out.
Its usage is: perl gather_stats.pl directory
The report.out file looks like
CORED: FAILED: ../testsuite/regression/ddjstuff/clause14/14.5.2-2.cpp ../testsuite/regression/ddjstuff/clause14/14.8.2.4-14b.cpp ../testsuite/regression/ddjstuff/clause14/14.8.2.4-18.cpp Total Succeeded: 428 MIN Decorated: 0 MAX Decorated: 25 TOT Decorated: 1626 AVG Decorated: 3.79906542056075 MIN NonDecorated: 1 MAX NonDecorated: 35 TOT NonDecorated: 3683 AVG NonDecorated: 8.60514018691589 MIN Non Ident: 4 MAX Non Ident: 131 TOT Non Ident: 16451 AVG Non Ident: 38.4369158878505 MIN Specifiers set: 0 MAX Specifiers set: 25 TOT Specifiers set: 1765 AVG Specifiers set: 4.12383177570093 MIN Qualifiers set: 0 MAX Qualifiers set: 9 TOT Qualifiers set: 235 AVG Qualifiers set: 0.549065420560748The CORED section contains all programs that caused a core dump.
This script will analyze a C++ file and give a report on the number
of non-empty lines, POD classes, and classes with functions that
exist in the file. It prints the report to STDOUT.
Usage: gather_info.pl filename
This file is used by the gather_stats.pl file to eliminate any system headers as well as any lines that contain cout or <<. The later is actually not necessary, but is in for historical reasons. It prints the preprocessed file to STDOUT. Usage: preproc.pl filename
The keystone system is comprised of two major subsystems:
the Program Processor and the Symbol Table.
The Program Processor subsystem includes scanning and parsing and is responsible for initiating and directing symbol table construction and name lookup. This responsibility includes two phases:
enum
class
x.a
namespace
x.~X()
)A::b
)A::b
will produce two name occurence objects,
one for A
and one for b
. The specifier for A
will be Qualifier
. Furthermore,
b
's NameOccurrence object will have a pointer to A
's NameOccurrence object as its qualifier.this->x
).
The Symbol Table subsystem is designed to allow name lookup in keystone as described in Clause 3 of the ISO C++ standard. It consists of the NameDeclaration class and the Scope heirarchy, which consists of the scopes listed in the previous section. Each Scope contains pointers to those NameDeclarations which were declared inside it as well as a pointer to the containing scope. Furhtermore, both ClassScope and PrototypeScope contain a pointer to the TemplateParameterScope (if any). NameDeclaration objects represent the declaration of an identifier or type. They contain a pointer to thier enclosing scope as well as their corresponding scope. For example, in the following code snippet:
int f() { class C{}; }The NameDeclaration object for
C
would contain a pointer to the enclosing local scope as well
as the corresponding class scope.C
has class type and is not an instance. Had there been
a variable of type C
declared, then the variable would have had class type and would have been an instance.
The Type heirarchy represents the various types that a NameDeclaration can be declared as. It consists of the
following classes:
C
above, and its Type (ClassType in this case). However,
Typedefs are represented in a slightly different way. The NameDeclaration representing a typedef has
a pointer to a Type object, but the Type object it points to contains a different NameDeclaration object.
As an example, consider the following:
class C {}; typedef C MyC;In this example,
C
will have a ClassType data member. That member will also have a NameDeclaration that
points back to C
. MyC
will also have a ClassType data member. However, that ClassType
will point to the NameDeclaration for C
, not MyC
.
class T
or typename T
. Template parameters with specified type are
represented by one of the previously mentioned types.In this section, we will examine the classes responsible for the bulk of work in keystone. It will be a low level overview with numerous code samples with explanations.
The NameOccurrence class represents an IDENTIFIER seen in the input stream. It can be adorned with a
specifier (covered previously) and it can be qualified (also covered previously). Furthermore, it
can be marked as a template occurrence. A NameOccurrence is marked as being a template if it
is preceeded by the template
keyword. This allows the NameOccurrence to be correctly
identified as an ID_template_name by the TokenDecorator. For example, in the following code snippet:
class X { template<int i> void f(); }; X::template f<100>();f would be marked as a template occurrence.
The TokenBuffer is responsible for gathering the information from the scanner and passing it on to the parser. It also saves various information about the surrounding context for later use by the TokenDecorator and the parser. In the following section we describe some of the important member functions and how they are used.
Data Structures
The Token Buffer contains an array of MAX_BUF entries (currently 5). These entries contain
the token, a std::string with the lexeme, a YYSTYPE structure which will contain a TokenInfo
pointer, and a NameDeclaration pointer which will point to the last Qualifier in a nested name
specifier with a template name (e.g. X<7,13>::Y< Z<int>::Q >::p [The first > contains a pointer
to X's
NameDeclaration object, the second > points to Z
's, the third points to Y's
] ).
2.1 init This function is called by the constructor. It reads MAX_BUF tokens into the buffer array.
2.2 getNextToken This function effectively looks at the next token. To accomplish this, it will do one of two things.
class
or struct
keyword and the next two tokens are IDENTIFIER and COLON,
then it notes that it is inBaseSpecifierList.X<(5>7)>
. In the previous example, only
the last > sign represents the end of the template parameter list.2.3. getMultiWordType This function looks at the current token and the next one or two tokens to determine if the type is a multi word type (e.g. long int, unsigned short int, unsigned char). The algorithm is fairly straight forward and can be seen in the code.
2.4. setOccurSpec
This function is responsible for setting any occur specifiers on the NameOccurence object.
It begins by checking the left context to see if the previous token was the enum
keyword. If so, it sets the ElabEnum
specifier.
Otherwise it looks for the class
, union
, or struct
keywords
and, if one is found, it sets the ElabClass
specifier.
If not found, it looks for the namespace
keyword and sets the Namespace
specifier if found.
Otherwise, it looks for a preceeding ~ and sets the Destructor qualifier if found.
After checking the left context, the TokenBuffer checks the right context for a COLCOL (::).
If found, the Qualfier
specifier is set.
Finally, it checks to see if it is inMemberAccess and does not have a specifier. If so,
it is specified as a Member
. If that is not true, then it checks to see whether
or not it is inBaseSpecifierList and it is not a qualifier. If so, then it sets the
ElabClass
specifier.
2.5. getNameOccurrence
This function will turn the TokenInfo pointer contained in the buffer array into a NameOccurrence object.
It creates a NameOccurrence object and attempts to set the occur qualifier through a number of steps.
First, if the previous two tokens were ::~, then it uses the last identifier seen as the qualifier.
Otherwise, if the previous two tokens were IDENTIFIER ::, then it uses the NameDeclaration object
for the IDENTIFIER token as the qualifier.
If that is not the case, then it checks to see if the previous two tokens were >::. If so, it
looks at the > token's lastDecl member, and if non-NULL, uses it for a qualifier.
Otherwise, it will check to see if the previous two tokens were template
::. If so, it
will get the ::'s lastDecl member (unless, NULL then it uses the last identifier).
If all of the above fail, and the previous token is ::, then it is a global qualifier and the
GlobalNamespace is retrieved from the ActionFacade and used as a qualifier.
If the previous token is template
and the token before is a ::, ->, or ., then
the NameOccurence is marked as being a template.
After performing the qualifications, this function calls setOccurSpec and returns the NameOccurrence object.
2.6. saveId This function is called by the TokenDecorator whenever a template argument list is entered. It sets the inTemplateArgList value to true and pushes the previous identifier onto a stack.
2.7. restoreId This function is called at the end of a template argument list. It checks to see if the current token is a GREATER (>) or a COLCOL (::) and if so, sets their lastDecl pointer to the NameDeclaration at the top of the stack, pops the stack, and sets the inTemplateArgList member to false. ** This is a bug.
2.8. setNameDeclaration
This function is called once a NameDeclaration is found by the TokenDecorator. It sets
the current Token's name declaration to be the one found by the TokenDecorator. It
also sets the last_identifier member to the NameDeclaration pointer passed in.
Furthermore, if the next token is a LEFTPAREN (() and the previous token was
not template
and the type of the declaration is FunctionType, then
the lastWasF member variable is set to true. The lastWasF member variable
is used to ensure that PrototypeScope is entered on time and correctly through
BTYACC predicates (described later).
The TokenDecorator is responsible for decorating the token so that the parser can embed semantic information into its grammar. Furthermore, it provides an interface to many of the TokenBuffer's operations such as getLastIdentifier, lastWasFun, and getTokenCount. It is also responsible for saving and printing the statistics that appear in the output (described previously). Also, it uses the InvariantFaciliatators to check the classes invariants after printing the statistics.
3.1. getNextToken
getNextToken is the function that directs the token decoration
process. First, a token is retrieved from the TokenBuffer through
use of the TokenBuffer::getNextToken function. Next, if the
current token is an identifier, it undergoes token decoration.
If it is not, it is passed on unchanged.
The token decoration process begins by the retrival of the
NameOccurrence object from the TokenBuffer through use of the
TokenBuffer::getNameOcurrence function. Next, the occurrence
is translated into a NameDeclaration though the doLookup
function. If the doLookup function returns NULL, then a
new NameDeclaration object is created in the current scope.
Furthermore, if the name occurrence was a template, then it
is marked as a template. If the doLookup function does not
return NULL, then the NameDeclaration returned has a reference
added to it. However, in this case, it is not
marked as a template. Instead, it is determined to be a template
if it has not be determined already. Basically, there are two
different ways to be a template, markTemplate says that you
have not been declared, but are a template, setIsTemplate says
you have been declared a template. Next, if the NameOccurrence
object received from the TokenBuffer has a qualifier, then that
qualifier is given to the NameDeclaration object to be its
qualifier. Then, the NameDeclaration object that was
created (or found) is handed back to the TokenBuffer. Finally,
getCSIdent is called on the NameDeclaration object and the
result of that is used as the current token and passed to the
parser.
3.2. doLookup doLookup uses the action facade to make its initial lookup. If the NameOccurrence object to be looked up has a qualifier, then this function calls the ActionFacade's lookupQualifiedName. Otherwise, this function calls the ActionFacade's lookupUnqualifiedName. If a NameDeclaration was not found, then it could be because we are the first argument in a class method declared out of line. For instance:
class X { public: typedef int T; void f( T ); }; void X::f( T ) {}In this case, the parser will not have executed the rule to enter prototype scope before T is encountered in the lookahead (see section on problems with Token Decorated Parsing). However, T must be looked up in f's corresponding scope to be found. Therefore, the following code is used:
if( decl == NULL && buf.lastWasFun() && buf.getLastIdentifier()->getCorrespondingScope() ) { decl = buf.getLastIdentifier()->getCorrespondingScope()->lookup( id ); }** This is a bug and should go before the other lookups. Finally, whether found or not, the NameDeclaration object is returned.
3.3 getCSIdent
This function takes a NameDeclaration object (and whether it
was preceeded by template
), and returns back a
potentially decorated token.
If the NameDeclaration object does not have a type object, and
it was not preceeded by template
, then it returns
the undecorated IDENTIFIER
token.
If the NameDeclaration object was preceeded by template
,
or the next token in the buffer is a < character and isTemplate
returns true, then the saveId is called for the TokenBuffer
and an ID_template_name
is returned. The grammar
only wants an ID_template_name when it is about to start a template
argument list. Therefore, it must either be preceeded by template
or proceeded by a < sign.
If the NameDeclaration object has class type and it is not
an instance and it is not a typedef, then ID_class_name
is
returned.
If the NameDeclaration object has enum type and it is not
an instance, then ID_enum_name
is returned.
If the NameDeclaration object has namespace type and it
is not an instance, then ID_original_namespace_name
is
returned.
If the NameDeclaration object has namespace type and it
is an instance, then ID_namespace_alias
is returned.
If the name declaration is a typedef:
else if( !decl->isInstance() && (decl->getType()->getDecl() != decl || decl->getType()->checkType(Type::DT_INDIRECT) || decl->getType()->checkType(Type::DT_FUNCTION) ) ) {then
ID_typedef_name
is returned.ID_class_name
is returned. It used to be that
ID_typedef_name
was returned for this, as template
parameters do not have to be classes. However, the grammar
is not set up to support that, and favors ID_class_name
over
ID_typedef_name
; therefore, it was changed.IDENTIFIER
token is returned.
3.4. isTemplate
This function determines if a NameDeclaration object is a template.
If the object has a corresponding class scope, and the corresponding
scope has template parameters, then it is a template.
If the object has a corresponding function scope, and the corresponding
function scope's prototype scope has template parameters, then
it is a template.
If the object has a corresponding prototype scope, and the corresponding
prototype scope has template parameters, then it is a template.
If the type of the object is template template parameter, then it
is a template.
If the object has been previous set as a template (through setIsTemplate),
then it is a template.
The NameDeclaration class tries to contain as much information about the declaration of an object or type as possible. It contains a pointer to the corresponding and containing scope and to the type. It also contains whether it is an instance of the type or the type declaration. It records whether or not it is in the symbol table, is a template, or has been marked as a template. It contains a special entry to its destructor due to the fact that it should be autogenerated and can be explicitly called. It also contains a list of references (when it has been seen) that is mostly incomplete, but can be useful at times. Furthermore it contains its access (Public, Private, Protected, or Unknown), as well as, any attributes such as friend, pure, virtual, const, static, or extern (extern is not currently supported).
4.1. Constructor The constructor's main job is to change the name if it is a constructor or destructor being declared. A flag is passed in to tell if it is a destructor. The actual code is:
if( definedIn && dynamic_castThe name must be changed from X to X::X or X::~X to prevent name lookup from finding the constructor call when it should be finding the class type.( definedIn ) && getName() == definedIn->getName() ) { if( isDestructor ) setName( definedIn->getName() + "::~" + definedIn->getName() ); else setName( definedIn->getName() + "::" + definedIn->getName() );
4.2. isaMatch
This function is responsible for determining if a NameOccurrence
matches a NameDeclaration in lookup. First, it checks to see
if thier names match. If so, then it analyzes the OccurSpecifier
of the NameOccurrence.
If the specifier is ElabEnum
, then the type of the
NameDeclaration must be enum, or template parameter.
If the specifier is ElabClass
, then the type of the
NameDeclaration must be class, template parameter, or
template template parameter. If the specifier is
Member
, then the type can be NULL, or the type
can be enum or class if it is not an instance and the
type is not a namespace.
If the specifier is a namespace, then the type must be a namespace.
If the specifier is a qualifier, then the type can be a class
that is not an instance, a namespace, a basic type that is not an
instance (for typedef int T; T x; x::~T()), a template parameter
that is not an instance, or a template template parameter that is not
an instance.
If the specifier is not listed above, then a match is assumed.
4.3. getDestructor
This function is responsible for finding or creating a destructor
for this NameDeclaration. First, it checks to see if the
type is a class or namespace type. If it is not, then
a destructor called __basic_destructor
is created,
entered into the global scope, and returned. Otherwise,
if the type is a namespace, NULL is returned. Now, it
must be a class. Therefore, the corresponding class scope
is looked up, the name is changed into X::~X
form,
and then looked up. If one is not found, a destructor is
created and inserted into the class scope.
The Scope Heirarchy consists of 7 scopes: Scope, LocalScope, NamespaceScope, ClassScope, PrototypeScope, FunctionScope, and TemplateParameterScope.
5.1. Scope Scope is an abstract base class that provides default implementations for many virtual functions.
5.1.1. insertLocal This function pushes a NameDeclaration object onto the back of a list. It also sets the NameDeclaration objects containing scope and sets the NameDeclaration inSymbolTable flag.
5.1.2. undoRecentInsertLocal This function takes a NameDeclaration and compares it to the last NameDeclaration inserted. They must be the same or an error is issued. If they are the same, the NameDeclaration last inserted is poped from the back of the list.
5.1.3. removeDeclaration This function accepts a NameDeclaration and finds it in its list of local objects. If found, it removes it.
5.1.4. findHere Given a NameOccurrence, this function looks through its list of NameDeclarations to try and find a match. It uses the isaMatch function of the NameDeclaration class to assure a match. Furthermore, if the occurrence has not been specified to be a class or an enum, it searches for non types first, then types. This allows:
class X {}; X X; X X2; // this should fail...X refers to the variable class X X3; // this should pass...X has been specified as a class.
5.2. LocalScope A LocalScope represents a basic block. In addition to the NameDeclaration list inherited from Scope, it also contains a list of used namespaces.
5.2.1. canContain This returns true if the specifier is nothing, ElabClass, or ElabEnum, since their declarations can occur within a LocalScope.
5.2.2. lookup First the name occurence is passed to canContain. If it can be contained within a local scope, the findHere method of Scope is called. If the declaration has not been found, then the used namespaces are searched. If it still has not been found, then the containing scope is looked in as well as any anonymous scopes of the containing scope.
5.2.3. searchUsedNamespaces This function merely loops through the list of namespace scopes and does a lookup in each one.
5.3. NamespaceScope The NamespaceScope class represents a named or anonymous namespace scope. It also contains a list of used namespaces.
5.3.1. canContain A NamespaceScope can contain an enum, class, member, namespace, pseudoDestructor, Destructor, Qualilfier, or nothing.
5.3.2. lookup Since namespaces may recursively include one another, they must be stored in a set as they are being searched. If the namespace is ever found in the set, then it is not searched as it has already been searched through a recursive using clause. If it is not found in the set, then it puts a pointer to itself in the set saying that it has been searched. If the canContain member function returns true, then the findHere function is called on the NameOccurrence being looked up. If a NameDeclaration is not found, then used namespaces are searched. If it has still not been found, then the containing scope is searched along with any anonymous scopes in the containing scope.
5.3.3. searchUsedNamespaces This function merely loops through the list of namespace scopes and does a lookup in each one.
5.4. ClassScope A ClassScope represents the scope of a class declaration. Along with the NameDeclaration list, a list of base classes and thier access restrictions, a list of friends, and a TemplateParameterScope, is stored.
5.4.1. canContain A class scope can contain a conversion function, a destructor, a destructor qualifier, an enum, a class, a member, a pseudo-destructor, a qualifier, or nothing.
5.4.2. lookup This function first checks the canContain function to see if it should be looked up in this scope. If so, the findHere function is called. If it is not found, but the name is the same as the class name, then the current declaration is returned (for destructors). If the declaration has still not been found, and a template parameter scope exists, then the template parameter scope is searched. Next, the base classes are searched followed by any containing scopes.
5.5. Prototype Scope A PrototypeScope represents a function's prototype. It also contains the template parameter scope, if any, for the function.
5.5.1 canContain A PrototypeScope can only contain those declarations whose specifier is nothing.
5.5.2 lookup If a template parameter scope exists, then it is looked up in first. Otherwise, the current scope is looked in. If still not found, the containing scope is examined. Finally, if the declaration is qualified, and the qualifier is of class scope, that scope is examined.
5.6. FunctionScope
A function scope represents the scope of the entire function,
not just a basic block. Therefore, the ISO C++ standard
states that the only thing that can be in a FunctionScope is
a label. It is the LocalScope inside the FunctionScope that
actually contains most declarations.
A function scope contains a pointer to its prototype scope and
a list of pointers to local scopes declared inside it.
5.6.1. canContain Since the FunctionScope can only contain labels, the only specifier that can occur is nothing.
5.6.2. lookup Lookup examines the current scope to try and find the declaration. If it is not found, and the Prototype scope exists, then the Prototype scope is examined. If still not found, then containing scopes and thier anonymous scopes are examined.
5.7. TemplateParameterScope A TemplateParameterScope represents any template argument which exists in a template parameter list.
5.7.1. canContain A template parameter scope can contain an occurence with ElabEnum, ElabClass, a Qualifier, or nothing.
5.7.2. lookup A template parameter scope, if canContain returns true, calls the Scope::findHere function and then checks the containing scopes and thier anonymous scopes.
The Type Heirarchy consists of 9 classes: Type, BasicType,
FunctionType, IndirectType, ClassType, EnumType, NamespaceType,
TemplateParameterType, and TemplateTemplateParameterType.
Each class must provide an override to the getDecl function,
which returns the name declaration for the type, the checkType
function, which checks the dynamic type of the object,
and the clone function, which returns a copy of the object.
The checkType function uses an enum DynType which contains the
following members
6.1. Type This is the base class for the hierarchy and provides facilities to make the type const or volitile as well as to check to see if the type is const or volatile.
6.2. BasicType This class represents a BasicType (int, float, char, etc...).
6.2.1 checkBasicType BasicType provides an enum, BType, to represent which Basic type it is. The enum provides the following enumerators:
6.2.2. getDecl The getDecl function returns NULL since a basic type does not have a declaration.
6.3. FunctionType This type represents the type of a function. It contains an enum member which determines whether it is a prototype or a definition. It also contains the return type of the function.
6.3.1. getReturnType This returns the return type of the function.
6.3.2. getDecl This returns the name declaration object for the function declaration.
6.3.3. checkType This always returns true for DT_FUNCTION. For DT_FUNCTION_PROTOTYPE it returns true if the enum member is Q_PROTOTYPE. For DT_FUNCTION_DEFINITION, it returns true if the enum member is Q_FUNCTION.
6.4. IndirectType This class represents the type of a pointer, rererence, or array. It contains an enum member that determines which one it is. It also contains a pointer to the pointed-to type.
6.5. ClassType This represents the type of a class definition. It contains the NameDeclaration object for the original definition.
6.6. EnumType This represents the type of an enum definition.
6.7. NamespaceType This represents the type of a namespace definition.
6.8. TemplateParameterType This represents the type of a template parameter.
6.8. TemplateTemplateParameterType This represents the type of a template template parameter.
While the previous section focused on classes that acted as data structures, this section will focus on the classes that direct those data structes' actions.
The LookupController is responsible for determining how a Name Occurrence or NameDeclaration should be looked up. It contains a pointer to a ContextManager object.
1.1. lookupUnqualifiedName This function first clears the set of searched namespaces, then, it gets the current scope from the ContextManager and calls the lookup method for that scope. If the NameDeclaration was not found, it asks the ContextManager for the first template scope and searches it. It has to check for the template scope seperately because the template scope may not have been attached to a class or prototype scope yet due to lookahead.
1.2. lookupQualifiedName This function first clears the set of searched namespaces. Next, it gets the qualifier from the NameOccurrence. If the qualifier has TemplateParameterType or TemplateTemplateParameterType, the function returns NULL as no lookup is available. Otherwise, the corresponding scope is retrieved and lookup is performed in that scope.
1.3. lookupPseudoDestructor This function calls the getDestructor function of the NameDeclaration passed in.
1.4. Other functions The LookupController contains other functions to lookup class scopes, function calls, binary operators, etc... However, they are not currently used and may be overhauled or removed in a future release.
The ContextManager is responsible for managing the entrance and exit of scopes. It is designed to be a singleton. It contains two data structures, a stack of scopes which represent the nested scope heirarchy, and a list of scopes representing all of the scopes seen.
2.1. getCurrentScope Returns the top of the stack of scopes.
2.2. getPreviousScope Returns the second element in the stack of scopes.
2.3. enterScope Pushes a scope on the stack
2.4. leaveCurrentScope Pops a scope off the stack
2.5. moveIntoPrototypeScope This function accepts a name declaration and attempts to move into its prototype scope. It could be that this declaration has been predeclared (e.g., int x();), if so, then it will already have a corresponding scope and that scope should be entered. This is represented by the code:
if( decl->getType() && decl->getType()->checkType( Type::DT_FUNCTION_PROTOTYPE ) && decl->getCorrespondingScope() != NULL ) { enterScope( decl->getCorrespondingScope() ); return decl; }If that is not the case, then a new function type and Prototype scope should be created and the scope created should be entered. First, the return type of the function will be the current type of the NameDeclaration object passed in. That type will be used to create a FunctionType object with the Q_PROTOTYPE enum member. The newly created FunctionType object will be assigned to the NameDeclaration object's type member. Next, a new prototype scope will be created with the NameDeclaration object as the declaration for the prototype and the current scope as the containing scope. Then, the ContextManager will search for a template scope and assign it to the PrototypeScope's TemplateParameterScope member variable. Finally, the newly created scope will be pushed on the list of scopes and on the stack of scopes.
2.6. moveIntoFunctionScope This function will attempt to move into a function's scope. The first thing to check for is whether or not you are in a member intializer list. If so, the function scope has already been entered and this is a duplicate call. Simply set the mem_init variable to false, and continue onward. Second, if the NameDeclaration object passedin already has FunctionType, then it is a Q_PROTOTYPE. Change it to be a Q_FUNCTION. The prototype scope will be the NameDeclaration's corresponding scope. Next, it creates a function scope and gets any previous template parameter scope for the function prototype scope's use. Finally, it pushes the function scope on the list and stack.
2.7. moveIntoClassScope This function attempts to move into class scope. First, the current template scope is retrieved. Second, an explicit specialization is looked for. If the type is a class and it has a corresponding class scope, and that scope has template parameters, then we are entering an explicit specialization. That means that the Corresponding Scope should be pushed on both the list and stack and the function should terminate. Otherwise, we insert the NameDeclaration into the current scope with a newly created ClassType object. We also create a ClassScope and set the TemplateParameterScope accordingly. Finally, we push the scope on the back of the list and stack.
2.8. moveIntoLocalScope This function creates a new LocalScope object with the current scope as the containing scope. It then loops back through the containing scopes until it finds a function. The local scope is then added to the function's list of local scopes. Finally, the newly created scope is added to the list and stack.
2.9. moveIntoTemplateParameterScope This just creates a new TemplateParameterScope and puts it on the list and stack.
2.10. moveIntoNamespaceScope This function first checks to see if the NameDeclaration passed in already has a corresponding scope. If not, it creates a new scope, inserts the declaration into the current scope, and pushes the scope on the list. Then, it takes whichever scope should be used and pushes it on the stack of scopes.
2.11. moveIntoAnonNamespace This function gets the current scope and ensures it is a namespace scope (since anon namespaces may only occur in other namespaces). Second, it moves into the namespace scope. Finally, it adds the anonymous scope to the list of uses declarations of the containing scope.
2.12. installBaseInCurrentScope This function ensures that the current scope is a class scope. It then gets the corresponding scope of the base class' NameDeclaration object (which is passed in). If the scope was found, then it is given to the class scope to add. Otherwise, if the base class was a template parameter or a template template parameter, nothing is done.
2.13. useNamespace This function merely ensures that the current scope is either a local scope or a namespace scope. Then, it adds the NameDeclaration for the namespace to the uses list of the current scope.
2.14. undoRecentInsertLocal If the NameDeclaration is qualified, it calls undoRecentInsertLocal on the qualified scope. Otherwise it calls it on the current scope.
2.15. useDeclaration This function creates a new NameDeclaration based on the one passed in and inserts it into the local scope. It then sets the newly created NameDeclaration's instance to whatever the old instance was and also sets the isTemplate member in the same way.
2.16. getCurrentClassScope This function merely loops through containing scopes until it finds one that is a class scope.
2.17. getCurrentTemplateScope This function looks at the list of scopes and finds the first scope that is a TemplateParmaeterScope. If the function's argumetn is true, it erases the scope from the list, to prevent others from finding it. Otherwise, it leaves it alone.
2.18. insertLocal
This function choreographs the insertion of a name declaration
in a scope. First, it checks to see if the name is qualifed.
If it is, it will insert into the qualified declaration's scope.
Otherwise, it inserts into the local scope. Next, it checks
for duplicate class/namespace entries.
If the declaration is already in
the symbol table, and the containing scope is the scope
previously found, and the type passed in is the same as
the NameDeclaration's type and the decl is not an instance,
then it is a duplicate class/namespace declaration and is not
inserted again. Otherwise, if the type is a function prototype
type, and the containing scope is the scope previously found,
then nothing is done. If neither of the above hold, then
the NameDeclaration found by lookup was incorrect and a new
one must be created. To create a new one, it checks to see
if it will be a destructor, if so it reformats the name into
the X::~X
format. Then, it creates a new NameDeclaration,
sets its occur qualifier (if any), and inserts it in the scope.
The ActionFacade is a thin wrapper around the LookupController and ContextManager. It also adds a few utility functions for the parser actions to use. The wrapper functions will not be discussed here, as they merely forward the function call to the appropriate object.
3.1. makeAnonIdent This function is called to create an anonymous identifier for things such as cast operators . The anonymous identifier created is in the form __anonX where X is a unique number.
3.2. makeAnonParmIdent This function is called to create an anonymous identifier for unnamed function parameters. The anonymous identifier created is in the form __anonX where X is the position of the parameter.
3.3 makeAnonNSIdent This function is called to create an anonymous identifier for unnamed namespaces. This funtion always returns "anon".
There are six classes used to help validate the Scope heirarchy, Type heirarchy, and NameDeclaration class. Each one of the three mentioned has a corresponding InvariantVisitor and InvariantFacilitator. The InvariantFacilitator stores references to objects that it should validate. The InvariantVisitor loops over those stored references and makes assertions about the objects stored. Since the objects are monotonic in nature, they are only validated at the end of the program. The actual assertions used are listed in a subsequent section.
There is a TypeInvariantFaciliator, ScopeInvariantFacilitator, and NameDeclarationInvariantFacilitator. They all follow the same pattern. Each contains a static vector of pointers to objects of that type.
1.1 CheckInvariants Checks the invariants on this object. This uses a form of the acyclic visitor. A dynamic_cast is performed for each class in the heirarchy (most derived first) until the right class is found, then it is sent to the visitor.
1.2 CheckStoredClassesInvariants Checks the invariants on all objects stored by calling CheckInvariants on each object in the array.
The abstract syntax graph was designed to be similar in nature to the one described by the Bell Canada Datrix proposal. The classes from which it is composed are created using the treecc sytem. The operations that act on those classes are also specified using treecc.
The classes and their purposes are listed below:
int x,y,z;
.new
expressiondelete
expression?:
operatorEach class is connected to other classes by what can be thought of as edges on a graph. In many systems, these edges are represented by actual classes; in our system, the edges are simply pointers to the connected classes.
The connections are as follows:int x,y,z
will create
an object list with three members; one for each x, y, and z.extern
blockCertain operations are defined on these heirarchies. Some operations are defined here. Those that are not defined here are trivial to understand and can be found in the ASG/asgnds.tc file.
getType(cIdentifier)
If the cIdentifier is derived from Type, it returns itself.
If the cIdentifeir is a cObject or cFunction, it returns the cType
member.
If the cIdentifier is a cConstTemplateParameter, it returns the
cType member of its cFormalFctParam member.
Otherwise, it returns NULL
setType(cIdentifier)
If the cIdentifier is an cObject, it sets the cType member.
If the cIdentifier is a cConstTemplateParm, it sets the cType
member of the cFormalFctParam.
If the cIdentifier is a cFunction, it sets the return
type of the function.
setTemplateParms(cType) If the cType is a cFctType, cForwardType, or cAggrType, the template parameters are set to the list passed in.
Each class also defines a virtual accept method that calls the visit method in the ASGVisitor class passed in. This is used to add algorithms to the heirarchy without having to modify the treecc file.
As with all software projects, keystone has had its share of difficulties. In this section, we will outline some of the major difficulties that plague the keystone source code. Much of the ugliness that is present in the source is due to these issues.
The fundamental problem with Token Decorated Parsing is the inability of the parser to execute certain actions without lookahead. This lookahead forces Token Decoration to be performed too early, at times when the correct scope has not been entered. The classical example of this is
class X { typedef int AT; void f( AT arg ); }; void X::f( AT arg ) {}This example poses serious difficulties for Token Decorated parsing. The parser usually needs to see that
AT
is a type name
before it decides that the construct is a function definition.
However, prototype scope has not yet been entered, since prototype
scope is only entered once the construct has been identified.
Therefore, AT is looked up in global scope and not found.
Another example of the same problem is
templateIn this case, the opening bracket ({) must be seen before class scope is entered; however, T must be looked up in the class scope's template scope. Once again, the lookahead has caused a problem with token decoration.class X : public T {};
A second problem with parsing C++ is remembering the qualified
context in the face of templates. In a non templated language,
remembering the qualified context is trivial. For example,
A::B::C
can be used to illustrate the fact that the
qualified context for B is A, and the qualified context of
C is the B and its qualified context. Therefore, you only
need two lookbehind tokens to store the qualified context.
In C++, an additional ~ may be interspersed between the ::
and the identifier for destructors (e.g. X::~X) giving rise
to the need for three lookbehind tokens. However, with
templates, and the template syntax, an unknown number
of lookbehind tokens are needed. For example,
templateFrom this example it can be seen that determining qualified context is not a trivial matter.class X{ template class A { struct B {}; }; }; X<17,char>::A ::A>::B
The grammar provided by the ISO C++ committee mandates a number
of decorated tokens such as ID_class_name and ID_typedef_name.
One might thing that each template function or class should
be decorated as an ID_template_name. However, the grammar
only wants an ID_template_name when it is used in a template_id
expression: ID_template_name LESS template_argument_list GREATER
.
Therefore, a lookahead must be done to see if the next token is LESS.
If so, the current token may be decorated as an ID_template_name, otherwise
it should receieve some other decoration.
Secondly, there may be some dispute over what a template parameter
should be decorated as. The standard does not identifier a
ID_template_parameter_name. It would seem logical to decorate it
as an ID_typedef_name; however, this fails to resolve to a
class_or_namespace_name which the template parameter might be. Therefore,
ID_class_name was chosen as the decorated token for template parameters.
There are a number of ambiguities in the C++ grammar which cannot be solved by token decoration without modification of the grammar. One in particular is the difference between a declaration and an initializer.
X (f);has two possible derivations. The first (and correct) one is a declaration of f with type int. The second one is a declaration of variable X with implicit type int and initializer f.
One grammar issue was uncovered during the creation of keystone that required modification of the grammar. An elaborated_type_specifier and a postfix_expression both had derivations that should have allowed for template_ids but did not.
This section will outline some of the solutions for the problems presented in the previous section.
As we discussed in the previous section, lookahead is the bane of token decorated parsing. However, there are usually ways to find a "valid" point in the parse, stop the trial parse, and execute all of the actions up to that point thereby resetting lookahead back to zero. In BTYACC, this command is called YYVALID. A YYVALID call is placed after all left and right brackets ({}) as well as all semicolons (;). Furthermore, it is used to solve the function problem above. If the left paren has been seen, and the previous identifier was a known function, then we call YYVALID which will execute the previous actions and put us in prototype scope. Then, name lookup and token decoration can proceed unfettered.
To solve the problem of qualified name lookup, we use a stack of NameDeclaration objects. Everytime an ID_template_name is decorated, we save the identifier on a stack. When a template scope is exited, we pop the identifier off the stack and associate it with the GREATER token that ended the scope.
This section will examine the assertions and invariants that are present throughout the keystone system. Furthermore, it will look at invariants that should be true but are not for various reasons
1. Assertions Various assertions are made throughout the keystone system using the MAKE_ASSERTION macro. MAKE_ASSERTION creates an object of type Assertion which will throw itself if its condition is violated. The test program will catch these assertions and print them to the screen if one is thrown.
file | function | assertion |
---|---|---|
ActionFacade.cpp | lookupUnqualifiedName | The input NameOccurrence object cannot be NULL |
ActionFacade.cpp | lookupQualifiedName | The input NameOccurrence object cannot be NULL |
ActionFacade.cpp | lookupPseudoDestructor | The input NameDeclaration objects cannot be NULL |
ActionFacade.cpp | useDeclaration | The input NameDeclaration object cannot be NULL |
ActionFacade.cpp | useDeclaration | The input NameDeclaration object must be in the symbol table |
ActionFacade.cpp | useNamespace | The input NameDeclaration object cannot be NULL |
ActionFacade.cpp | useNamespace | The input NameDeclaration object's type must be namespace type |
ActionFacade.cpp | insertLocal | The input NameDeclaration object cannot be NULL |
ActionFacade.cpp | moveIntoClassScope | The input NameDeclaration object cannot be NULL |
ActionFacade.cpp | moveIntoNamespaceScope | The input NameDeclaration object cannot be NULL |
ActionFacade.cpp | moveIntoAnonNamespace | The input NameDeclaration object cannot be NULL |
ActionFacade.cpp | moveIntoPrototypeScope | The input NameDeclaration object cannot be NULL |
ActionFacade.cpp | moveIntoFunctionScope | The input NameDeclaration object cannot be NULL |
ActionFacade.cpp | moveIntoFunctionScope | The input NameDeclaration object must be in the symbol table |
ActionFacade.cpp | moveIntoFunctionScope | The input NameDeclaration object must have function prototype type |
ActionFacade.cpp | installBaseClass | The base class NameDeclaration object cannot be NULL. |
ActionFacade.cpp | installBaseClass | The base class must be of Class type, template parameter type, or template template parameter type. |
ActionFacade.cpp | makeFriend | The current scope must not be NULL |
ActionFacade.cpp | makeFriend | The input NameDeclaration object must have a type |
ActionFacade.cpp | makeFriend | The containing scope must be a ClassScope |
ActionFacade.cpp | makeTypedef | The input NameDeclaration object must not be NULL |
ActionFacade.cpp | makeInstance | The input NameDeclaration object must not be NULL |
ArgumentStack.cpp | endArgumentList | The argument list cannot be empty |
ArgumentStack.cpp | getArgumentList | The argument list cannot be empty |
ContextManager.cpp | getCurrentClassScope | The nestedScopes stack cannot be empty |
ContextManager.cpp | installBaseInCurrentScope | The current scope must be a class scope |
ContextManager.cpp | useNamespace | The input NameDeclaration object's corresponding scope must be a namespace scope |
ContextManager.cpp | moveIntoAnonNamespace | The current scope must be namespace scope |
ContextManager.cpp | insertLocal | The nested scopes stack cannot be empty |
ContextManager.cpp | useDeclaration | The input NameDeclaration object's type cannot be NULL |
DeclarationStack.cpp | top | The stack cannot be empty |
DeclarationStack.cpp | push | The input NameDeclaration object cannot be NULL |
DeclarationStack.cpp | push | The input Type object cannot be NULL |
LookupController.cpp | lookupQualifiedName | The input NameOccurrence object cannot have a NULL qualifier. |
LookupController.cpp | lookupQualifiedName | The input NameOccurrence object's qualifier's corresponding scope cannot be NULL. |
LookupController.cpp | lookupQualifiedName | The input NameOccurrence object's qualifier's corresponding scope cannot be NULL. |
LookupController.cpp | lookupPseudoDestructor | The input NameDeclaration objects cannot be NULL. |
TokenBuffer.cpp | getNameOccurrence | An IDENTIFIER preceeding a COLCOL must have a valid NameDeclaration object attached to it. |
TokenDecorator.cpp | getCSIdent | The input NameDeclaration object must not be NULL |
Type.cpp | FunctionType::FunctionType | The input NameDeclaration object must not be NULL |
Type.cpp | FunctionType::print | The NameDeclaration for the function must not be NULL |
Type.cpp | IndirectType::print | The indirection type must be IT_ARRAY, IT_POINTER, or IT_REFERENCE |
actions.cpp | condition | The identifier passed to condition must be of type cObject |
actions.cpp | direct_declarator_arr | The identifier passed to condition must be of type cObject |
NameDeclaration.cpp | addReference | The TokenPosn passed in cannot be NULL |
NameDeclaration.cpp | takeReference | The reference list size cannot be NULL |
Scope.cpp | Scope::insertLocal | The input NameDeclaration object cannot be NULL |
Scope.cpp | Scope::undoRecentInsertLocal | The list of local NameDeclarations cannot be empty |
Scope.cpp | Scope::undoRecentInsertLocal | The list of local NameDeclarations cannot be empty |
Scope.cpp | Scope::undoRecentInsertLocal | The back of the locals list must be the same as the input NameDeclaration |
Scope.cpp | Scope::lookup | The input NameOccurrence object cannot be qualified |
Scope.cpp | FunctionScope::lookup | The input NameOccurrence object cannot be qualified |
Scope.cpp | LocalScope::lookup | The input NameOccurrence object cannot be qualified |
Scope.cpp | TemplateParameterScope::lookup | The input NameOccurrence object cannot be qualified |
2. Invariants Invariants are executed at the end of the program and affect classes in the Scope and Type heirarchy, as well as, the NameDeclaration class.
Class | Invariant |
---|---|
Scope | The NameDeclaration object corresponding to this scope implies that the corresponding scope for the NameDeclaration is the same as the scope under scrutiny. |
Scope | Either the containing scope is not NULL, or the scope's name is "_GlobalNamespace". |
Scope | All of the local objects should have a containing scope that is equal to the scope under scrutiny. |
FunctionScope | The prototype scope's name should be the same as the FunctionScope under scrutiny's name. |
FunctionScope | The containing scope should be a class or namespace scope. |
PrototypeScope | None of the local declarations may be of NamespaceType. |
PrototypeScope | The containing scope must be a NamespaceScope, ClassScope, PrototypeScope, or LocalScope. |
LocalScope | The NameDeclaration object must be NULL. |
LocalScope | None of the local declarations may be of NamespaceType. |
LocalScope | The containing scope must be LocalScope or FunctionScope. |
NamespaceScope | The containing scope must be a NamespaceScope. |
ClassScope | None of the local declarations may be of NamespaceType. |
ClassScope | The containig scope must be a NamespaceScope, ClassScope, or LocalScope |
NameDeclaration | The containing scope not NULL implies that the corresponding scope is not equal to the containing scope. |
NameDeclaration | The corresponding scope cannot be a LocalScope. |
NameDeclaration | If the type is ClassType and it is an instance, then the corresponding scope should be NULL. |
NameDeclaration | If the type is NamespaceType, then the corresponding scope should be a NamespaceScope. |
NameDeclaration | If the type is FunctionType, then the corresponding scope should be FunctionScope or a PrototypeScope. |
NameDeclaration | If the type is DT_FUNCTION_PROTOTYPE, then the corresponding scope should be ProtypeScope or NULL |
NameDeclaration | If the type is DT_FUNCTION_DEFINITION, then the corresponding scope should be FunctionScope or NULL |
NameDeclaration | If the virtual attribute is set, the containing scope should be set and they type should be FunctionType. |
NameDeclaration | If the pure attribute is set, then the virtual attribute should be set. |
NameDeclaration | If the Friend attribute is set, then the type should be function or class type. |
NameDeclaration | If the Static attribute is set, then the type should not be NamespaceType. |
NameDeclaration | If the const attribute is set, then it must be an instance or have FunctionType. |
BasicType | The NameDeclaration of this type must be NULL. |
FunctionType | The NameDeclaration of this type must not be NULL. |
FunctionType | The Type of NameDeclaration of this type must not be NULL. |
FunctionType | The return type of this type must not be NULL. |
FunctionType | The corresponding NameDeclaration's type cannot be NamespaceType. |
FunctionType | The return type cannot have NamespaceType. |
IndirectType | The reference type must not be NULL. |
ClassType | The NameDeclaration object's type must be of ClassType. |
EnumType | The NameDeclaration object's type must be of EnumType. |
NamespaceType | The NameDeclaration object's type must be of NamespaceType. |
3. Problems The current problem with the invariant system is that it cannot be extended to the traditional class invariant system because most of the invariants only become true after a certain period of time and it would be cumbersome to install a modality into them. Therefore, new ways of checking and denoting invariants are being investigated.
create_visitor.pl
. It will prompt for a visitor
name. It will then create a .cpp and .h file corresponding to the name you entered.
The .h and .cpp files will contain a skeleton visitor that can be filled in
by the user. The skeleton file is autogenerated from the Visitor.h file so
that any modifications to the base class are automatically incorporated into
the script.