Keystone: A Parser Front-End for the C++ Language

Table of Contents

  1. Usage
    1. Making the keystone system
      1. Building BTYACC
      2. Building Treecc
      3. Building Keystone
    2. Running keystone
      1. Reading from STDIN
      2. Reading from a file
      3. Interpreting the output
      4. Errors and bug reporting
      5. Known Issues
    3. Auxilliary scripts
      1. Gathering test statistics
      2. Gathering test information
      3. Preprocessing input
  2. Design and implementation
    1. System level overview
    2. Choreographed classes
      1. NameOccurrence
      2. TokenBuffer
      3. TokenDecorator
      4. NameDeclaration
      5. The Scope Heirarchy
      6. The Type Heirarchy
    3. Choreographing classes
      1. LookupController
      2. ContextManager
      3. ActionFacade
    4. Validation classes
      1. InvariantFacilitator
      2. InvariantVisitor
    5. The abstract syntax graph (ASG)
      1. A list of classes
      2. Connections
      3. UML Diagram
      4. Operations
      5. ASGVisitor
  3. Implementation issues
    1. Difficulites encountered
      1. The fundamental problem
      2. Remembering qualifed context
      3. ID_template_name issues
      4. Precedence issues
      5. Grammar issues
    2. Solutions
      1. Curbing lookahead
      2. A stack of NameDeclarations
    3. Validation
      1. Assertions
      2. Invariants
      3. Problems
    4. ASG
      1. Treecc version
      2. Creating a visitor
  4. Miscellaneous notes
    1. Upcoming features

Usage


A. Making the keystone system


1. Building BTYACC

BTYACC is a backtracking, predicated parser designed to facilitate the parsing of ambiguous languages such as C++. It is available from Siber Systems and works on most *nix platforms (including CYGWIN) and WIN32. keystone currently uses BTYACC version 3.0 which can be built using the following steps.
make

2. Building Treecc

Treecc is a syntax tree generator designed to allow safe and easy manipulation of syntax trees. It is available from Souther Storm Software, Pty Ltd and works on most any platform. keystone currently uses Treecc version 0.1.4 which can be built using the following steps.

./configure
           make all
           make check
           make install
      

3. Building keystone

The final product of the keystone build should be a shared library which contains the necessary data and functions to create an ASG and symbol table for a program written in the ISO C++ standard. Additionally, a test program is also created which uses the routines to generate output about an input file. Both the test program and the library will build during the make.

make home
The first step in making keystone is to set up keystone's makeinclude file. This is accomplished by typing make home from within the build directory.

make
keystone can then be made by typing make at the command line. Optionally, an argument can be given to make in the form of CONFIG=arg. This argument will be passed to all subsequent g++ calls. Three common arguments to config are -DYYDEBUG, which will turn on BTYACC's internal debug statemtns; -D_PARSE_DEBUG, which will print upon entry into each of the functions corresponding to a grammar rule; and -D_THG_DEBUG, which will print upon entry to a majority of the functions that make up the keystone system.

make test
This command runs the perlscripts/gather_stats.pl perl script on the testsuite directory. A report file is generated with the results. A list of currently failing tests will appear later in this section.

make check
This command will check each file in the directory against those listed in the MANIFEST file. If the file is not listed in the MANIFEST file, it will print it out. These files are extraneous and can be deleted. It will also issue an error if a file is listed in the MANIFEST but does not appear in the distribution. This is a problem in the distribution and should be reported as a bug (see section on reporting bugs below).

make distclean
This command will eliminate all extraneous files as reported by make check.


B. Running keystone


1. Reading from STDIN

By default keystone reads from STDIN. Simply type run on the command line for a prompt. After typing in the code snippet you wish to run, type CTRL-D to have keystone process it. If the parse was successful, you should get normal output; otherwise, you will get an error message.

2. Reading from a file

If run is passed an argument, that argument is assumed to be the name of a file to run. The file is NOT preprocessed, nor changed in any way before being analyzed by the keystone system. The output will be the same as if read from STDIN.

3. Interpreting the output

If the parse was successful, then keystone will print three output sections. The first section will consist of all the tokens that underwent name lookup and the results of that lookup. The second section is a listing of all of the scopes in the system and their contents. The third section contains statistical information and the last section is a text representation of the ASG.

3.1. Name Lookup Output

A typical statement that occurs from Name Lookup appears below.

UNL: did not find decl: use of T (2,6) ElabClass
The first three letters detail what kind of lookup was performed. The options are
UNL
An unqualified name lookup. This occurs on all identifiers not covered by later items.
QNL
A qualilfied name lookup. This occurs on identifiers of the form X::Y. It forces Y to be looked up in X's scope.
MNL
A member name lookup. This occurs on identifiers of the form x->y. It forces y to be looked up in the scope of x. Currently not used.
DNL
A destructor name lookup. This occurs on identifiers of the form x->~X() or x->X::~X(). If a destructor is not found, a trivial one is created for the class.
FCL
A function call lookup. Currently not used.
BOP
A binary operator lookup. Currently not used.
UOP
A unary operator lookup. Currently not used.
RQN
Performs a second lookup of a qualified name. The name may not have been seen at the time of the initial lookup; this signifies a retry. Currently not used.
RUN
Performs a second lookup of an unqualified name. The name may not have been seen at the time of the initial lookup; this signifies a retry. Currently not used.
The second section of the Name Lookup Output tells whether or not the lookup was successful. In the previous example, the lookup was not successful. If the lookup were successful, it would detail the declaration which was found.
The final section of the Name Lookup Output describes what was being searched for. It is the output of a NameOccurrence object.

3.2. Symbol Table Output

The symbol table output describes each one of the symbol tables that were created during the program parse. Each symbol table describes its contained in scope as well as the members that are declared inside it. An example symbol table output is shown below.

        NAMESPACE: _GlobalNamespace (0,0): namespace _GlobalNamespace (no refs)
        	i (1,2) instance of: int (no refs)
        	C (2,2): class C (no refs)
        END OF NAMESPACE SCOPE: _GlobalNamespace
        CLASS: C (2,2): class C (no refs) in _GlobalNamespace
         	j (2,7) instance of: int Public (no refs)
        END OF CLASS SCOPE: C
      
_GlobalNamespace is the namespace that represents file scope. The (0,0) represents the starting line and token number, respectively. After the second :, the NameDeclaration object is printed followed by any obvious references to that declaration. On the next line, and indented is each NameDeclaration object that was declared inside the scope. In this case, i was an integer variable(instance) that was never used (no refs) and C was a class type. A similar scope printout for C's class scope appears next.

3.3 Statistics

The statistics printed consists of the following information.

Decorated Identifiers
The number of tokens which were decorated to be ID_typedef_name, ID_class_name, ID_enum_name, ID_original_namespace_name, ID_namespace_alias, or ID_template_name.
Non Decorated Identifiers
The number of tokens that were identifiers and were NOT decorated.
Non Identifier Tokens
The number of tokens that were not identifiers (example: +,::,--,.,if,else)
Specifiers set by buffer
The number of specifiers set by the token buffer. Specifiers are discussed in the section on the NameDeclaration class.
Qualifiers set by buffer
The number of qualifiers set by the token buffer. A qualifier in the case of A::B::C::D would be A, B, and C. Qualifiers are discussed in the section on the NameDeclaration class.
Type Invariants
The number of objects of the Type heirarchy that had their invariants checked. More on this in the section on invariants.
Scope Invariants
The number of objects of the Scope heirarchy that had their invariants checked. More on this in the section on invariants.
NameDeclaration Invariants
The number of NameDeclaration objects which had their invariants checked. More on this in the section on invariants.

3.4. The ASG output

The ASG is printed using the ASGPlainTextVisitor class found in ASG/visitors/ASGPlainTextVisitor.{h,cpp}. It is described in detail in the section on the ASG implementation.

4. Errors and bug reporting

An error can take one of three separate forms. It can be an internal assertion error; an invariant validation error, or a parser error.

4.1. Internal assertion errors
Internal assertion errors are caused by invalid preconditions or postconditions from various functions throughout the system. For a complete list, please see the section on Assertions. The error message will look like:

Failed assertion: cannot add NULL reference.
              in file: NameDeclaration.cpp
              on line: 140 
If you receive this error message, please submit a bug report.

4.2. Invariant validation errors
Invariant validation errors occur at the end of the parse, but before the ASG is printed. These errors indicate a breakage of a class invariant. These errors are implemented as C assertions and as such have system dependent output. Normally, they will cause a core dump. If one of these errors occurs, please submit a bug report.

4.3. Parser errors
Parser errors are the least informational and simply say Error parsing!. You should check your code against a separate compiler( such as GCC) to ensure that it is standard conformant. If it appears to be so, submit a bug report.

4.4. Submitting a bug report
To submit a bug report, please send an email message to Tanton Gibbs or Brian Malloy consisting of your environment and the smallest possible program which can reproduce the undesired behavior.

5. Known Issues

There are a number of outstanding issues.

  1. keystone cannot handle non-standard compliant code. Unfortunately, the GCC header files are full of non-compliant code. This means that the GCC header files cannot be used in conjunction with keystone. In many cases, this is not an issue. For those cases where it is an issue, we are deriving a solution. Those cases include programs which use any type defined in a standard header (e.g. size_t, list, type_info).
  2. keystone cannot handle template member functions of a template class declared outside of the class. This is being researched and we anticipate a fix soon
  3. keystone cannot handle function arguments which take a fully qualified type as the first argument and the fully qualified type's first qualifer is a template_id. This should be resolved by the next version.

C. Auxillary scripts


There are four auxillary perl scripts which come bundled with keystone. While they do not directly affect keystone they simplify some of the common tasks done when testing keystone. Three of the scripts will be discussed here, the fourth will be discussed in conjunction with the ASG.

1. gather_stats.pl

This script runs keystone on a directory of files and ensures that the output of each file is error free. It requires GCC and bash or tcsh. It summarizes the results of its run in a file called report.out.
Its usage is: perl gather_stats.pl directory
The report.out file looks like

CORED:

FAILED:
../testsuite/regression/ddjstuff/clause14/14.5.2-2.cpp
../testsuite/regression/ddjstuff/clause14/14.8.2.4-14b.cpp
../testsuite/regression/ddjstuff/clause14/14.8.2.4-18.cpp
Total Succeeded: 428
MIN Decorated: 0
MAX Decorated: 25
TOT Decorated: 1626
AVG Decorated: 3.79906542056075
MIN NonDecorated: 1
MAX NonDecorated: 35
TOT NonDecorated: 3683
AVG NonDecorated: 8.60514018691589
MIN Non Ident: 4
MAX Non Ident: 131
TOT Non Ident: 16451
AVG Non Ident: 38.4369158878505
MIN Specifiers set: 0
MAX Specifiers set: 25
TOT Specifiers set: 1765
AVG Specifiers set: 4.12383177570093
MIN Qualifiers set: 0
MAX Qualifiers set: 9
TOT Qualifiers set: 235
AVG Qualifiers set: 0.549065420560748
The CORED section contains all programs that caused a core dump.
The FAILED section contains all programs that caused an assertion error or a parser error. The rest of the file contains a summary of the statistics generated by keystone.
AVG -> average, MIN -> minimum, MAX -> maximum, TOT -> total

2. gather_info.pl

This script will analyze a C++ file and give a report on the number of non-empty lines, POD classes, and classes with functions that exist in the file. It prints the report to STDOUT.
Usage: gather_info.pl filename

3. preproc.pl

This file is used by the gather_stats.pl file to eliminate any system headers as well as any lines that contain cout or <<. The later is actually not necessary, but is in for historical reasons. It prints the preprocessed file to STDOUT. Usage: preproc.pl filename


Design and implementation


A. System level overview


The keystone system is comprised of two major subsystems: the Program Processor and the Symbol Table.

Program Processor


The Program Processor subsystem includes scanning and parsing and is responsible for initiating and directing symbol table construction and name lookup. This responsibility includes two phases:

  1. assembling the necessary information for creation of a NameOccurrence object, and
  2. directing the search for a corresponding NameDeclaration object in the Symbol Table subsystem

The Program Processor subsystem includes the following classes:
NameOccurrence
An identifier found in the source code.
Type
The type of the NameDeclaration. This really belongs in the Symbol Table subsystem, and will eventually be migrated.
ContextManager
A class to move in and out of scopes
LookupController
The class responsible for directing name lookup
TokenBuffer
The class which interfaces with lex and stores NameOccurrences
TokenDecorator
The class responsible for decorating a token depending on context and type
ActionFacade
This class is an interface between the grammar actions and the other classes
KeywordManager
A class used to identify identifiers that are keywords
While traditonal parsing looks like this:
XXX
Token Decorated parsing introduces an intermezzo oval and looks like this:
XXX
The interaction between the scanner (lex) and the parser (BTYACC) is handled by the TokenDecorator and TokenBuffer classes. The parser requests a token from the TokenDecorator, who in turn requests one for the TokenBuffer. The TokenBuffer maintains an array of 5 tokens representing the current token, the two previous tokens, and the two next tokens. [Note: In the implementation section, we will discuss all the situations in which the left/right contexts are used.] If necessary, the TokenBuffer will request a token from the scanner. If the token is an identifier, the TokenBuffer will create a NameOccurrence object representing the identifier. It will also provide certain specifiers for the object representing the context it was used in. The currently used OccurSpecifiers are:
Destructor
This identifier was used to declare a destructor
ElabEnum
This identifier was preceeded by the keyword enum
ElabClass
This identifier was preceeded by the keyword class
Member
This identifier was used in a member access expression such as x.a
Namespace
This identifier was preceeded by the keyword namespace
PseudoDestructor
This identifier was used to explicitly call a destructor ( i.e., x.~X() )
Qualifier
This identifier is a qualifier for some other identifier( i.e., A in A::b )
nothing
This identifier has no specifier
Furthermore, the TokenBuffer will look for qualified members and will set the qualifier of an identifier to the appropriate object. For example, the expression A::b will produce two name occurence objects, one for A and one for b. The specifier for A will be Qualifier. Furthermore, b's NameOccurrence object will have a pointer to A's NameOccurrence object as its qualifier.

The TokenDecorator receives the NameOccurrence object from the TokenBuffer and uses the ActionFacade to perform name lookup. The TokenDecorator will either receive a NameDeclaration object for the NameOccurrence representing where it was first declared, or it will recieve a NULL pointer. If a NULL pointer is received, then no further processing is done and the undecorated token (IDENTIFEIR) is passed on to the parser. However, if a NameDeclaration object is found, a further series of tests is performed to see if the token should be decorated.

The LookupController is responsible for directing name lookup and interacts with the ContextManager. If the lookup is for a qualified name, then the scope of the qualifier ONLY is examined. If, instead, the lookup is for an unqualified name, then the current scope is searched. The LookupController is also responsible for finding the current class scope (for looking up this->x).

The ContextManager is responsible for moving into and out of various Scopes. The scopes that can be entered and left in keystone are: The KeywordManager contains a list of keywords and the tokens they map to. The scanner, upon encountering an IDENTIFIER, passes it to the KeywordManager for modification. If the KeywordManager finds it in its list, then it returns the appropriate token; otherwise, it returns back IDENTIFIER.

The Type heirarchy will be defined with the Symbol Table.

2. Symbol Table

The Symbol Table subsystem is designed to allow name lookup in keystone as described in Clause 3 of the ISO C++ standard. It consists of the NameDeclaration class and the Scope heirarchy, which consists of the scopes listed in the previous section. Each Scope contains pointers to those NameDeclarations which were declared inside it as well as a pointer to the containing scope. Furhtermore, both ClassScope and PrototypeScope contain a pointer to the TemplateParameterScope (if any). NameDeclaration objects represent the declaration of an identifier or type. They contain a pointer to thier enclosing scope as well as their corresponding scope. For example, in the following code snippet:

        int f() {
          class C{};
        }
      
The NameDeclaration object for C would contain a pointer to the enclosing local scope as well as the corresponding class scope.
A NameDeclaration contains whether or not it has been put into the symbol table, whether it is an alias for some other type, whether it is a template, and its qualifier, if any.
A NameDeclaration also contains a pointer to its type as well as a flag saying whether it is an instance of that type or not. In the previous snippet, C has class type and is not an instance. Had there been a variable of type C declared, then the variable would have had class type and would have been an instance. The Type heirarchy represents the various types that a NameDeclaration can be declared as. It consists of the following classes: Most types also contain a pointer back to their name declaration object. Therefore, a circular structure is formed between a NameDeclaration like C above, and its Type (ClassType in this case). However, Typedefs are represented in a slightly different way. The NameDeclaration representing a typedef has a pointer to a Type object, but the Type object it points to contains a different NameDeclaration object. As an example, consider the following:
      class C {};
      typedef C MyC;
      
In this example, C will have a ClassType data member. That member will also have a NameDeclaration that points back to C. MyC will also have a ClassType data member. However, that ClassType will point to the NameDeclaration for C, not MyC.
A BasicType does not contain NameDeclaration objects. Therefore, if a NameDeclaration has type BasicType and is not an instance, it must be a typedef.
A FunctionType also contains a Type for its ReturnType. An IndirectType contains the type pointed to.
TemplateParameterType and TemplateTemplateParameterType are only used for template parameter types which are declared like class T or typename T. Template parameters with specified type are represented by one of the previously mentioned types.

B. Choreographed classes


In this section, we will examine the classes responsible for the bulk of work in keystone. It will be a low level overview with numerous code samples with explanations.

1. NameOccurrence

The NameOccurrence class represents an IDENTIFIER seen in the input stream. It can be adorned with a specifier (covered previously) and it can be qualified (also covered previously). Furthermore, it can be marked as a template occurrence. A NameOccurrence is marked as being a template if it is preceeded by the template keyword. This allows the NameOccurrence to be correctly identified as an ID_template_name by the TokenDecorator. For example, in the following code snippet:

      class X { template<int i> void f(); };
      X::template f<100>();
    
f would be marked as a template occurrence.


2. TokenBuffer

The TokenBuffer is responsible for gathering the information from the scanner and passing it on to the parser. It also saves various information about the surrounding context for later use by the TokenDecorator and the parser. In the following section we describe some of the important member functions and how they are used.

Data Structures The Token Buffer contains an array of MAX_BUF entries (currently 5). These entries contain the token, a std::string with the lexeme, a YYSTYPE structure which will contain a TokenInfo pointer, and a NameDeclaration pointer which will point to the last Qualifier in a nested name specifier with a template name (e.g. X<7,13>::Y< Z<int>::Q >::p [The first > contains a pointer to X's NameDeclaration object, the second > points to Z's, the third points to Y's] ).

2.1 init This function is called by the constructor. It reads MAX_BUF tokens into the buffer array.

2.2 getNextToken This function effectively looks at the next token. To accomplish this, it will do one of two things.

  1. It can increment the currentToken index. This only occurs at the beginning of the parse, when the currentToken index is less than MAX_BUF/2, or at the end of the parse (after EOF has been seen).
  2. It can move all of the tokens down one spot in the array (erasing the oldest). It can then read in the next token from the scanner.
If the currentToken is a DOT (.) or an ARROW (->), then the TokenBuffer notes that it is inMemberAccess.
If the currentToken is class or struct keyword and the next two tokens are IDENTIFIER and COLON, then it notes that it is inBaseSpecifierList.
If the currentToken is a LEFTBRACK ({), then it notes it is not inBaseSpecifierList.
If the currentToken is a LEFTPAREN ((), then it increments the parenLevel. This is necessary for deciphering the last identifier in a template expression such as X<(5>7)>. In the previous example, only the last > sign represents the end of the template parameter list.
If the currentToken is a RIGHTPAREN ()), then it decrements the parenLevel.
If the currentToken is a GREATER(>) and it is in a template argument list, and the parenLevel is zero, then it notes the last seen template id as the current qualified template id by calling the restoreId function.
Finally, the getMultiWordType function is called.

2.3. getMultiWordType This function looks at the current token and the next one or two tokens to determine if the type is a multi word type (e.g. long int, unsigned short int, unsigned char). The algorithm is fairly straight forward and can be seen in the code.

2.4. setOccurSpec This function is responsible for setting any occur specifiers on the NameOccurence object. It begins by checking the left context to see if the previous token was the enum keyword. If so, it sets the ElabEnum specifier.
Otherwise it looks for the class, union, or struct keywords and, if one is found, it sets the ElabClass specifier.
If not found, it looks for the namespace keyword and sets the Namespace specifier if found.
Otherwise, it looks for a preceeding ~ and sets the Destructor qualifier if found.
After checking the left context, the TokenBuffer checks the right context for a COLCOL (::). If found, the Qualfier specifier is set.
Finally, it checks to see if it is inMemberAccess and does not have a specifier. If so, it is specified as a Member. If that is not true, then it checks to see whether or not it is inBaseSpecifierList and it is not a qualifier. If so, then it sets the ElabClass specifier.

2.5. getNameOccurrence This function will turn the TokenInfo pointer contained in the buffer array into a NameOccurrence object. It creates a NameOccurrence object and attempts to set the occur qualifier through a number of steps. First, if the previous two tokens were ::~, then it uses the last identifier seen as the qualifier.
Otherwise, if the previous two tokens were IDENTIFIER ::, then it uses the NameDeclaration object for the IDENTIFIER token as the qualifier.
If that is not the case, then it checks to see if the previous two tokens were >::. If so, it looks at the > token's lastDecl member, and if non-NULL, uses it for a qualifier.
Otherwise, it will check to see if the previous two tokens were template::. If so, it will get the ::'s lastDecl member (unless, NULL then it uses the last identifier).
If all of the above fail, and the previous token is ::, then it is a global qualifier and the GlobalNamespace is retrieved from the ActionFacade and used as a qualifier.
If the previous token is template and the token before is a ::, ->, or ., then the NameOccurence is marked as being a template.
After performing the qualifications, this function calls setOccurSpec and returns the NameOccurrence object.

2.6. saveId This function is called by the TokenDecorator whenever a template argument list is entered. It sets the inTemplateArgList value to true and pushes the previous identifier onto a stack.

2.7. restoreId This function is called at the end of a template argument list. It checks to see if the current token is a GREATER (>) or a COLCOL (::) and if so, sets their lastDecl pointer to the NameDeclaration at the top of the stack, pops the stack, and sets the inTemplateArgList member to false. ** This is a bug.

2.8. setNameDeclaration This function is called once a NameDeclaration is found by the TokenDecorator. It sets the current Token's name declaration to be the one found by the TokenDecorator. It also sets the last_identifier member to the NameDeclaration pointer passed in. Furthermore, if the next token is a LEFTPAREN (() and the previous token was not template and the type of the declaration is FunctionType, then the lastWasF member variable is set to true. The lastWasF member variable is used to ensure that PrototypeScope is entered on time and correctly through BTYACC predicates (described later).

3. TokenDecorator

The TokenDecorator is responsible for decorating the token so that the parser can embed semantic information into its grammar. Furthermore, it provides an interface to many of the TokenBuffer's operations such as getLastIdentifier, lastWasFun, and getTokenCount. It is also responsible for saving and printing the statistics that appear in the output (described previously). Also, it uses the InvariantFaciliatators to check the classes invariants after printing the statistics.

3.1. getNextToken getNextToken is the function that directs the token decoration process. First, a token is retrieved from the TokenBuffer through use of the TokenBuffer::getNextToken function. Next, if the current token is an identifier, it undergoes token decoration. If it is not, it is passed on unchanged.
The token decoration process begins by the retrival of the NameOccurrence object from the TokenBuffer through use of the TokenBuffer::getNameOcurrence function. Next, the occurrence is translated into a NameDeclaration though the doLookup function. If the doLookup function returns NULL, then a new NameDeclaration object is created in the current scope. Furthermore, if the name occurrence was a template, then it is marked as a template. If the doLookup function does not return NULL, then the NameDeclaration returned has a reference added to it. However, in this case, it is not marked as a template. Instead, it is determined to be a template if it has not be determined already. Basically, there are two different ways to be a template, markTemplate says that you have not been declared, but are a template, setIsTemplate says you have been declared a template. Next, if the NameOccurrence object received from the TokenBuffer has a qualifier, then that qualifier is given to the NameDeclaration object to be its qualifier. Then, the NameDeclaration object that was created (or found) is handed back to the TokenBuffer. Finally, getCSIdent is called on the NameDeclaration object and the result of that is used as the current token and passed to the parser.

3.2. doLookup doLookup uses the action facade to make its initial lookup. If the NameOccurrence object to be looked up has a qualifier, then this function calls the ActionFacade's lookupQualifiedName. Otherwise, this function calls the ActionFacade's lookupUnqualifiedName. If a NameDeclaration was not found, then it could be because we are the first argument in a class method declared out of line. For instance:

           class X {
             public:
               typedef int T;
               void f( T );
           };
           void X::f( T ) {}
      
In this case, the parser will not have executed the rule to enter prototype scope before T is encountered in the lookahead (see section on problems with Token Decorated Parsing). However, T must be looked up in f's corresponding scope to be found. Therefore, the following code is used:
       if( decl == NULL && buf.lastWasFun() &&
           buf.getLastIdentifier()->getCorrespondingScope() ) {
         decl = buf.getLastIdentifier()->getCorrespondingScope()->lookup( id );
       }
      
** This is a bug and should go before the other lookups. Finally, whether found or not, the NameDeclaration object is returned.

3.3 getCSIdent This function takes a NameDeclaration object (and whether it was preceeded by template), and returns back a potentially decorated token.
If the NameDeclaration object does not have a type object, and it was not preceeded by template, then it returns the undecorated IDENTIFIER token.
If the NameDeclaration object was preceeded by template, or the next token in the buffer is a < character and isTemplate returns true, then the saveId is called for the TokenBuffer and an ID_template_name is returned. The grammar only wants an ID_template_name when it is about to start a template argument list. Therefore, it must either be preceeded by template or proceeded by a < sign.
If the NameDeclaration object has class type and it is not an instance and it is not a typedef, then ID_class_name is returned.
If the NameDeclaration object has enum type and it is not an instance, then ID_enum_name is returned.
If the NameDeclaration object has namespace type and it is not an instance, then ID_original_namespace_name is returned.
If the NameDeclaration object has namespace type and it is an instance, then ID_namespace_alias is returned.
If the name declaration is a typedef:

            else if( !decl->isInstance() &&
                     (decl->getType()->getDecl() != decl ||
                     decl->getType()->checkType(Type::DT_INDIRECT) ||
                     decl->getType()->checkType(Type::DT_FUNCTION) ) ) {
      
then ID_typedef_name is returned.
If the object is a template parameter type and not an instance, then ID_class_name is returned. It used to be that ID_typedef_name was returned for this, as template parameters do not have to be classes. However, the grammar is not set up to support that, and favors ID_class_name over ID_typedef_name; therefore, it was changed.
Finally, if none of the above returned, a plain IDENTIFIER token is returned.

3.4. isTemplate This function determines if a NameDeclaration object is a template. If the object has a corresponding class scope, and the corresponding scope has template parameters, then it is a template.
If the object has a corresponding function scope, and the corresponding function scope's prototype scope has template parameters, then it is a template.
If the object has a corresponding prototype scope, and the corresponding prototype scope has template parameters, then it is a template.
If the type of the object is template template parameter, then it is a template.
If the object has been previous set as a template (through setIsTemplate), then it is a template.

4. NameDeclaration

The NameDeclaration class tries to contain as much information about the declaration of an object or type as possible. It contains a pointer to the corresponding and containing scope and to the type. It also contains whether it is an instance of the type or the type declaration. It records whether or not it is in the symbol table, is a template, or has been marked as a template. It contains a special entry to its destructor due to the fact that it should be autogenerated and can be explicitly called. It also contains a list of references (when it has been seen) that is mostly incomplete, but can be useful at times. Furthermore it contains its access (Public, Private, Protected, or Unknown), as well as, any attributes such as friend, pure, virtual, const, static, or extern (extern is not currently supported).

4.1. Constructor The constructor's main job is to change the name if it is a constructor or destructor being declared. A flag is passed in to tell if it is a destructor. The actual code is:

          if( definedIn &&
              dynamic_cast( definedIn ) &&
              getName() == definedIn->getName() ) {
          if( isDestructor )      
             setName( definedIn->getName() + "::~" + definedIn->getName() );
          else
             setName( definedIn->getName() + "::" + definedIn->getName() );
    
The name must be changed from X to X::X or X::~X to prevent name lookup from finding the constructor call when it should be finding the class type.

4.2. isaMatch This function is responsible for determining if a NameOccurrence matches a NameDeclaration in lookup. First, it checks to see if thier names match. If so, then it analyzes the OccurSpecifier of the NameOccurrence.
If the specifier is ElabEnum, then the type of the NameDeclaration must be enum, or template parameter.
If the specifier is ElabClass, then the type of the NameDeclaration must be class, template parameter, or template template parameter. If the specifier is Member, then the type can be NULL, or the type can be enum or class if it is not an instance and the type is not a namespace.
If the specifier is a namespace, then the type must be a namespace.
If the specifier is a qualifier, then the type can be a class that is not an instance, a namespace, a basic type that is not an instance (for typedef int T; T x; x::~T()), a template parameter that is not an instance, or a template template parameter that is not an instance.
If the specifier is not listed above, then a match is assumed.

4.3. getDestructor This function is responsible for finding or creating a destructor for this NameDeclaration. First, it checks to see if the type is a class or namespace type. If it is not, then a destructor called __basic_destructor is created, entered into the global scope, and returned. Otherwise, if the type is a namespace, NULL is returned. Now, it must be a class. Therefore, the corresponding class scope is looked up, the name is changed into X::~X form, and then looked up. If one is not found, a destructor is created and inserted into the class scope.

5. The Scope Heirarchy

The Scope Heirarchy consists of 7 scopes: Scope, LocalScope, NamespaceScope, ClassScope, PrototypeScope, FunctionScope, and TemplateParameterScope.

5.1. Scope Scope is an abstract base class that provides default implementations for many virtual functions.

5.1.1. insertLocal This function pushes a NameDeclaration object onto the back of a list. It also sets the NameDeclaration objects containing scope and sets the NameDeclaration inSymbolTable flag.

5.1.2. undoRecentInsertLocal This function takes a NameDeclaration and compares it to the last NameDeclaration inserted. They must be the same or an error is issued. If they are the same, the NameDeclaration last inserted is poped from the back of the list.

5.1.3. removeDeclaration This function accepts a NameDeclaration and finds it in its list of local objects. If found, it removes it.

5.1.4. findHere Given a NameOccurrence, this function looks through its list of NameDeclarations to try and find a match. It uses the isaMatch function of the NameDeclaration class to assure a match. Furthermore, if the occurrence has not been specified to be a class or an enum, it searches for non types first, then types. This allows:

        class X {};
        X X;
        X X2; // this should fail...X refers to the variable
        class X X3; // this should pass...X has been specified as a class.
      

5.2. LocalScope A LocalScope represents a basic block. In addition to the NameDeclaration list inherited from Scope, it also contains a list of used namespaces.

5.2.1. canContain This returns true if the specifier is nothing, ElabClass, or ElabEnum, since their declarations can occur within a LocalScope.

5.2.2. lookup First the name occurence is passed to canContain. If it can be contained within a local scope, the findHere method of Scope is called. If the declaration has not been found, then the used namespaces are searched. If it still has not been found, then the containing scope is looked in as well as any anonymous scopes of the containing scope.

5.2.3. searchUsedNamespaces This function merely loops through the list of namespace scopes and does a lookup in each one.

5.3. NamespaceScope The NamespaceScope class represents a named or anonymous namespace scope. It also contains a list of used namespaces.

5.3.1. canContain A NamespaceScope can contain an enum, class, member, namespace, pseudoDestructor, Destructor, Qualilfier, or nothing.

5.3.2. lookup Since namespaces may recursively include one another, they must be stored in a set as they are being searched. If the namespace is ever found in the set, then it is not searched as it has already been searched through a recursive using clause. If it is not found in the set, then it puts a pointer to itself in the set saying that it has been searched. If the canContain member function returns true, then the findHere function is called on the NameOccurrence being looked up. If a NameDeclaration is not found, then used namespaces are searched. If it has still not been found, then the containing scope is searched along with any anonymous scopes in the containing scope.

5.3.3. searchUsedNamespaces This function merely loops through the list of namespace scopes and does a lookup in each one.

5.4. ClassScope A ClassScope represents the scope of a class declaration. Along with the NameDeclaration list, a list of base classes and thier access restrictions, a list of friends, and a TemplateParameterScope, is stored.

5.4.1. canContain A class scope can contain a conversion function, a destructor, a destructor qualifier, an enum, a class, a member, a pseudo-destructor, a qualifier, or nothing.

5.4.2. lookup This function first checks the canContain function to see if it should be looked up in this scope. If so, the findHere function is called. If it is not found, but the name is the same as the class name, then the current declaration is returned (for destructors). If the declaration has still not been found, and a template parameter scope exists, then the template parameter scope is searched. Next, the base classes are searched followed by any containing scopes.

5.5. Prototype Scope A PrototypeScope represents a function's prototype. It also contains the template parameter scope, if any, for the function.

5.5.1 canContain A PrototypeScope can only contain those declarations whose specifier is nothing.

5.5.2 lookup If a template parameter scope exists, then it is looked up in first. Otherwise, the current scope is looked in. If still not found, the containing scope is examined. Finally, if the declaration is qualified, and the qualifier is of class scope, that scope is examined.

5.6. FunctionScope A function scope represents the scope of the entire function, not just a basic block. Therefore, the ISO C++ standard states that the only thing that can be in a FunctionScope is a label. It is the LocalScope inside the FunctionScope that actually contains most declarations.
A function scope contains a pointer to its prototype scope and a list of pointers to local scopes declared inside it.

5.6.1. canContain Since the FunctionScope can only contain labels, the only specifier that can occur is nothing.

5.6.2. lookup Lookup examines the current scope to try and find the declaration. If it is not found, and the Prototype scope exists, then the Prototype scope is examined. If still not found, then containing scopes and thier anonymous scopes are examined.

5.7. TemplateParameterScope A TemplateParameterScope represents any template argument which exists in a template parameter list.

5.7.1. canContain A template parameter scope can contain an occurence with ElabEnum, ElabClass, a Qualifier, or nothing.

5.7.2. lookup A template parameter scope, if canContain returns true, calls the Scope::findHere function and then checks the containing scopes and thier anonymous scopes.

6. The Type Heirarchy

The Type Heirarchy consists of 9 classes: Type, BasicType, FunctionType, IndirectType, ClassType, EnumType, NamespaceType, TemplateParameterType, and TemplateTemplateParameterType.

Each class must provide an override to the getDecl function, which returns the name declaration for the type, the checkType function, which checks the dynamic type of the object, and the clone function, which returns a copy of the object.

The checkType function uses an enum DynType which contains the following members

Each DynType enumerator maps onto a distinct Type class except for DT_FUNCTION, DT_FUNCTION_PROTOTYPE, and DT_FUNCTION_DEFINTION, which all map onto the FunctionType class.

6.1. Type This is the base class for the hierarchy and provides facilities to make the type const or volitile as well as to check to see if the type is const or volatile.

6.2. BasicType This class represents a BasicType (int, float, char, etc...).

6.2.1 checkBasicType BasicType provides an enum, BType, to represent which Basic type it is. The enum provides the following enumerators:

checkBasicType accepts one of these enumerators and returns true if the type is the same as the enumerator.

6.2.2. getDecl The getDecl function returns NULL since a basic type does not have a declaration.

6.3. FunctionType This type represents the type of a function. It contains an enum member which determines whether it is a prototype or a definition. It also contains the return type of the function.

6.3.1. getReturnType This returns the return type of the function.

6.3.2. getDecl This returns the name declaration object for the function declaration.

6.3.3. checkType This always returns true for DT_FUNCTION. For DT_FUNCTION_PROTOTYPE it returns true if the enum member is Q_PROTOTYPE. For DT_FUNCTION_DEFINITION, it returns true if the enum member is Q_FUNCTION.

6.4. IndirectType This class represents the type of a pointer, rererence, or array. It contains an enum member that determines which one it is. It also contains a pointer to the pointed-to type.

6.5. ClassType This represents the type of a class definition. It contains the NameDeclaration object for the original definition.

6.6. EnumType This represents the type of an enum definition.

6.7. NamespaceType This represents the type of a namespace definition.

6.8. TemplateParameterType This represents the type of a template parameter.

6.8. TemplateTemplateParameterType This represents the type of a template template parameter.


C. Choreographing classes


While the previous section focused on classes that acted as data structures, this section will focus on the classes that direct those data structes' actions.

1. LookupController

The LookupController is responsible for determining how a Name Occurrence or NameDeclaration should be looked up. It contains a pointer to a ContextManager object.

1.1. lookupUnqualifiedName This function first clears the set of searched namespaces, then, it gets the current scope from the ContextManager and calls the lookup method for that scope. If the NameDeclaration was not found, it asks the ContextManager for the first template scope and searches it. It has to check for the template scope seperately because the template scope may not have been attached to a class or prototype scope yet due to lookahead.

1.2. lookupQualifiedName This function first clears the set of searched namespaces. Next, it gets the qualifier from the NameOccurrence. If the qualifier has TemplateParameterType or TemplateTemplateParameterType, the function returns NULL as no lookup is available. Otherwise, the corresponding scope is retrieved and lookup is performed in that scope.

1.3. lookupPseudoDestructor This function calls the getDestructor function of the NameDeclaration passed in.

1.4. Other functions The LookupController contains other functions to lookup class scopes, function calls, binary operators, etc... However, they are not currently used and may be overhauled or removed in a future release.

2. ContextManager

The ContextManager is responsible for managing the entrance and exit of scopes. It is designed to be a singleton. It contains two data structures, a stack of scopes which represent the nested scope heirarchy, and a list of scopes representing all of the scopes seen.

2.1. getCurrentScope Returns the top of the stack of scopes.

2.2. getPreviousScope Returns the second element in the stack of scopes.

2.3. enterScope Pushes a scope on the stack

2.4. leaveCurrentScope Pops a scope off the stack

2.5. moveIntoPrototypeScope This function accepts a name declaration and attempts to move into its prototype scope. It could be that this declaration has been predeclared (e.g., int x();), if so, then it will already have a corresponding scope and that scope should be entered. This is represented by the code:

          if( decl->getType() && decl->getType()->checkType( Type::DT_FUNCTION_PROTOTYPE )
             && decl->getCorrespondingScope() != NULL ) {
              enterScope( decl->getCorrespondingScope() );
              return decl;
          }
    
If that is not the case, then a new function type and Prototype scope should be created and the scope created should be entered. First, the return type of the function will be the current type of the NameDeclaration object passed in. That type will be used to create a FunctionType object with the Q_PROTOTYPE enum member. The newly created FunctionType object will be assigned to the NameDeclaration object's type member. Next, a new prototype scope will be created with the NameDeclaration object as the declaration for the prototype and the current scope as the containing scope. Then, the ContextManager will search for a template scope and assign it to the PrototypeScope's TemplateParameterScope member variable. Finally, the newly created scope will be pushed on the list of scopes and on the stack of scopes.

2.6. moveIntoFunctionScope This function will attempt to move into a function's scope. The first thing to check for is whether or not you are in a member intializer list. If so, the function scope has already been entered and this is a duplicate call. Simply set the mem_init variable to false, and continue onward. Second, if the NameDeclaration object passedin already has FunctionType, then it is a Q_PROTOTYPE. Change it to be a Q_FUNCTION. The prototype scope will be the NameDeclaration's corresponding scope. Next, it creates a function scope and gets any previous template parameter scope for the function prototype scope's use. Finally, it pushes the function scope on the list and stack.

2.7. moveIntoClassScope This function attempts to move into class scope. First, the current template scope is retrieved. Second, an explicit specialization is looked for. If the type is a class and it has a corresponding class scope, and that scope has template parameters, then we are entering an explicit specialization. That means that the Corresponding Scope should be pushed on both the list and stack and the function should terminate. Otherwise, we insert the NameDeclaration into the current scope with a newly created ClassType object. We also create a ClassScope and set the TemplateParameterScope accordingly. Finally, we push the scope on the back of the list and stack.

2.8. moveIntoLocalScope This function creates a new LocalScope object with the current scope as the containing scope. It then loops back through the containing scopes until it finds a function. The local scope is then added to the function's list of local scopes. Finally, the newly created scope is added to the list and stack.

2.9. moveIntoTemplateParameterScope This just creates a new TemplateParameterScope and puts it on the list and stack.

2.10. moveIntoNamespaceScope This function first checks to see if the NameDeclaration passed in already has a corresponding scope. If not, it creates a new scope, inserts the declaration into the current scope, and pushes the scope on the list. Then, it takes whichever scope should be used and pushes it on the stack of scopes.

2.11. moveIntoAnonNamespace This function gets the current scope and ensures it is a namespace scope (since anon namespaces may only occur in other namespaces). Second, it moves into the namespace scope. Finally, it adds the anonymous scope to the list of uses declarations of the containing scope.

2.12. installBaseInCurrentScope This function ensures that the current scope is a class scope. It then gets the corresponding scope of the base class' NameDeclaration object (which is passed in). If the scope was found, then it is given to the class scope to add. Otherwise, if the base class was a template parameter or a template template parameter, nothing is done.

2.13. useNamespace This function merely ensures that the current scope is either a local scope or a namespace scope. Then, it adds the NameDeclaration for the namespace to the uses list of the current scope.

2.14. undoRecentInsertLocal If the NameDeclaration is qualified, it calls undoRecentInsertLocal on the qualified scope. Otherwise it calls it on the current scope.

2.15. useDeclaration This function creates a new NameDeclaration based on the one passed in and inserts it into the local scope. It then sets the newly created NameDeclaration's instance to whatever the old instance was and also sets the isTemplate member in the same way.

2.16. getCurrentClassScope This function merely loops through containing scopes until it finds one that is a class scope.

2.17. getCurrentTemplateScope This function looks at the list of scopes and finds the first scope that is a TemplateParmaeterScope. If the function's argumetn is true, it erases the scope from the list, to prevent others from finding it. Otherwise, it leaves it alone.

2.18. insertLocal This function choreographs the insertion of a name declaration in a scope. First, it checks to see if the name is qualifed. If it is, it will insert into the qualified declaration's scope. Otherwise, it inserts into the local scope. Next, it checks for duplicate class/namespace entries. If the declaration is already in the symbol table, and the containing scope is the scope previously found, and the type passed in is the same as the NameDeclaration's type and the decl is not an instance, then it is a duplicate class/namespace declaration and is not inserted again. Otherwise, if the type is a function prototype type, and the containing scope is the scope previously found, then nothing is done. If neither of the above hold, then the NameDeclaration found by lookup was incorrect and a new one must be created. To create a new one, it checks to see if it will be a destructor, if so it reformats the name into the X::~X format. Then, it creates a new NameDeclaration, sets its occur qualifier (if any), and inserts it in the scope.

3. ActionFacade

The ActionFacade is a thin wrapper around the LookupController and ContextManager. It also adds a few utility functions for the parser actions to use. The wrapper functions will not be discussed here, as they merely forward the function call to the appropriate object.

3.1. makeAnonIdent This function is called to create an anonymous identifier for things such as cast operators . The anonymous identifier created is in the form __anonX where X is a unique number.

3.2. makeAnonParmIdent This function is called to create an anonymous identifier for unnamed function parameters. The anonymous identifier created is in the form __anonX where X is the position of the parameter.

3.3 makeAnonNSIdent This function is called to create an anonymous identifier for unnamed namespaces. This funtion always returns "anon".


D. Validation Classes


There are six classes used to help validate the Scope heirarchy, Type heirarchy, and NameDeclaration class. Each one of the three mentioned has a corresponding InvariantVisitor and InvariantFacilitator. The InvariantFacilitator stores references to objects that it should validate. The InvariantVisitor loops over those stored references and makes assertions about the objects stored. Since the objects are monotonic in nature, they are only validated at the end of the program. The actual assertions used are listed in a subsequent section.

1. *InvariantFacilitator

There is a TypeInvariantFaciliator, ScopeInvariantFacilitator, and NameDeclarationInvariantFacilitator. They all follow the same pattern. Each contains a static vector of pointers to objects of that type.

1.1 CheckInvariants Checks the invariants on this object. This uses a form of the acyclic visitor. A dynamic_cast is performed for each class in the heirarchy (most derived first) until the right class is found, then it is sent to the visitor.

1.2 CheckStoredClassesInvariants Checks the invariants on all objects stored by calling CheckInvariants on each object in the array.

2. *InvariantVisitor

Type, Scope, and NameDeclaration also each have an InvariantVisitor. This Visitor uses the Visitor pattern to check the correct invariants based on the dynamic type of the object being checked.

E. Abstract syntax graph


The abstract syntax graph was designed to be similar in nature to the one described by the Bell Canada Datrix proposal. The classes from which it is composed are created using the treecc sytem. The operations that act on those classes are also specified using treecc.

1. The classes

The classes and their purposes are listed below:

cASGNds
The base class of all ASG nodes
cIdentifier
The base class of Types, Objects, and Functions. It contains a name and a visibility.
cObjectList
A list of object declarations. Used for declarations of the form int x,y,z;.
cExpression
The base class for all expressions
cCtrlStmt
The base class for all control statements
cType
The base class for all types
cScope
The base class for all scopes
cThrowSpec
Represents a throw specification on a function
cBuiltInType
Represents a built in type.
cEnumType
Represents an enum type
cArrayType
Represents an array type
cPtrType
Represents a pointer type
cRefType
Represents a reference type
cFctType
Represents the type of a function
cAggrType
Represents a class, struct, or union type
cAliasType
Represents a typedef type
cForwardType
Represents a forward type declaration
cScopeCompil
Represents the scope of a compilation unit
cNameSpace
Represents the scope of a namespace
cBlock
Represents the scope of a local block
cTryBlock
Represents the scope of a try block
cCatchBlock
Represents the scope of a catch block
cObject
Represents a variable
cTemplateParm
Base class of template parameters
cConstTemplateParm
Represents a template parameter that has a specified type (i.e., non-generic)
cGenericTemplateParm
Represents a template parameter that has a generic type
cTemplateTemplateParm
Represents a template template parameter
cIfDecl
Represents a declaration made inside an if statement
cEnumerator
Represents an enumerator
cFunction
Represents a function definition
cExtern
Represents an extern block
cUsing
Represents a using directive or declaration
cFormalCatchParam
Represents a catch block parameter
cFormalFctParam
Represents a function parameter
cLabel
Represents a label
cCaseLabel
Represents a label in a case statement
cDefaultLabel
Represents the default label in a case statement
cLiteral
Represents a literal value
cActualFctParamList
A list of actual function arguments
cEmptyExpr
A class representing nothing
cNameRef
An occurence of a name in the source code
cInitList
An object's initializer list
cCreationExpr
A new expression
cDestroyExpr
A delete expression
cJump
The base class for control transfer statements
cBreak
A break statement
cContinue
A continue statement
cGoto
A goto statement
cReturn
A return statement
cThrow
A Throw statement
cSelection
A base class for selection statements (if, switch)
cIf
An if statement
cSwitch
A switch statement
cLoop
A base class for loop statments (while,for,do)
cWhileDoLoop
A while loop
cDoWhileLoop
A do while loop
cForLoop
A for loop
cOperator
A base class for operators
cUnaryOperator
A unary operator
cBinaryOperator
A binary operator (includes function calls and casts)
cQuestionOperator
the ?: operator

2. The connections

Each class is connected to other classes by what can be thought of as edges on a graph. In many systems, these edges are represented by actual classes; in our system, the edges are simply pointers to the connected classes.

The connections are as follows:
cObjectList
This contains a list of cASGNds which were declared. For instance, int x,y,z will create an object list with three members; one for each x, y, and z.
cScope
This contains a list of cASGNds which represent the scope's contenets
cThrowSpec
This contains a list of cTypes which represent types that can be thrown by the function
cEnumType
This contains a list of enumerators which were declared inside it
cArrayType
This contains a cType which is the type of the array elements. It also contains a list of cExpression objects which are the array dimensions.
cPtrType
This contains a cType which represents the dereferenced type
cRefType
This contains a cType which represents the dereferenced type
cFctType
This contains a cType which represents the return type, a cThrowSpec which represents the throw specification, and a list of cFormalFctParam which are the parameters.
cAggrType
This contains a list of (cNameRef,Visibility) pairs which represent base classes and thier visibility. It also contains a list of cASGNds which were declared as friends and a list of cASGNds which were declared as members.
cAliasType
This contains a cType which represents the type aliased to as well as a list of cExpressions which include any additional array dimensions.
cForwardType
This contains a list of cASGNds representing the actual template parameter arguments, and a cNameRef representing a nested name specifier. It also contains a list of cTemplateParm which are formal template parameters.
cCatchBlock
This contains a cFormalCatchParam which represents the type that should be caught
cObject
This contains an array of cExpressions representing the actual array sizes, a list of cASGNds representing actual template arguments, a cExpression representing the number of bits (if it is a bitfield), a cExpression representing its initial value, and a cType representing its type.
cConstTemplateParm
This contains a cFormalFctParam representing the declared parameter
cGenericTemplateParm
This contains a cType which represents any default parameter
cTemplateTemplateParm
This contains a list of cTemplateParm which represent the template parameter list. It also contains a cNameRef which represents the default.
cIfDecl
This contains the cObject being declared
cEnumerator
This contains the cExpression to which the enumerator evaluates to
cFunction
This contains a cASGNds which represents the block of executable statements making up the function, a cBlock representing the initializer list of a constructor, a cFctType representing the type of the function, and a cASGNds list representing template arguments
cExtern
This contains a cASGNds list which represents declarations inside the extern block
cLabel
This contains a cASGNds with the code that is labeled
cCaseLabel
This contains a cExpression class which represents the value of the case
cActualFctParamList
This contains a list of cExpression objects representing the actual arguments to a function
cNameRef
This contains a cASGNds list representing the actual template arguments and a cNameRef representing a nested name specifier. It also has a cType if it is a cast operator representing the type of the operator
cInitList
This contains a cActualFctParamList describing the initializer values
cCreationExpr
This contains a cType describing the type to create, a cExpression with the initial value, and a cActualFctParamList with any placement new options
cDestroyExpr
This contains a cExpression with what to delete
cGoto
This contains a cExpression with where to go
cReturn
This contains a cExpression with the value to return
cThrow
This contains a cExpression with the value to throw
cIf
This contains a cExpression with the condition, a cASGNds which should be executed if the condition is true, and a cASGNds which should be executed if the condition is false.
cSwitch
This contains a cExpression with the condition, and a cASGNds with the cases
cWhileDoLoop
This contains a cExpression with the condition and a cASGNds with the code to execute
cDoWhileLoop
This contains a cExpression with the condition and a cASGNds with the code to execute
cForLoop
This contains a cASGNds with the initial value, a cExpression with the conditional statement, a cExpression with the increment statement, and a cASGNds with the executable code.
cUnaryOperator
This contains a cExpression containing the expression to apply the operator to
cBinaryOperator
This contains two cExpressions representing the left hand and right hand side of the expression
cQuestionOperator
This contains three cExpression objects representing the expression, true value, and false value.

3. A diagram

XXX

4. Operations

Certain operations are defined on these heirarchies. Some operations are defined here. Those that are not defined here are trivial to understand and can be found in the ASG/asgnds.tc file.

getType(cIdentifier) If the cIdentifier is derived from Type, it returns itself.
If the cIdentifeir is a cObject or cFunction, it returns the cType member.
If the cIdentifier is a cConstTemplateParameter, it returns the cType member of its cFormalFctParam member.
Otherwise, it returns NULL

setType(cIdentifier) If the cIdentifier is an cObject, it sets the cType member.
If the cIdentifier is a cConstTemplateParm, it sets the cType member of the cFormalFctParam.
If the cIdentifier is a cFunction, it sets the return type of the function.

setTemplateParms(cType) If the cType is a cFctType, cForwardType, or cAggrType, the template parameters are set to the list passed in.

5. Visitor accept

Each class also defines a virtual accept method that calls the visit method in the ASGVisitor class passed in. This is used to add algorithms to the heirarchy without having to modify the treecc file.


Implementation issues


A. Difficulties encountered


As with all software projects, keystone has had its share of difficulties. In this section, we will outline some of the major difficulties that plague the keystone source code. Much of the ugliness that is present in the source is due to these issues.

1. The fundamental problem

The fundamental problem with Token Decorated Parsing is the inability of the parser to execute certain actions without lookahead. This lookahead forces Token Decoration to be performed too early, at times when the correct scope has not been entered. The classical example of this is

      class X {
         typedef int AT;
         void f( AT arg );
      };
      void X::f( AT arg ) {}
      
This example poses serious difficulties for Token Decorated parsing. The parser usually needs to see that AT is a type name before it decides that the construct is a function definition. However, prototype scope has not yet been entered, since prototype scope is only entered once the construct has been identified. Therefore, AT is looked up in global scope and not found. Another example of the same problem is
      template class X : public T {};
      
In this case, the opening bracket ({) must be seen before class scope is entered; however, T must be looked up in the class scope's template scope. Once again, the lookahead has caused a problem with token decoration.

2. Remembering qualified context

A second problem with parsing C++ is remembering the qualified context in the face of templates. In a non templated language, remembering the qualified context is trivial. For example, A::B::C can be used to illustrate the fact that the qualified context for B is A, and the qualified context of C is the B and its qualified context. Therefore, you only need two lookbehind tokens to store the qualified context. In C++, an additional ~ may be interspersed between the :: and the identifier for destructors (e.g. X::~X) giving rise to the need for three lookbehind tokens. However, with templates, and the template syntax, an unknown number of lookbehind tokens are needed. For example,

        template class X{
            template class A {
               struct B {};
            };
        };
        X<17,char>::A::A>::B
      
From this example it can be seen that determining qualified context is not a trivial matter.

3. ID_template_name issues

The grammar provided by the ISO C++ committee mandates a number of decorated tokens such as ID_class_name and ID_typedef_name. One might thing that each template function or class should be decorated as an ID_template_name. However, the grammar only wants an ID_template_name when it is used in a template_id expression: ID_template_name LESS template_argument_list GREATER. Therefore, a lookahead must be done to see if the next token is LESS. If so, the current token may be decorated as an ID_template_name, otherwise it should receieve some other decoration.
Secondly, there may be some dispute over what a template parameter should be decorated as. The standard does not identifier a ID_template_parameter_name. It would seem logical to decorate it as an ID_typedef_name; however, this fails to resolve to a class_or_namespace_name which the template parameter might be. Therefore, ID_class_name was chosen as the decorated token for template parameters.

4. Precedence issues

There are a number of ambiguities in the C++ grammar which cannot be solved by token decoration without modification of the grammar. One in particular is the difference between a declaration and an initializer.

        X (f);
      
has two possible derivations. The first (and correct) one is a declaration of f with type int. The second one is a declaration of variable X with implicit type int and initializer f.

5. Grammar issues

One grammar issue was uncovered during the creation of keystone that required modification of the grammar. An elaborated_type_specifier and a postfix_expression both had derivations that should have allowed for template_ids but did not.


B. Solutions


This section will outline some of the solutions for the problems presented in the previous section.

1. Curbing lookahead

As we discussed in the previous section, lookahead is the bane of token decorated parsing. However, there are usually ways to find a "valid" point in the parse, stop the trial parse, and execute all of the actions up to that point thereby resetting lookahead back to zero. In BTYACC, this command is called YYVALID. A YYVALID call is placed after all left and right brackets ({}) as well as all semicolons (;). Furthermore, it is used to solve the function problem above. If the left paren has been seen, and the previous identifier was a known function, then we call YYVALID which will execute the previous actions and put us in prototype scope. Then, name lookup and token decoration can proceed unfettered.

2. A stack of NameDeclarations

To solve the problem of qualified name lookup, we use a stack of NameDeclaration objects. Everytime an ID_template_name is decorated, we save the identifier on a stack. When a template scope is exited, we pop the identifier off the stack and associate it with the GREATER token that ended the scope.


C. Validation


This section will examine the assertions and invariants that are present throughout the keystone system. Furthermore, it will look at invariants that should be true but are not for various reasons

1. Assertions Various assertions are made throughout the keystone system using the MAKE_ASSERTION macro. MAKE_ASSERTION creates an object of type Assertion which will throw itself if its condition is violated. The test program will catch these assertions and print them to the screen if one is thrown.
The Assertions
filefunctionassertion
ActionFacade.cpplookupUnqualifiedName The input NameOccurrence object cannot be NULL
ActionFacade.cpplookupQualifiedName The input NameOccurrence object cannot be NULL
ActionFacade.cpplookupPseudoDestructor The input NameDeclaration objects cannot be NULL
ActionFacade.cppuseDeclaration The input NameDeclaration object cannot be NULL
ActionFacade.cppuseDeclaration The input NameDeclaration object must be in the symbol table
ActionFacade.cppuseNamespace The input NameDeclaration object cannot be NULL
ActionFacade.cppuseNamespace The input NameDeclaration object's type must be namespace type
ActionFacade.cppinsertLocal The input NameDeclaration object cannot be NULL
ActionFacade.cppmoveIntoClassScope The input NameDeclaration object cannot be NULL
ActionFacade.cppmoveIntoNamespaceScope The input NameDeclaration object cannot be NULL
ActionFacade.cppmoveIntoAnonNamespace The input NameDeclaration object cannot be NULL
ActionFacade.cppmoveIntoPrototypeScope The input NameDeclaration object cannot be NULL
ActionFacade.cppmoveIntoFunctionScope The input NameDeclaration object cannot be NULL
ActionFacade.cppmoveIntoFunctionScope The input NameDeclaration object must be in the symbol table
ActionFacade.cppmoveIntoFunctionScope The input NameDeclaration object must have function prototype type
ActionFacade.cppinstallBaseClass The base class NameDeclaration object cannot be NULL.
ActionFacade.cppinstallBaseClass The base class must be of Class type, template parameter type, or template template parameter type.
ActionFacade.cppmakeFriend The current scope must not be NULL
ActionFacade.cppmakeFriend The input NameDeclaration object must have a type
ActionFacade.cppmakeFriend The containing scope must be a ClassScope
ActionFacade.cppmakeTypedef The input NameDeclaration object must not be NULL
ActionFacade.cppmakeInstance The input NameDeclaration object must not be NULL
ArgumentStack.cppendArgumentList The argument list cannot be empty
ArgumentStack.cppgetArgumentList The argument list cannot be empty
ContextManager.cppgetCurrentClassScope The nestedScopes stack cannot be empty
ContextManager.cppinstallBaseInCurrentScope The current scope must be a class scope
ContextManager.cppuseNamespace The input NameDeclaration object's corresponding scope must be a namespace scope
ContextManager.cppmoveIntoAnonNamespace The current scope must be namespace scope
ContextManager.cppinsertLocal The nested scopes stack cannot be empty
ContextManager.cppuseDeclaration The input NameDeclaration object's type cannot be NULL
DeclarationStack.cpptop The stack cannot be empty
DeclarationStack.cpppush The input NameDeclaration object cannot be NULL
DeclarationStack.cpppush The input Type object cannot be NULL
LookupController.cpplookupQualifiedName The input NameOccurrence object cannot have a NULL qualifier.
LookupController.cpplookupQualifiedName The input NameOccurrence object's qualifier's corresponding scope cannot be NULL.
LookupController.cpplookupQualifiedName The input NameOccurrence object's qualifier's corresponding scope cannot be NULL.
LookupController.cpplookupPseudoDestructor The input NameDeclaration objects cannot be NULL.
TokenBuffer.cppgetNameOccurrence An IDENTIFIER preceeding a COLCOL must have a valid NameDeclaration object attached to it.
TokenDecorator.cppgetCSIdent The input NameDeclaration object must not be NULL
Type.cppFunctionType::FunctionType The input NameDeclaration object must not be NULL
Type.cppFunctionType::print The NameDeclaration for the function must not be NULL
Type.cppIndirectType::print The indirection type must be IT_ARRAY, IT_POINTER, or IT_REFERENCE
actions.cppcondition The identifier passed to condition must be of type cObject
actions.cppdirect_declarator_arr The identifier passed to condition must be of type cObject
NameDeclaration.cppaddReference The TokenPosn passed in cannot be NULL
NameDeclaration.cpptakeReference The reference list size cannot be NULL
Scope.cppScope::insertLocal The input NameDeclaration object cannot be NULL
Scope.cppScope::undoRecentInsertLocal The list of local NameDeclarations cannot be empty
Scope.cppScope::undoRecentInsertLocal The list of local NameDeclarations cannot be empty
Scope.cppScope::undoRecentInsertLocal The back of the locals list must be the same as the input NameDeclaration
Scope.cppScope::lookup The input NameOccurrence object cannot be qualified
Scope.cppFunctionScope::lookup The input NameOccurrence object cannot be qualified
Scope.cppLocalScope::lookup The input NameOccurrence object cannot be qualified
Scope.cppTemplateParameterScope::lookup The input NameOccurrence object cannot be qualified

2. Invariants Invariants are executed at the end of the program and affect classes in the Scope and Type heirarchy, as well as, the NameDeclaration class.
The Invariants
ClassInvariant
ScopeThe NameDeclaration object corresponding to this scope implies that the corresponding scope for the NameDeclaration is the same as the scope under scrutiny.
ScopeEither the containing scope is not NULL, or the scope's name is "_GlobalNamespace".
ScopeAll of the local objects should have a containing scope that is equal to the scope under scrutiny.
FunctionScope The prototype scope's name should be the same as the FunctionScope under scrutiny's name.
FunctionScope The containing scope should be a class or namespace scope.
PrototypeScope None of the local declarations may be of NamespaceType.
PrototypeScope The containing scope must be a NamespaceScope, ClassScope, PrototypeScope, or LocalScope.
LocalScope The NameDeclaration object must be NULL.
LocalScope None of the local declarations may be of NamespaceType.
LocalScope The containing scope must be LocalScope or FunctionScope.
NamespaceScope The containing scope must be a NamespaceScope.
ClassScope None of the local declarations may be of NamespaceType.
ClassScope The containig scope must be a NamespaceScope, ClassScope, or LocalScope
NameDeclaration The containing scope not NULL implies that the corresponding scope is not equal to the containing scope.
NameDeclaration The corresponding scope cannot be a LocalScope.
NameDeclaration If the type is ClassType and it is an instance, then the corresponding scope should be NULL.
NameDeclaration If the type is NamespaceType, then the corresponding scope should be a NamespaceScope.
NameDeclaration If the type is FunctionType, then the corresponding scope should be FunctionScope or a PrototypeScope.
NameDeclaration If the type is DT_FUNCTION_PROTOTYPE, then the corresponding scope should be ProtypeScope or NULL
NameDeclaration If the type is DT_FUNCTION_DEFINITION, then the corresponding scope should be FunctionScope or NULL
NameDeclaration If the virtual attribute is set, the containing scope should be set and they type should be FunctionType.
NameDeclaration If the pure attribute is set, then the virtual attribute should be set.
NameDeclaration If the Friend attribute is set, then the type should be function or class type.
NameDeclaration If the Static attribute is set, then the type should not be NamespaceType.
NameDeclaration If the const attribute is set, then it must be an instance or have FunctionType.
BasicType The NameDeclaration of this type must be NULL.
FunctionType The NameDeclaration of this type must not be NULL.
FunctionType The Type of NameDeclaration of this type must not be NULL.
FunctionType The return type of this type must not be NULL.
FunctionType The corresponding NameDeclaration's type cannot be NamespaceType.
FunctionType The return type cannot have NamespaceType.
IndirectType The reference type must not be NULL.
ClassType The NameDeclaration object's type must be of ClassType.
EnumType The NameDeclaration object's type must be of EnumType.
NamespaceType The NameDeclaration object's type must be of NamespaceType.

3. Problems The current problem with the invariant system is that it cannot be extended to the traditional class invariant system because most of the invariants only become true after a certain period of time and it would be cumbersome to install a modality into them. Therefore, new ways of checking and denoting invariants are being investigated.


D. Abstract Syntax Graph

There are two major issues surrounding the ASG and Treecc. First, the current version of Treecc available for download (0.1.4) contains a bug which impedes the compilation of keystone. A patched version exists in the CVS tree. Second, creating Visitors by hand is painful and error prone. To solve this problem, we created another script, create_visitor.pl, which automates the process. The usage is: create_visitor.pl. It will prompt for a visitor name. It will then create a .cpp and .h file corresponding to the name you entered. The .h and .cpp files will contain a skeleton visitor that can be filled in by the user. The skeleton file is autogenerated from the Visitor.h file so that any modifications to the base class are automatically incorporated into the script.

Miscellaneous Notes


A. Upcoming features



Last modified: Mon Sep 23 02:09:25 EDT 2002