https://opencores.org/ocsvn/light8080/light8080/trunk
c80dos.doc

>>> Small-C Version 1-N Compiler Documentation <<<

NOTE: C80DOS.EXE is the MSDOS compiled binary for running on a
standard PC-class machine. It emits 8080 assembler that can then be
assembled and loaded on the PC using lasm.cpm and load.cpm with the
zrun.com CP/M emulator. The final output (or any of the intermediate
output in 8080 assembler or Intel HEX format) can then be ported to
the CP/M machine by telecommunicating with any of a myriad of
programs, or by writing the disk directly using something like the
Uniform.exe program or its equivalent. Hopefully, in the near future,
a Z80 opcode version of the compiler as well as PC executable
versions of lasm and load will be finished. (RDK)

Available in the <MICRO> directory is a compiler for a subset of the
language C. It consists of the two files C80.C (compiler) and
C80LIB.I80 (runtime library). It is in source form and is free to
anyone wishing to use it.

Characteristics of the compiler are as follows:

(1) It supports a subset of the language C (see the book "The C
    Programming Language" by Brian Kernighan and Dennis Ritchie).
(2) It is written in C itself.
(3) It is syntactically identical to the C on UNIX (unlike some other
    small C compilers and interpreters).
(4) It produces as output a text file suitable for input to an 8080
    assembler.
(5) It is a stand-alone single-pass compiler (which means it does its
    own syntax checking and parsing and produces no intermediate
    files).
(6) It can compile itself. This means any processor supporting C can
    be used to develop this small C compiler for any other processor.

The intention behind the writing of this compiler was to bring the C
language to small computers. It was developed primarily on an 8080
system with 40K bytes and a single mini-floppy.
Consequently, an effort was made to keep the compiler small in order
to fit within limited memory, and intermediate files were avoided in
order to conserve floppy space.

COMPILER SPECIFICATIONS

As of this writing, the compiler supports the following:

(1) Data type declarations can be:
    - "char" (8 bits)
    - "int" (16 bits)
    - (by placing an "*" before the variable name, a pointer can be
      formed to the respective type of data element).

(2) Arrays:
    - single dimension (vector) arrays can be of type "char" or "int".

(3) Expressions:
    - unary operators:
        "-"   (minus)
        "*"   (indirection)
        "&"   (address of)
        "++"  (increment, either prefix or postfix)
        "--"  (decrement, either prefix or postfix)
    - binary operators:
        "+"   (addition)
        "-"   (subtraction)
        "*"   (multiplication)
        "/"   (division)
        "%"   (mod, i.e. remainder from division)
        "|"   (inclusive 'or')
        "^"   (exclusive 'or')
        "&"   (bitwise 'and')
        "=="  (test for equal)
        "!="  (test for not equal)
        "<"   (test for less than)
        "<="  (test for less than or equal to)
        ">"   (test for greater than)
        ">="  (test for greater than or equal to)
        "<<"  (arithmetic left shift)
        ">>"  (arithmetic right shift)
    - primaries:
        - array[expression]
        - function(arg1, arg2, ..., argn)
        - constant
            - decimal number
            - quoted string ("sample string")
            - primed string ('a' or 'Z' or 'ab')
        - local variable (or pointer)
        - global (static) variable (or pointer)

(4) Program control:
    - if (expression) statement;
    - if (expression) statement; else statement;
    - while (expression) statement;
    - break;
    - continue;
    - return;
    - return expression;
    - ; (null statement)
    - {statement; statement; ... statement;} (compound statement)

(5) Pointers:
    - local and static pointers can contain the address of "char" or
      "int" data elements.

(6) Compiler commands:
    - #define name string (pre-processor will replace name by string
      throughout text.)
    - #include filename (allows program to include other files within
      this compilation.)
    - #asm (not supported by standard C)
      Allows all code between "#asm" and "#endasm" to be passed
      unchanged to the target assembler.
      This command is actually a statement and may appear in the
      context:
          if (expression) #asm...#endasm else...

(7) Miscellaneous:
    - Expression evaluation maintains the same hierarchy as standard C.
    - Function calls are defined as any primary followed by an open
      paren, so legal forms include:
          variable();
          array[expression]();
          constant();
          function()();
    - Pointer arithmetic takes into account the data type of the
      destination (e.g. pointer++ will increment by two if pointer was
      declared "int *pointer").
    - Pointer compares generate unsigned compares (since addresses are
      not signed numbers).
    - Frequently used pieces of code (e.g. storing the primary register
      indirect through the top of the stack) generate calls to library
      routines to shorten the amount of code generated.
    - Generated code is "pure" (i.e. the code may be placed in Read
      Only Memory). Code, literals, and variables are kept in separate
      sections of memory.
    - The generated code is re-entrant. Every time a function is
      called, its local variables refer to a new stack frame. By way of
      example, the compiler uses recursive descent for most of its
      parsing, which relies heavily on re-entrant (recursive)
      functions.

COMPILER RESTRICTIONS

Since recent stages of compiler check-out have been done both on an
8080 system and on UNIX, language syntax appears to be identical
(within the given subset) between this small C compiler and the
standard UNIX compiler.

Not supported yet are:

(1) Structures.
(2) Multi-dimensional arrays.
(3) Floating point, long integer, or unsigned data types.
(4) Function calls returning anything but "int".
(5) The unaries "!", "~", and "sizeof".
(6) The control binary operators "&&", "||", and "?:".
(7) The declaration specifiers "auto", "static", "extern", and
    "register".
(8) The statements "for", "switch", "case", and "default".
(9) The use of arguments within a "#define" command.

Compiler restrictions include:

(1) Since it is a single-pass compiler, undefined names are not
    detected and are assumed to be function names not yet defined.
    If this assumption is incorrect, the undefined reference will not
    appear until the compiled program is assembled.
(2) No optimizing is done. The code produced is sound and capable of
    re-entrancy, but no attempt is made to optimize either for code
    size or speed. It was assumed a post-processor optimizer would
    later be written for the target machine.
(3) Since the target assembler is of unknown characteristics, no
    attempt is made to produce pseudo-ops to declare static variables
    as internal or external.
(4) Constants are not evaluated by the compiler. That is, the line of
    code:
        X = 1+2;
    would generate code to add "1" and "2" at runtime. The results are
    correct, but unnecessary code is the penalty.

ASSEMBLY LANGUAGE INTERFACE

Interfacing to assembly language is relatively straightforward. The
"#asm ... #endasm" construct allows the user to place assembly
language code directly into the control context. Since it is
considered by the compiler to be a single statement, it may appear in
such forms as:

    while(1) #asm ... #endasm

or

    if (expression) #asm...#endasm else...

Due to the workings of the preprocessor, which must be suppressed in
this construct, the pseudo-op "#asm" must be the last item before the
carriage return on the end of the line (i.e. the text between #asm
and the <CR> is thrown away), and the "#endasm" pseudo-op must appear
on a line by itself (i.e. everything after #endasm is also thrown
away). Since the parser is completely free-format outside of these
exceptions, the expected format is as follows:

    if (expression) #asm
    ...
    ...
    #endasm
    else statement;

Note that a semicolon is not required after the #endasm, since the end
of context is obvious to the compiler. Assembly language code within
the "#asm ... #endasm" context has access to all global symbols and
functions by name. It is up to the programmer to know the data type of
the symbol (whether "char" or "int", implying a byte access or a word
access). Stack locals and arguments may be retrieved by offset (see
STACK FRAME).

External assembly language routines invoked by function calls from
the C code have access to all registers and do not have to restore
them prior to exit. They may push items on the stack as well, but
must pop them off before exit. It is the responsibility of the calling
program to remove arguments from the stack after a function call;
this must not be done by the function itself. There is no limit to
the number of bytes the function may push onto the stack, provided
they are removed prior to returning. Since parameters are passed by
value, the parameters on the stack may be modified by the called
program.

STACK FRAME

The stack is used extensively by the compiler. Function arguments are
pushed onto the stack as they are encountered between parentheses
(note that this is the opposite of standard C, which means routines
expressly retrieving arguments from the stack rather than declaring
them by name must beware). By the definition of the language,
parameter passing is "call by value". For example, the following code
would be produced for the C statement:

    function(X, Y, z());

        LHLD  X
        PUSH  H
        LHLD  Y
        PUSH  H
        CALL  z
        PUSH  H
        CALL  function
        POP   B
        POP   B
        POP   B

Notice that the compiler cleans up the stack after the call, using a
simple algorithm to use the least number of bytes.

Local variables allocate as much stack space as is needed, and are
then assigned the current value of the stack pointer (after the
allocation) as their address.

    int X;

would produce:

        PUSH  B

which merely allocates room on the stack for 2 bytes (not initialized
to any value). References to the local variable X will now be made to
the stack pointer + 0. If another declaration is made:

    char array[3];

the code would be:

        DCX   SP
        PUSH  B

array[0] would be at SP+0, array[1] would be at SP+1, array[2] would
be at SP+2, and X would now be at SP+3. Thus, assembly language code
using "#asm...#endasm" cannot access local variables by name, but
must know how many intervening bytes have been allocated between the
declaration of the variable and its use.

It is worth pointing out that local declarations allocate only as much
stack space as is required, including an odd number of bytes, whereas
function arguments always consist of two bytes apiece. In the event
the argument was type "char" (8 bits), the most significant byte of
the 2-byte value is a sign-extension of the lower byte.

OPERATING THE COMPILER

The small C compiler begins by asking the user for a number of
options regarding the expected compilation. Since it was easier to ask
questions than to pull arguments from a command line (which is in no
way similar between the 8080 developmental system and UNIX), this was
the preferred method.

The questions asked are as follows:

Do you want the c-text to appear?
    This gives the user the option of interleaving the source code
    into the output file. Response is Y or N. If Y, a semicolon will
    be placed at the start of each input line (to force a comment to
    the 8080 assembler) and the input lines will be printed where
    appropriate. If the answer is N, only the generated 8080 code will
    be output.

Do you wish the globals to be defined?
    This question is primarily a developmental aid between machines.
    If the answer is Y, all static symbols will allocate storage
    within the module being compiled. This is the normal method. If N,
    no storage will be allocated, but symbol references will still be
    made in the normal way. Essentially, this question allows the user
    to declare either all or none of the static symbols as external.
    It is to be considered a temporary measure.

Starting number for labels?
    This lets the user supply the first label number generated by the
    compiler for its internal labels (which will typically be
    "ccXXXXX", where XXXXX is a decimal number increasing with each
    label). This option allows modules to be compiled separately and
    later appended at the source level without generating
    multiply-defined labels.

Output filename?
    This question gets from the user the name of the file to be
    created.
    A null line sends output to the user's terminal.

Input filename?
    This question gets from the user the name of the C module to use
    as input. The question will be repeated each time a name is
    supplied, allowing the user to create an output file consisting of
    many separate input files (it behaves as if the user had appended
    them together and submitted only the one file). A null line
    response ends the compilation process.

COMPILING THE COMPILER

The power of the compiler lies in the fact that it can compile
itself. This allows a user to "bootstrap" the compiler onto a new
machine without excessive recoding.

To compile the compiler under the UNIX operating system, the
appropriate command is:

    % cc C80.c -lS

which will invoke the UNIX C compiler and the UNIX linker to create
the runnable file "a.out". This file may be renamed as needed and
used. No other files are needed.

In order to create a compiler for a new machine, the user will need to
compile the compiler into the language of the destination processor.
The procedure currently used to create the compiler for my 8080
system is as follows:

(1) Edit the file C80.c to modify two lines of code:
    - change the line of code
          #include <stdio.h>
      to
          #define NULL 0
      (this is done since the "stdio.h" I/O header file contains
      unparsable lines for the small compiler, and the line defining
      NULL is the only line of "stdio.h" needed by the compiler).
    - change the line of code
          #define eol 10
      to
          #define eol 13
      (this is done since my 8080 system uses <CR> for the end-of-line
      character, and UNIX uses the "newline" character).
(2) Invoke the compiler (by typing "a.out" or whatever other name it
    was given).
(3) Answer the questions asked by the compiler so as to use the file
    C80.c as input and to produce the file C80.I80 as output.
(4) Append the files C80.I80 and C80LIB.I80 (the code for the compiler
    and the code for the runtime library, respectively).
(5) Assemble the combined file using some 8080 assembler.
(6) Execute the created run file.

Currently, the 8080 assembler used must possess the abilities to
handle symbol names unique to 8 characters and to recognize lower-case
symbol names as distinct from their upper-case equivalents. This is
due to the fact that the compiler recognizes 8-character names and
passes all static variable and function names intact to the
assembler. There are a few symbol names within the compiler which are
not unique until the 7th character and which have "upper-case twins".
These discourage the use of the KL-10's MACN80, since it folds
lower-case to upper-case and does not recognize 8-character names. It
may be used, however, if the user is aware of these limitations and
chooses symbol names within these restrictions.

THE FUTURE OF THE COMPILER

That part of the compiler which produces code for the 8080 is all
together in the final section of the compiler. Routines used by the
compiler to produce code are kept short and are commented. Changing
this compiler to produce code for any other machine is a matter of
changing only these few routines, and does not entail digging around
through the internals of the program. I would expect the change to
another machine could be made in an afternoon, provided the target
machine had the following attributes:

(1) A stack, preferably running backwards as items are pushed onto it.
(2) Two sixteen-bit registers. In the 8080 these are the HL register
    pair (the primary register to the compiler) and the DE register
    pair (the secondary register).
(3) An assembler (or cross-assembler).

Since the compiler is just now on its feet and subject to feedback
from users, it is expected many changes will be made to it. Already
planned changes (in order of expected addition) are:

(1) Constants will be pre-evaluated by the compiler. Something like
    x=1+2*3 will become x=7 prior to generating any code.
(2) Structures will be added. This is one of the powers of C. Its
    omission has always been considered temporary.
(3) Assignment operators (+=, &=, etc.) will be added.
(4) Missing unary and binary operators and statements will be added.
(5) The expression parser will create intermediate tree-structures of
    the expressions and will walk through them before generating any
    code. This will allow some optimization and will allow the
    function arguments to be passed on the stack in the same sequence
    as UNIX.
(6) A peep-hole optimizer will be added to improve the generated code.

Many of these things represent a wish-list. Time will be spent only
when it becomes available. Any volunteer help in any of these areas
would be appreciated.

Questions should be directed to Ron Cain here at SRI either at
extension 3860 or at CAIN@SRI-KL.
