         c80dos.doc

         >>>  Small-C Version 1-N Compiler Documentation  <<<

         NOTE:  C80DOS.EXE is the MSDOS compiled binary, for running on
                a standard PC-class machine, of a compiler that emits
                8080 assembler.  That output can then be assembled and
                loaded on the PC using lasm.cpm and load.cpm with the
                zrun.com CP/M emulator.  The final output (or any of
                the intermediate output in 8080 assembler or Intel HEX
                format) can then be ported to the CP/M machine by
                telecommunicating with any of a myriad of programs, or
                by writing the disk directly using something like the
                Uniform.exe program or its equivalent.  Hopefully, in
                the near future, a Z80 opcode version of the compiler
                as well as PC executable versions of lasm and load will
                be finished. (RDK)
 
              Available in the directory is a compiler for a subset of
         the language C.  It consists of the two files C80.C
         (compiler) and C80LIB.I80 (runtime library).  It is in source
         form and is free to anyone wishing to use it.
         Characteristics of the compiler are as follows:

         (1) It supports a subset of the language C (see the book
             "The C Programming Language", by Brian Kernighan and
             Dennis Ritchie).
         (2) It is written in C itself.
         (3) It is syntactically identical to the C on UNIX, within
             the supported subset (unlike some other small C
             compilers and interpreters).
         (4) It produces as output a text file suitable for input to
             an 8080 assembler.
         (5) It is a stand-alone single-pass compiler (which means it
             does its own syntax checking and parsing and produces no
             intermediate files).
         (6) It can compile itself.  This means any processor
             supporting C can be used to develop this small C compiler
             for any other processor.

              The intention behind the writing of this compiler was to
         bring the C language to small computers.  It was developed
         primarily on an 8080 system with 40K bytes of memory and a
         single mini-floppy.  Consequently, an effort was made to keep
         the compiler small so that it fits within limited memory, and
         intermediate files were avoided in order to conserve floppy
         space.
 

         COMPILER SPECIFICATIONS

              As of this writing, the compiler supports the following:

         (1) Data type declarations can be:

              - "char" (8 bits)
              - "int"  (16 bits)
              - (by placing an "*" before the variable name, a pointer
                can be formed to the respective type of
                data element).

         (2) Arrays:

              - single dimension (vector) arrays can be
                of type "char" or "int".

         (3) Expressions:

              - unary operators:
                 "-" (minus)
                 "*" (indirection)
                 "&" (address of)
                 "++" (increment, either prefix or postfix)
                 "--" (decrement, either prefix or postfix)
              - binary operators:
                 "+" (addition)
                 "-" (subtraction)
                 "*" (multiplication)
                 "/" (division)
                 "%" (mod, i.e. remainder from division)
                 "|" (inclusive 'or')
                 "^" (exclusive 'or')
                 "&" (bitwise 'and')
                 "==" (test for equal)
                 "!=" (test for not equal)
                 "<"  (test for less than)
                 "<=" (test for less than or equal to)
                 ">"  (test for greater than)
                 ">=" (test for greater than or equal to)
                 "<<" (arithmetic left shift)
                 ">>" (arithmetic right shift)
              - primaries:
                 -array[expression]
                 -function(arg1, arg2,...,argn)
                 -constant
                        -decimal number
                        -quoted string ("sample string")
                        -primed (single-quoted) string ('a' or 'Z' or 'ab')
                 -local variable (or pointer)
                 -global (static) variable (or pointer)

         (4) Program control:

                -if(expression) statement;
                -if(expression) statement;
                        else statement;
                -while (expression) statement;
                -break;
                -continue;
                -return;
                -return expression;
                -; (null statement)
                -{statement; statement; ... statement;}
                         (compound statement)
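
              As an illustration of the subset listed above, the sketch
         below uses only the following:

              - "char" and "int" data, pointers, and a one-dimensional
                array;
              - "while", "if", and "return";
              - the supported operators.

         It avoids "for", "&&", "!", and compound assignment.  The
         function names are hypothetical.  The sketch uses modern
         prototypes so that a present-day C compiler accepts it.  The
         compiler described here expects the older K&R style, in which
         parameter types are declared between the parameter list and
         the opening brace.

```c
#include <assert.h>  /* host-side checks only; not part of the subset */

/* Count how many times character c occurs in the NUL-terminated
 * string s, using a pointer walk and a while loop. */
int count_char(char *s, int c)
{
    int n;
    n = 0;
    while (*s != 0) {
        if (*s == c)
            ++n;
        ++s;
    }
    return n;
}

/* Reverse the first len elements of an int vector in place.
 * Returns 0: in the subset, functions always return "int". */
int reverse_ints(int *v, int len)
{
    int i, j, t;
    i = 0;
    j = len - 1;
    while (i < j) {
        t = v[i];
        v[i] = v[j];
        v[j] = t;
        i++;
        j--;
    }
    return 0;
}
```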
 
         (5) Pointers:

                -local and static pointers can contain the
                 address of "char" or "int" data elements.

         (6) Compiler commands:

                - #define name string (the preprocessor replaces
                        name by string throughout the text.)
                - #include filename (allows a program to include other
                        files within this compilation.)
                - #asm (not supported by standard C)
                        Allows all code between "#asm" and "#endasm"
                        to be passed unchanged to the target
                        assembler.  This command is actually a statement
                        and may appear in the context:
                        "if (expression) #asm...#endasm else..."

         (7) Miscellaneous:

                -Expression evaluation follows the same operator
                 precedence as standard C.

                -Function calls are defined as
                 any primary followed by an open paren, so legal forms
                 include:

                        variable();
                        array[expression]();
                        constant();
                        function()();

                -Pointer arithmetic takes into account the data
                 type of the destination (e.g. pointer++ will increment
                 by two if pointer was declared "int *pointer").

                -Pointer compares generate unsigned
                 compares (since addresses are not signed numbers).

                -Frequently used pieces of code
                 (e.g. storing the primary register indirect through the
                 top of the stack) generate calls to library routines to
                 shorten the amount of code generated.

                -Generated code is "pure" (i.e. the code may be placed
                 in Read Only Memory).  Code, literals, and variables
                 are kept in separate sections of memory.

                -The generated code is re-entrant.  Every time a function
                 is called, its local variables refer to a new stack
                 frame.  By way of example, the compiler itself uses
                 recursive descent for most of its parsing, which relies
                 heavily on re-entrant (recursive) functions.
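
              The pointer-arithmetic and pointer-compare rules above
         follow from the 8080's 16-bit addresses.  The sketch below
         models target addresses as 16-bit unsigned values on a
         present-day host.  The function names and size constants are
         illustrative assumptions, not part of the compiler.  The
         sketch shows two things:

              - "ptr++" on an "int" pointer advances by two bytes;
              - a signed compare would wrongly place an address at or
                above 0x8000 below an address under it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of 8080 addresses: 16 bits, unsigned. */
enum { CHAR_SIZE = 1, INT_SIZE = 2 };   /* element sizes on the target */

/* Address produced by "ptr++" when ptr points at elements of
 * elem_size bytes; wraps modulo 2^16 like a 16-bit register pair. */
uint16_t ptr_increment(uint16_t addr, int elem_size)
{
    assert(elem_size == CHAR_SIZE || elem_size == INT_SIZE);
    return (uint16_t)(addr + elem_size);
}

/* The value a 16-bit pattern has when read as a signed "int". */
long as_signed16(uint16_t v)
{
    return v >= 0x8000u ? (long)v - 0x10000L : (long)v;
}

/* Pointer compare as the compiler generates it: unsigned. */
int ptr_less(uint16_t a, uint16_t b)
{
    return a < b;
}

/* The same compare done as if the addresses were signed ints. */
int signed_less(uint16_t a, uint16_t b)
{
    return as_signed16(a) < as_signed16(b);
}
```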
 

         COMPILER RESTRICTIONS

              Since recent stages of compiler check-out have been done
         both on an 8080 system and on UNIX, language syntax appears
         to be identical (within the given subset) between this small
         C compiler and the standard UNIX compiler.


         Not supported yet are:

         (1) Structures.
         (2) Multi-dimensional arrays.
         (3) Floating point, long integer, or unsigned data types.
         (4) Functions returning anything but "int".
         (5) The unaries "!", "~", and "sizeof".
         (6) The control operators "&&", "||", and "?:".
         (7) The declaration specifiers "auto", "static", "extern",
                and "register".
         (8) The statements "for", "switch", "case",
                and "default".
         (9) The use of arguments within a "#define" command.


         Compiler restrictions include:

              (1) Since it is a single-pass compiler, undefined names
         are not detected and are assumed to be function names not yet
         defined.  If this assumption is incorrect, the undefined
         reference will not be reported until the compiled program is
         assembled.

              (2) No optimization is done.  The code produced is sound
         and capable of re-entrancy, but no attempt is made to
         optimize either for code size or speed.  It was assumed a
         post-processor optimizer would later be written for the
         target machine.

              (3) Since the characteristics of the target assembler are
         unknown, no attempt is made to produce pseudo-ops to
         declare static variables as internal or external.

              (4) Constants are not evaluated by the compiler.  That
         is, the line of code:

                        X = 1+2;

         would generate code to add "1" and "2" at runtime.  The
         results are correct, but unnecessary code is the penalty.
 

         ASSEMBLY LANGUAGE INTERFACE

              Interfacing to assembly language is relatively
         straightforward.  The "#asm ... #endasm" construct allows
         the user to place assembly language code directly into the
         control context.  Since it is considered by the compiler to
         be a single statement, it may appear in such forms as:

                        while(1) #asm ... #endasm

                        or

                        if (expression) #asm...#endasm else...

              This construct must bypass the preprocessor, which leads
         to two format rules:

              - "#asm" must be the last item before the carriage return
                at the end of the line (the text between #asm and the
                carriage return is thrown away);
              - "#endasm" must appear on a line by itself (everything
                after #endasm is also thrown away).

         Outside of these exceptions the parser is completely
         free-format, so the expected format is as follows:

                        if (expression) #asm
                        ...
                        ...
                        #endasm
                        else statement;
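
              The #asm rules above amount to a simple line-oriented
         copy.  The sketch below is hypothetical: the function name,
         its interface, and its tolerance for blanks before "#endasm"
         (as in the indented example above) are all assumptions.  It
         works as follows:

              - the line that ends with "#asm" is not examined further;
              - the lines after it are passed through untouched;
              - scanning stops at the "#endasm" line, and the rest of
                that line is discarded.

```c
#include <assert.h>
#include <string.h>

/* Skip blanks and tabs at the start of a line. */
static const char *skip_blanks(const char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    return s;
}

/* lines[start] is the line that ends with "#asm"; whatever follows
 * "#asm" on that line is ignored.  The lines after it are copied to
 * out[] exactly as written (no #define substitution) until a line
 * whose first non-blank text is "#endasm"; the rest of that line is
 * discarded.  out[] needs room for nlines - start - 1 entries.
 * Returns the number of lines copied and stores the index of the line
 * after "#endasm" in *next, or returns -1 if "#endasm" is missing. */
int copy_asm_block(const char *lines[], int nlines, int start,
                   const char *out[], int *next)
{
    int i = start + 1;
    int n = 0;
    assert(start >= 0 && start < nlines);
    while (i < nlines) {
        if (strncmp(skip_blanks(lines[i]), "#endasm", 7) == 0) {
            *next = i + 1;
            return n;
        }
        out[n++] = lines[i++];   /* passed to the assembler unchanged */
    }
    return -1;
}
```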
 
              Note that a semicolon is not required after the #endasm,
         since the end of context is obvious to the compiler.
         Assembly language code within the "#asm ... #endasm" context
         has access to all global symbols and functions by name.  It
         is up to the programmer to know the data type of the symbol
         (whether it is "char" or "int", implying a byte access or a
         word access).  Stack locals and arguments may be retrieved by
         offset (see STACK FRAME).

              External assembly language routines invoked by function
         calls from the C code have access to all registers and do not
         have to restore them prior to exit.  They may push items on
         the stack as well, but must pop them off before exit.  It is
         the responsibility of the calling program to remove arguments
         from the stack after a function call; this must not be done
         by the function itself.  There is no limit to the number of
         bytes the function may push onto the stack, provided they are
         removed prior to returning.  Since parameters are passed by
         value, the parameters on the stack may be modified by the
         called function.
 

         STACK FRAME

              The stack is used extensively by the compiler.  Function
         arguments are pushed onto the stack as they are encountered
         between parentheses.  This is the opposite of the order used
         by the standard UNIX C compiler, so routines that retrieve
         arguments from the stack directly, rather than declaring them
         by name, must beware.  By the definition of the language,
         parameter passing is "call by value".  For example, the
         following code would be produced for the C statement:

                function(X, Y, z());

                LHLD X
                PUSH H
                LHLD Y
                PUSH H
                CALL z
                PUSH H
                CALL function
                POP B
                POP B
                POP B

              Notice that the compiler cleans up the stack after the
         call, using a simple algorithm that generates the fewest
         bytes of code.
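
              Arguments are pushed left to right, two bytes apiece,
         and CALL then pushes a two-byte return address.  The offset of
         each argument from SP inside the called routine follows
         directly.  For function(X, Y, z()) above, on entry:

              - the return address is at SP+0;
              - the value of z() is at SP+2;
              - Y is at SP+4;
              - X is at SP+6.

         The hypothetical helper below does this arithmetic.  It
         assumes that nothing besides the routine's own locals has been
         pushed since entry.

```c
#include <assert.h>

/* Offset from SP, inside the called routine, of argument number index
 * (0 = leftmost) out of nargs.  Each argument occupies 2 bytes and the
 * return address occupies the 2 bytes nearest SP on entry.
 * locals_bytes is the stack space the routine has allocated for its
 * own locals since entry. */
int arg_offset(int nargs, int index, int locals_bytes)
{
    assert(index >= 0 && index < nargs);
    assert(locals_bytes >= 0);
    /* Skip the locals and the return address, then count back from
     * the last-pushed (rightmost) argument. */
    return locals_bytes + 2 + 2 * (nargs - 1 - index);
}
```

         Suppose the routine has allocated a 3-byte "char array[3]".
         Every argument offset then grows by 3, which is what the
         locals_bytes parameter accounts for.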
 
              Local variables allocate as much stack space as is
         needed, and are then assigned the current value of the stack
         pointer (after the allocation) as their address.  For
         example,

                int X;

         would produce:

                PUSH B

         which merely allocates room on the stack for 2 bytes (not
         initialized to any value).  References to the local variable
         X will now be made to the stack pointer + 0.  If another
         declaration is made:

                char array[3];

         the code would be:

                DCX SP
                PUSH B

         The resulting layout is:

              - array[0] is at SP+0;
              - array[1] is at SP+1;
              - array[2] is at SP+2;
              - X is now at SP+3.

         Thus, assembly language code using "#asm...#endasm" cannot
         access local variables by name.  It must know how many
         intervening bytes have been allocated between the declaration
         of the variable and its use.

              Local declarations allocate only as much stack space as
         is required, which may be an odd number of bytes.  Function
         arguments, by contrast, always occupy two bytes apiece.  If
         an argument is of type "char" (8 bits), the most significant
         byte of the 2-byte value is a sign-extension of the lower
         byte.
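
              The allocation and clean-up sequences in this section can
         be produced by a short routine, sketched below.  Those
         sequences are:

              - PUSH B to allocate two bytes;
              - DCX SP followed by PUSH B to allocate three bytes;
              - three POP B instructions to drop six bytes of
                arguments.

         The routine is hypothetical.  Its name, adjust_sp, is an
         assumption, as is the general XCHG, LXI H, DAD SP, SPHL, XCHG
         sequence it uses for larger moves.  That sequence was chosen
         because it keeps HL (the primary register) intact.  The
         routine picks whichever form takes fewer bytes, counting 7
         bytes for the general sequence.  The last function shows the
         sign-extension rule for "char" arguments.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Bytes taken by the general sequence XCHG / LXI H,k / DAD SP / SPHL /
 * XCHG: 1 + 3 + 1 + 1 + 1. */
enum { LONG_FORM_BYTES = 7 };

/* Write to buf (capacity cap; 64 is plenty) the 8080 code that moves
 * SP by k bytes: k < 0 allocates stack space, k > 0 releases it. */
void adjust_sp(int k, char *buf, size_t cap)
{
    int mag = abs(k);
    int short_bytes = mag / 2 + mag % 2;  /* PUSH/POP B, INX/DCX SP: 1 byte each */

    assert(cap > 0);
    buf[0] = '\0';
    if (k == 0)
        return;
    if (short_bytes <= LONG_FORM_BYTES) {
        if (mag % 2 != 0)                 /* odd count: one 1-byte step first */
            strncat(buf, k > 0 ? "INX SP\n" : "DCX SP\n", cap - strlen(buf) - 1);
        for (mag -= mag % 2; mag > 0; mag -= 2)   /* then 2 bytes at a time */
            strncat(buf, k > 0 ? "POP B\n" : "PUSH B\n", cap - strlen(buf) - 1);
        return;
    }
    /* HL is parked in DE so the primary register survives; DE is lost. */
    snprintf(buf, cap, "XCHG\nLXI H,%d\nDAD SP\nSPHL\nXCHG\n", k);
}

/* A "char" argument occupies 2 bytes; the high byte repeats bit 7. */
uint16_t widen_char_arg(uint8_t b)
{
    return (uint16_t)((b & 0x80) ? (0xFF00u | b) : b);
}
```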
 

         OPERATING THE COMPILER

              The small C compiler begins by asking the user for a
         number of options regarding the expected compilation.  It was
         easier to ask questions than to pull arguments from a command
         line, which works in no way alike on the 8080 development
         system and UNIX, so this was the preferred method.

         The questions asked are as follows:

              Do you want the c-text to appear?

              This gives the user the option of interleaving the
         source code into the output file.  The response is Y or N.
         If Y, a semicolon will be placed at the start of each input
         line (to force a comment to the 8080 assembler) and the input
         lines will be printed where appropriate.  If the answer is N,
         only the generated 8080 code will be output.

              Do you wish the globals to be defined?

              This question is primarily a developmental aid between
         machines.  If the answer is Y, all static symbols will
         allocate storage within the module being compiled.  This is
         the normal method.  If N, no storage will be allocated, but
         symbol references will still be made in the normal way.
         Essentially, this question allows the user to declare all or
         none of the static symbols external.  It is to be considered
         a temporary measure.

              Starting number for labels?

              This lets the user supply the first label number
         generated by the compiler for its internal labels (which will
         typically be "ccXXXXX", where XXXXX is a decimal number
         increasing with each label).  This option allows modules to
         be compiled separately and later appended at the source level
         without generating multiply-defined labels.
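
              The label-numbering option can be pictured with the
         hypothetical generator below.  Each module numbers its labels
         upward from its own starting value, so two modules appended at
         the source level clash only if their number ranges overlap.
         Whether the compiler zero-pads the number is not stated above,
         so this sketch does not.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-module label generator producing "cc<n>". */
struct labeler {
    int next;   /* number the next label will receive */
};

void labeler_init(struct labeler *lb, int start)
{
    lb->next = start;
}

/* Format the next label into buf and return its number. */
int new_label(struct labeler *lb, char *buf, size_t cap)
{
    int n = lb->next++;
    assert(cap > 0);
    snprintf(buf, cap, "cc%d", n);
    return n;
}

/* Nonzero if two modules' label ranges [start, start + count) overlap,
 * i.e. appending their assembler output would define a label twice. */
int label_ranges_overlap(int start_a, int count_a, int start_b, int count_b)
{
    return start_a < start_b + count_b && start_b < start_a + count_a;
}
```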
 
              Output filename?

              This question gets from the user the name of the file to
         be created.  A null line sends output to the user's terminal.

              Input filename?

              This question gets from the user the name of the C
         module to use as input.  The question will be repeated each
         time a name is supplied.  This allows the user to create an
         output file from many separate input files; the result is
         the same as if the user had appended them together and
         submitted only the one file.  A null line response ends the
         compilation process.


         COMPILING THE COMPILER

              The power of the compiler lies in the fact that it can
         compile itself.  This allows a user to "bootstrap" the
         compiler onto a new machine without excessive recoding.

              To compile the compiler under the UNIX operating system,
         the appropriate command is:

              % cc C80.c -lS

         which will invoke the UNIX C compiler and the UNIX linker to
         create the runnable file "a.out".  This file may be renamed
         as needed and used.  No other files are needed.

              In order to create a compiler for a new machine, the
         user will need to compile the compiler into the language of
         the destination processor.  The procedure currently used to
         create the compiler for my 8080 system is as follows:

         (1) Edit the file C80.c to modify two lines of code:

         -change the line of code

                #include <stdio.h>
                        to
                #define NULL 0

              (this is done since the "stdio.h" I/O header file
         contains lines the small compiler cannot parse, and the
         line defining NULL is the only line of "stdio.h" needed by
         the compiler).

         -change the line of code

                #define eol 10
                        to
                #define eol 13

              (this is done since my 8080 system uses a carriage return
         (13) as the end-of-line character, and UNIX uses the
         "newline" character (10)).


         (2) Invoke the compiler (by typing "a.out" or whatever other
             name it was given).

         (3) Answer the questions asked by the compiler so that it uses
             the file C80.c as input and produces the file C80.I80
             as output.

         (4) Append the files C80.I80 and C80LIB.I80 (the code for the
             compiler and the code for the runtime library,
             respectively).

         (5) Assemble the combined file using some 8080 assembler.

         (6) Execute the created run file.
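
              The eol edit in step (1) matters because the end of an
         input line is presumably found by comparing each character
         read against eol.  The hypothetical reader below (its name and
         interface are assumptions) shows what goes wrong with the
         wrong value.  On a system that ends lines with a carriage
         return, an eol of 10 never matches, so separate lines run
         together as one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical line reader: copies characters from src (len bytes)
 * into line until the end-of-line character eol is seen or the input
 * runs out.  Characters that do not fit in line (capacity cap) are
 * dropped.  Returns how many input characters were consumed, counting
 * the terminator if one was found. */
size_t read_line(const char *src, size_t len, int eol, char *line, size_t cap)
{
    size_t i = 0, n = 0;
    assert(cap > 0);
    while (i < len && src[i] != eol) {
        if (n + 1 < cap)
            line[n++] = src[i];
        i++;
    }
    line[n] = '\0';
    return i < len ? i + 1 : i;
}
```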
 
              Currently, the 8080 assembler used must be able to
         handle symbol names unique to 8 characters and to treat
         lower-case symbol names as distinct from their upper-case
         equivalents.  This is because the compiler recognizes
         8-character names and passes all static variable and
         function names intact to the assembler.  There are a few
         symbol names within the compiler which are not unique until
         the 7th character and which have "upper-case twins".  These
         discourage the use of the KL-10's MACN80, since it folds
         lower case to upper case and does not recognize 8-character
         names.  It may be used, however, if the user is aware of
         these limitations and chooses symbol names within these
         restrictions.
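
              The naming problem described above can be checked
         mechanically.  The hypothetical function below reports whether
         two distinct names would become one assembler symbol under two
         conditions:

              - only a given number of leading characters is
                significant;
              - optionally, lower case is folded to upper case.

         How many characters MACN80 keeps is not stated here, so the
         count is a parameter.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical check: nonzero if names a and b would map to the same
 * assembler symbol when only the first `significant` characters count
 * and, if fold_case is nonzero, lower case is folded to upper case. */
int symbols_collide(const char *a, const char *b, int significant, int fold_case)
{
    int i;
    assert(significant >= 0);
    for (i = 0; i < significant; i++) {
        int ca = (unsigned char)a[i];
        int cb = (unsigned char)b[i];
        if (fold_case) {
            ca = toupper(ca);
            cb = toupper(cb);
        }
        if (ca != cb)
            return 0;   /* they differ within the significant prefix */
        if (ca == '\0')
            return 1;   /* both names ended together: identical */
    }
    return 1;           /* identical over every significant character */
}
```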
 

         THE FUTURE OF THE COMPILER

              The part of the compiler which produces code for the
         8080 is all together in the final section of the compiler.
         Routines used by the compiler to produce code are kept short
         and are commented.  Changing this compiler to produce code
         for any other machine is a matter of changing only these few
         routines, and does not entail digging around through the
         internals of the program.  I would expect the change to
         another machine could be made in an afternoon, provided the
         target machine had the following attributes:

              (1) A stack, preferably running backwards (toward lower
                   addresses) as items are pushed onto it.

              (2) Two sixteen-bit registers.  In the 8080 these
                   are the HL register pair (the primary register
                   to the compiler) and the DE register pair (the
                   secondary register).

              (3) An assembler (or cross-assembler).


              Since the compiler is just now on its feet and subject
         to feedback from users, it is expected that many changes will
         be made to it.  Already planned changes (in order of expected
         addition) are:

              (1) Constants will be pre-evaluated by the
                   compiler.  Something like x=1+2*3 will become
                   x=7 prior to generating any code.

              (2) Structures will be added.  They are one of the
                   strengths of C, and their omission has always been
                   considered temporary.

              (3) Assignment operators (+=, &=, etc.) will be
                   added.

              (4) Missing unary and binary operators and
                   statements will be added.

              (5) The expression parser will create intermediate
                   tree-structures of the expressions and will
                   walk through them before generating any code.
                   This will allow some optimization and will
                   allow the function arguments to be passed on
                   the stack in the same sequence as UNIX.

              (6) A peephole optimizer will be added to improve
                   the generated code.
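
              Planned change (1) fits the recursive-descent approach
         the compiler already uses for parsing: one routine per
         precedence level, each calling the next tighter level.  The
         sketch below is hypothetical and carries these limits:

              - it handles only decimal constants, parentheses, unary
                minus, and + - * / %;
              - it computes in a host "long", so it does not reproduce
                16-bit wrap-around on the 8080;
              - it assumes well-formed input with no division by zero.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical constant folder for expressions such as x=1+2*3. */
struct parser {
    const char *p;   /* next unread character */
};

static void skip_ws(struct parser *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
}

static long fold_expr(struct parser *ps);

/* factor: decimal-number | '(' expr ')' | '-' factor */
static long fold_factor(struct parser *ps)
{
    long v = 0;
    skip_ws(ps);
    if (*ps->p == '(') {
        ps->p++;
        v = fold_expr(ps);
        skip_ws(ps);
        if (*ps->p == ')')
            ps->p++;                  /* consume ')' */
        return v;
    }
    if (*ps->p == '-') {
        ps->p++;
        return -fold_factor(ps);
    }
    while (isdigit((unsigned char)*ps->p))
        v = v * 10 + (*ps->p++ - '0');
    return v;
}

/* term: factor { ('*' | '/' | '%') factor }, left to right */
static long fold_term(struct parser *ps)
{
    long v = fold_factor(ps);
    for (;;) {
        skip_ws(ps);
        char op = *ps->p;
        if (op != '*' && op != '/' && op != '%')
            return v;
        ps->p++;
        long r = fold_factor(ps);
        v = op == '*' ? v * r : op == '/' ? v / r : v % r;
    }
}

/* expr: term { ('+' | '-') term }, left to right */
static long fold_expr(struct parser *ps)
{
    long v = fold_term(ps);
    for (;;) {
        skip_ws(ps);
        char op = *ps->p;
        if (op != '+' && op != '-')
            return v;
        ps->p++;
        long r = fold_term(ps);
        v = op == '+' ? v + r : v - r;
    }
}

/* Fold a whole constant expression to a single value. */
long fold_constant(const char *src)
{
    struct parser ps;
    assert(src != 0);
    ps.p = src;
    return fold_expr(&ps);
}
```

         With this in place, "x=1+2*3" can be emitted as a single load
         of 7 rather than a runtime multiply and add.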
 
              Many of these things represent a wish-list.  Time will
         be spent on them only when it becomes available.  Any
         volunteer help in any of these areas would be appreciated.

              Questions should be directed to Ron Cain here at SRI,
         either at extension 3860 or at CAIN@SRI-KL.