@c Copyright (C) 2000, 2009 Red Hat, Inc.
@c This file is part of the CGEN manual.
@c For copying conditions, see the file cgen.texi.

@node Introduction
@comment  node-name,  next,  previous,  up
@chapter Introduction to CGEN

@menu
* Overview::
* CPU description language::
* Opcodes support::
* Simulator support::
* Testing support::
@end menu

@node Overview
@section Overview

CGEN is a project to provide a framework and toolkit for writing cpu tools.

@menu
* Goal::                        What CGEN tries to achieve.
* Why do it?::
* Maybe it should not be done?::
* How ambitious is CGEN?::
* What is missing that should be there someday?::
@end menu

@node Goal
@subsection Goal

The goal of CGEN (pronounced @emph{seejen}, and short for
"Cpu tools GENerator") is to provide a uniform framework and toolkit
for writing programs like assemblers, disassemblers, and
simulators without explicitly closing any doors on future things one
might wish to do.  In the end, its scope is the things the software developer
cares about when writing software for the cpu (compilation, assembly,
linking, simulation, profiling, debugging, ???).

Achieving the goal is centered around having an application independent
description of a CPU (plus environment, like ABI) that applications can then
make use of.  In the end that's a lot to ask for from one language.  What
applications can or should be able to use CGEN is left to evolve over time.
The description language itself is thus also left to evolve over time!

Achieving the goal also involves having a toolkit, libcgen, that contains
a compiled form of the cpu description plus a suite of routines for working
with the data.
@footnote{@file{libcgen} currently doesn't exist, but that was the
original plan.}

CGEN is not a new idea.  Some GNU ports have done something like this --
for example, the SH port in its early days.  However, the idea never really
``caught on''.  CGEN was started because I think it should.
 
Since CGEN is a very ambitious project, there are currently lots of
things that aren't written down, let alone implemented.  It will take
some time to flesh all the details out, but in and of itself that doesn't
necessarily mean they can't be fleshed out, or that they haven't been
considered.

@node Why do it?
@subsection Why do it?

I think it is important that GNU assembler/disassembler/simulator ports
be done from a common framework.  On some level it's fun doing things
from scratch, which was, and to a large extent still is, current
practice, but this is not the place for that.

@itemize @bullet
@item the more ports of something one has, the more important it is that they
be the same.

@item the more complex each of them becomes, the more important it is
that they be the same.

@item if they all are the same, a feature added to one is added to all
of them--within the context of their similarity, of course.

@item with a common framework in place the planning of how to architect
a port is taken care of; the main part of what's left is simply writing
the CPU description.

@item the more applications that use the common framework, the fewer
places the data needs to be typed in and maintained.

@item new applications can take advantage of data and utilities that
already exist.

@item a common framework provides a better launching point for bigger things.
@end itemize

@node Maybe it should not be done?
@subsection Maybe it should not be done?

However, no one has yet succeeded in pushing for such an extensive common
framework.@footnote{I'm just trying to solicit input here.  Maybe these
questions will help get that input.}

@itemize @bullet
@item maybe people think it's not worth it?

@item maybe they just haven't had the inclination to see it through?
(where ``inclination'' includes everything from the time it would take
to dealing with the various parties whose turf you would tread on)

@item maybe in the case of assemblers and simulators they're not complex
enough to see much benefit?

@item maybe the resulting tight coupling among the various applications
will cause problems that offset any gains?

@item maybe there's too much variance to try to achieve a common
framework, so that all attempts are doomed to become overly complex?

@item as a corollary of the previous item, maybe in the end trying to
combine ISA syntax (the assembly language), with ISA semantics (simulation),
with architecture implementation (performance), would become overly complex?
@end itemize

@node How ambitious is CGEN?
@subsection How ambitious is CGEN?

CGEN is a very ambitious project, as the following possible future
projects show:

@menu
* More complicated simulators::
* Profiling tools::
* Program analysis tools::
* ABI description::
* Machine generated architecture reference material::
* Tools like what NJMCT provides::
* Input to a compiler backend::
* Hardware/software codesign::
@end menu
 
138
@node More complicated simulators
139
@subsubsection More complicated simulators
140
 
141
Current CGEN-based simulators achieve their speed by using GCC's
142
"computed goto" facility to implement a threaded interpreter.
143
The "main loop" of the cpu engine is contained within one function
144
and the administrivia of running the program is reduced to about three
145
host instructions per target instruction (one to increment a "virtual pc",
146
one to fetch the address of code that implements that next target instruction,
147
and one to branch to it).  Target instructions can be simulated with as few as
148
seven@footnote{Actually, this can be reduced even more by creating copies of
149
an instruction specialized for all the various inputs.} instructions for an
150
"add" (load address of src1, load src1, load address of src2, load src2, add,
151
load address of result, store result).  So ignoring overhead (which
152
is minimal for frequently executed code) that's ten host instructions per
153
"typical" target instruction.  Pretty good.@footnote{The actual results
154
depend, of course, on the exact mix of target instructions in the application,
155
what instructions the host cpu has, and how efficiently the rest of the
156
simulator is (e.g. floating point and memory operations can require a hundred
157
or more host instructions).}
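
To make the numbers above concrete, here is a minimal hand-written sketch of
such a threaded interpreter; it is not generated by CGEN, and the names
(@code{decoded_insn}, @code{vpc}, the toy three-instruction program) are
invented for illustration.  It relies on GCC's labels-as-values extension
(@code{&&label}).

@example
#include <stdio.h>
#include <stdint.h>

/* One pre-decoded target instruction: the address of the code that
   executes it plus its pre-extracted operands.  */
struct decoded_insn {
  void *executor;
  int rd, rs1, rs2;
  int32_t imm;
};

int
main (void)
{
  static int32_t regs[4];

  /* A "decoded" toy program: r0 = 2; r1 = 3; r2 = r0 + r1; halt.
     A real simulator builds this table from the target executable.  */
  struct decoded_insn program[] = {
    { &&li,   0, 0, 0, 2 },
    { &&li,   1, 0, 0, 3 },
    { &&add,  2, 0, 1, 0 },
    { &&halt, 0, 0, 0, 0 },
  };
  struct decoded_insn *vpc = program;

  goto *vpc->executor;

 li:    /* load immediate */
  regs[vpc->rd] = vpc->imm;
  /* The ~3 administrative host instructions: advance the virtual pc,
     fetch the next executor's address, and branch to it.  */
  ++vpc;
  goto *vpc->executor;

 add:   /* load src1, load src2, add, store result */
  regs[vpc->rd] = regs[vpc->rs1] + regs[vpc->rs2];
  ++vpc;
  goto *vpc->executor;

 halt:
  printf ("r2 = %d\n", (int) regs[2]);   /* prints 5 */
  return 0;
}
@end example

The decode step, the executors, and the register file of a generated
simulator are of course produced from the @file{.cpu} description rather
than written by hand.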
 
However, things can still be better.  There is still some implementation
related overhead that can be removed.  The two instructions to branch
to the next instruction would be unnecessary if instruction executors
were concatenated together.  The fetching and storing of target registers
can be reduced if target registers were kept in host registers across
instruction boundaries (and the longer one can keep them in host registers
the better).  A consequence of both of these improvements is that the number
of memory operations is drastically reduced.  There isn't a lot of ILP
in the simulation of target instructions to hide memory latencies.
Another consequence of these improvements is the opportunity to perform
inter-target-instruction scheduling of the host instructions and other
optimizations.

There are two ways to achieve these improvements.  Both involve converting
basic blocks (or superblocks) in the target application into the host
instruction set and compiling that.  The first way involves doing this
"offline".  The target program is analyzed and each instruction is converted
into, for example, C code that implements the instruction.  The result is
compiled and then the new version of the target program is run.

The second way is to do the translation from target instruction set to
host instruction set while the target program is running.  This is often
referred to as JIT (Just In Time) simulation (FIXME: proper phrasing here?).
One way to implement this is to simulate instructions the way existing
CGEN simulators do, but keep track of how frequently a basic block is
executed.  If a block gets executed often enough, then compile a translation
of it to the host instruction set and switch to using that.  This avoids
the overhead of doing the compilation on code that is rarely executed.
Note that here is one place where a dual cpu system can be put to good use:
one cpu handles the simulation and the other handles compilation (translating
target instructions to host instructions).
CGEN can@footnote{This hasn't actually been implemented so there is
some hand waving here.} handle a large part of building the JIT compiler
because both host and target architectures are recorded in a way that is
amenable to program manipulation.
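
A rough sketch of the ``compile only the hot blocks'' strategy follows.
None of these types or functions exist in CGEN today; @code{interpret_block}
and @code{translate_block} are hypothetical stand-ins for the threaded
interpreter and the target-to-host translator.

@example
#include <stdint.h>

#define HOT_THRESHOLD 50   /* compile a block once it has run this often */

typedef void (*host_code_fn) (void);

/* Stand-ins for the interpreter and the translator.  */
extern void interpret_block (uint32_t target_pc);
extern host_code_fn translate_block (uint32_t target_pc);

struct basic_block {
  uint32_t target_pc;        /* start address in the target program      */
  unsigned exec_count;       /* how often it has been interpreted so far */
  host_code_fn translation;  /* host code once compiled, otherwise NULL  */
};

/* Execute one basic block, compiling it once it becomes hot.  */
void
execute_block (struct basic_block *bb)
{
  if (bb->translation != NULL)
    {
      bb->translation ();           /* already compiled: run host code */
      return;
    }

  interpret_block (bb->target_pc);  /* otherwise fall back on the
                                       threaded interpreter */

  if (++bb->exec_count >= HOT_THRESHOLD)
    /* On a dual cpu host this request could be handed to the second
       cpu so that translation overlaps with further simulation.  */
    bb->translation = translate_block (bb->target_pc);
}
@end example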
 
A hybrid of these two ways is to translate target basic blocks to
C code, compile it, and dynamically load the result into the running
simulation.  Problems with this are that one must invoke an external program
(though one could dynamically load a special form of C compiler I suppose)
and there's a lot of overhead parsing and optimizing the C code.  On the
other hand one gets to take full advantage of the compiler's optimization
technology.  And if the application takes a long time to simulate, the
extra cost may be worthwhile.  A dual cpu system is of benefit here too.
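
On a POSIX host the hybrid approach might be sketched as follows; the code
is hypothetical, and both error handling and the generation of the C text
itself are omitted.

@example
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

typedef void (*block_fn) (void);

/* Compile the C code for a translated block into a shared object with
   the external compiler, then load the named entry point.  Invoking
   the external compiler is the main source of overhead mentioned in
   the text.  */
block_fn
compile_and_load (const char *c_file, const char *entry)
{
  char cmd[256];

  snprintf (cmd, sizeof cmd,
            "cc -O2 -shared -fPIC -o translated.so %s", c_file);
  if (system (cmd) != 0)
    return NULL;

  void *handle = dlopen ("./translated.so", RTLD_NOW);
  if (handle == NULL)
    return NULL;

  /* The loaded function can now be called in place of interpreting
     the corresponding target basic block.  */
  return (block_fn) dlsym (handle, entry);
}
@end example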
 
@node Profiling tools
@subsubsection Profiling tools

It is useful to know how well an architecture is being utilized.
For one, this helps build better architectures.  It also helps determine
how well a compilation system is using an architecture.

CGEN-based simulators already compute instruction frequency counts.
It's straightforward to add register frequency counts.
Monitoring other aspects of the ISA is also possible.  The description
file provides all the necessary data; all that's needed is to write a
generator for an application that then performs the desired analysis.
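
The counters themselves are simple.  Here is a sketch of what a generated
profiler might maintain; the array sizes and the @code{profile_insn} hook
are invented for illustration.

@example
#include <stdio.h>

#define MAX_INSN_TYPES 256   /* instruction types in the ISA */
#define NUM_REGS        32

static unsigned long insn_count[MAX_INSN_TYPES];  /* computed today         */
static unsigned long reg_reads[NUM_REGS];         /* straightforward to add */
static unsigned long reg_writes[NUM_REGS];

/* Called from each instruction executor with data the decoder
   already has on hand.  */
void
profile_insn (int insn_type, int rd, int rs1, int rs2)
{
  insn_count[insn_type]++;
  reg_writes[rd]++;
  reg_reads[rs1]++;
  reg_reads[rs2]++;
}

/* Dump the register frequency counts at the end of the run.  */
void
profile_report (FILE *f)
{
  int i;

  for (i = 0; i < NUM_REGS; i++)
    fprintf (f, "r%d: %lu reads, %lu writes\n",
             i, reg_reads[i], reg_writes[i]);
}
@end example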
 
Function unit, pipeline, and other architecture implementation related items
require a lot more effort, but they are doable.  The guideline for this effort
is again coming up with an application-independent specification of these
things.

CGEN does not currently support memory or cache profiling.
Obviously they're important, and support may be added in the future.
One thing that would be straightforward to add is the building of
trace data for use by cache and memory analysis tools.
The point though is that these tools won't benefit much from CGEN's
existence.
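
For example, a simulator could emit a record like the following for every
load and store, leaving the cache modelling entirely to an external tool.
The record layout is invented; it is only meant to show how little is
involved.

@example
#include <stdio.h>
#include <stdint.h>

/* One memory access, written in binary to a trace file for later
   analysis by a cache or memory simulator.  */
struct mem_trace_record {
  uint32_t pc;        /* target pc of the load/store */
  uint32_t address;   /* effective address accessed  */
  uint8_t  size;      /* access size in bytes        */
  uint8_t  is_store;  /* 0 = load, 1 = store         */
};

void
trace_access (FILE *trace_file, uint32_t pc, uint32_t address,
              unsigned size, int is_store)
{
  struct mem_trace_record r = { pc, address, (uint8_t) size,
                                (uint8_t) (is_store != 0) };
  fwrite (&r, sizeof r, 1, trace_file);
}
@end example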
 
Another kind of profiling tool is one that takes the program to
be profiled as input, inserts profiling code into it, and then generates
a new version of the program which is then run.@footnote{Note that there
are other uses for such a program modification tool besides profiling.}
Recorded in CGEN's description files should be all the necessary ISA related
data to do this.  One thing that's missing is code to handle the file format
and relocations.  @xref{ABI description}.

@node Program analysis tools
@subsubsection Program analysis tools

Related to profiling tools are static program analysis tools.
By this I mean taking machine code as input and analyzing it in some way.
Except for symbolic information (which could come from BFD or elsewhere),
CGEN provides enough information to analyze machine code, both the
raw instructions @emph{and} their semantics.  Libcgen should contain
all the basic tools for doing this.
@footnote{Today this is libopcodes to some degree.}
 
@node ABI description
@subsubsection ABI description

Several tools need knowledge not only of a cpu's ISA but also of the ABI
in use.  I think(!) it makes sense to apply the same goals that went into
CGEN's architecture description language to an ABI description language:
specify the ABI in an application independent way and then have a basic
toolkit/library that provides ways of using that data.
It might be useful to also allow the writing of program generators
for applications that want more than what the toolkit/library provides.
Perhaps not, but the basic toolkit/library should, again I think,
be useful.

Part of what an ABI defines is the file format and relocations.
This is something that BFD is built for.  I think a BFD rewrite
should happen and should be based, at least in part, on a CGEN-style
ABI description.  This rewrite would be one user of the ABI description,
but certainly not the only user.
One problem with this approach is that BFD requires a lot of file format
specific C code.  I doubt all of this code is amenable to being described
in an application independent way.  Careful separation of such things
will be necessary.  It may even be useful to ignore old file formats
and limit such a BFD rewrite to ELF (not that ELF is free from such
warts, of course).
 
@node Machine generated architecture reference material
@subsubsection Machine generated architecture reference material

Engineers often need to refer to architecture documentation.
One problem is that there are often only so many hardcopy manuals
to go around.  Since the CPU description contains a lot of the information
engineers need to find, it makes sense to convert that information back
into a readable form.  The manual can then be online, available to everyone.
Furthermore, each architecture will be documented using the same style,
making it easier to move from architecture to architecture.

@node Tools like what NJMCT provides
@subsubsection Tools like what NJMCT provides

NJMCT is the New Jersey Machine Code Toolkit.
It focuses exclusively on the encoding and decoding of instructions.
[FIXME: wip, need to say more].
 
@node Input to a compiler backend
@subsubsection Input to a compiler backend

One can define a GCC port to include these four things:

@itemize @bullet
@item cpu architecture description
@item cpu implementation description
@item ABI description
@item miscellaneous
@end itemize

The CGEN description provides all of the cpu architecture description
that the compiler needs.
However, the current design of the CPU description language is geared
towards going from machine instructions to semantic content, whereas
what a compiler wants to do is go from semantic content to machine
instructions, so in the end this might not be a reasonable thing to
pursue.  On the other hand, that problem can be solved in part by
specifying two sets of semantics for each instruction: one for the
compiler side of things, and one for the simulator side of things.
Frequently they will be the same thing and thus need only be specified once.
Though specifying them twice, for the two different contexts, is reasonable
I think.  If the two versions of the semantics are used by multiple
applications, this makes even more sense.

The planned rewrite of model support in CGEN will support whatever the
compiler needs for the implementation description.

Compilers also need to know the target's ABI, which isn't relevant for
an architecture description.  On the other hand, more than just the
compiler needs knowledge of the ABI.  Thus it makes sense to think about
how many tools there are that need this knowledge and whether one can
come up with a unifying description of the ABI.  Hence one future
project is to add the ABI description to CGEN.  This would encompass in
essence most of what is contained in the System V ABI documentation.
 
That leaves the "miscellaneous" part.  Essentially this is a catchall
for whatever else is needed.  This would include things like
include file directory locations, port-specific language features, ???.
There's not much need to include this info in CGEN; it's pretty
esoteric and generally useful to only a few applications.

One can even envision a day when GCC emits object files directly.
The instruction description contains enough information to build
the instructions and the ABI support would provide enough
information on relocations and object file formats.

Debugging information should be treated as an orthogonal concept.
At present it is outside the scope of CGEN, though clearly the same
reasoning behind CGEN applies to debugging support as well.
 
@node Hardware/software codesign
@subsubsection Hardware/software codesign

This section isn't very well thought out -- not much time has been put
into it.  The thought is that some interface with VHDL/Verilog could
be created that would assist hw/sw codesign.

Another related application is to have a feedback mechanism from the
compilation system that helps improve the architecture description
(both CGEN and HDL).
CGEN descriptions for experimental instructions could be added,
and a new set of compilation tools quickly regenerated.
Then experiments could be run analyzing the effectiveness of the
new instructions.
 
@node What is missing that should be there someday?
@subsection What's missing that should be there someday?

@itemize @bullet
@item Support for complex ISA's (i386, m68k).

Early versions had the framework of the support, but it's all bit-rotten.

@item ABI description

As discussed elsewhere, one thing that many tools need knowledge of besides
the ISA is the ABI.  Clearly ABI's are orthogonal to ISA's and one cpu
may have multiple ABI's running on it.  Thus the ABI description needs to
be independent of the architecture description.  It would still be useful
for the ABI to refer to things in the architecture description.

@item Model description

The current design is enough to get reasonable cycle counts from
the simulator but it doesn't take into account all the uses one would
want to make of this data.

@item File organization

I believe a lot of what is in libopcodes should be moved to libcgen.
Libcgen will contain the bulk of the cpu description in processed form.
It will also contain a suite of utilities for accessing the data.

ABI support could either live in libcgen or separately in libcgenabi.
libbfd would be a user of this library.

Instruction semantics should also be recorded in libcgen, probably
in bytecode form.  Operand usage tables, needed for example by the
m32r assembler, could be lazily computed at runtime.
Operand usage tables are also useful to gdb's reverse-execution support.

Applications can either make use of libcgen or, given the application
independence of the description language, write their own code
generators to tailor the output as needed.

@end itemize
 
@node CPU description language
@section CPU description language

The goal of CGEN is to provide a uniform and extensible framework for
doing assemblers/disassemblers and simulators, as well as allowing
further tools to be developed as necessary.

With that in mind I think the place to start is in defining a CPU
description language that is sufficiently powerful for all the current
and perceived future needs: an application independent description of
the CPU.  From the CPU description, tables and code can be generated
that an application framework can then use (e.g. opcode table for
assembly/disassembly, decoder/executor for simulation).

By "application independence" I mean the data is recorded in a way that
doesn't intentionally close any doors on uses of the data.  One example of
this is using RTL to describe instruction semantics rather than, say, C.
The assembler can also make use of the instruction semantics.  It doesn't
make use of the semantics, per se, but what it does use is the input and
output operand information that is machine generated from the semantics.
Grokking operand usage from C is possible, but harder.
@footnote{By this I mean analyzing the C and understanding what it's doing.}
So by writing the semantics in RTL multiple applications can make use of it.
From the RTL one can also generate code in languages other than C.
 
425
@menu
426
* Language requirements::
427
* Layout::
428
* Language problems::
429
@end menu
430
 
431
@node Language requirements
432
@subsection Language requirements
433
 
434
The CPU description file needs to provide at least the following:
435
 
436
@itemize @bullet
437
@item elements of the CPU's architecture (registers, etc.)
438
@item elements of a CPU's implementation (e.g. pipeline)
439
@item how the bits of an instruction word map to the instruction's semantics
440
@item semantic specification in a way that is amenable to being
441
understood and manipulated
442
@item performance measurement parameters
443
@item support for multiple architecture and implementation variants
444
@item assembler syntax of the instruction set
445
@item how that syntax maps to the bits of the instruction word, and back
446
@item support for generating test files
447
@item ???
448
@end itemize
449
 
In addition to this, elements of the particular ABI in use are also needed.
These things will need to be defined separately from the cpu,
for obvious reasons.

@itemize @bullet
@item file format
@item relocations
@item function calling conventions
@item structure layout
@item ... and all the other usual stuff
@end itemize

Some architectures require knowledge of the pipeline in order to do
accurate simulation (because, for example, some registers don't have
interlocks) so that will be required as well, as opposed to being solely
for performance measurement.  Pipeline knowledge is also needed in order
to achieve accurate profiling information.  However, I haven't spent
much time on this yet.  The current design/implementation is a first
pass in order to get something reasonable, and will be revisited
as necessary.
 
Support for generating test files is not complete.  Currently the GAS
test suite generator gets by (barely) without them.  The simulator test
suite generator just generates templates and leaves the programmer to
fill in the details.  But I think this information should be present,
meaning that for situations where test vectors can't be derived from the
existing specs, new specs should be added as part of the description
language.  This would make writing testcases an integral part of writing
the @file{.cpu} file.  Clearly there is a risk in having machine generated
testcases -- but there are ways to eliminate or control the risk.

The syntax of a suitable description language needs to have these
properties:

@itemize @bullet
@item simple
@item expressive
@item easily parsed
@item easy to learn
@item understandable by program generators
@item extensible
@end itemize
 
It would also help to not start over completely from scratch.  GCC's RTL
satisfies all these goals, and is used as the basis for the description
language used by CGEN.

Extensibility is achieved by specifying everything as name/value pairs.
This allows new elements, and even CPU specific elements, to be added
without complicating the language or requiring a new element in
a @code{define_insn}-like entry to be added to each existing port.
Macros can be used to eliminate the verbosity of repetitively specifying
the ``name'' part, so one can have it both ways.  Imagine GCC's
@file{.md} file elements specified as name/value pairs with macros
called @code{define_expand}, @code{define_insn}, etc., that handle the
common cases and expand the entry to the full @code{(define_full_expand
(name addsi3) (template ...) (condition ...) ...)}.

Scheme also uses @code{(foo :keyword1 value1 :keyword2 value2 ...)},
though that isn't implemented yet (or maybe @code{#:keyword} depending
upon what is enabled in Guile).
 
@node Layout
@subsection Layout

Here is a graphical layout of the hierarchy of elements of a @file{.cpu} file.

@example
                           architecture
                           /          \
                      cpu-family1   cpu-family2  ...
                      /         \
                  machine1    machine2  ...
                   /   \
              model1  model2  ...
@end example

Each of these elements is explained in more detail in @ref{RTL}.  The
@emph{architecture} is one of @samp{sparc}, @samp{m32r}, etc.  Within
the @samp{sparc} architecture, the @emph{cpu-family} might be
@samp{sparc32} or @samp{sparc64}.  Within the @samp{sparc32} CPU family,
the @emph{machine} might be @samp{sparc-v8}, @samp{sparclite}, etc.
Within the @samp{sparc-v8} machine classification, the @emph{model}
might be @samp{hypersparc} or @samp{supersparc}.
 
Instructions form their own hierarchy as each instruction may be supported
by more than one machine.  Also, some architectures can handle more than
one instruction set on one chip (e.g. ARM).

@example
                     isa
                      |
                  instruction
                    /   \
             operand1  operand2  ...
                |         |
         hw1+ifield1   hw2+ifield2  ...
@end example

Each of these elements is explained in more detail in @ref{RTL}.
 
@node Language problems
@subsection Language problems

There are at least two potential problem areas in the language's design.

The first problem is variation in assembly language syntax.  Examples of
this are Intel vs AT&T i386 syntax, and Motorola vs MIT m68k syntax.
I don't think there are enough important cases to warrant
handling this efficiently.  One could either ignore the issue in
situations where the divergence is sufficient to dissuade one from handling
it in the existing design, or one could provide a front end or
use/extend the existing macro mechanism.

One can certainly argue that the description of assembler syntax should be
separated from the hardware description.  Doing so would keep the
complications of supporting multiple or even difficult assembler
syntaxes out of the hardware description.  On the other hand, separating
them creates a lot of duplication, and in the end, for the intended uses of
CGEN, I think the benefits of combining assembler support with the hardware
description outweigh the disadvantages.  Note that the assembler
portions of the description aren't used by the simulator@footnote{The
simulator currently uses elements of the opcode table since the opcode
table is a nice central repository for such things.  However, the
assembler/disassembler isn't part of the simulator, and the
portions of the opcode table it needs can be generated and recorded elsewhere
should it prove reasonable to do so.  The CPU description file won't
change, which is the important thing.}, so if one wanted to implement
the disassembler/assembler via other means one can.
 
The second problem area is relocations.  Clearly part of
processing assembly code is dealing with the relocations involved
(e.g. GOT table specification).  Relocation support necessarily requires
BFD and GAS support, both of which need cleanup in this area.  It is
believed that rewriting BFD to provide a better interface, so that reloc
handling in GAS can be cleaned up, is something this project can and should
take advantage of, and that any attempt at adding relocation support should
be done by first cleaning up GAS/BFD.  That can be left for another day
though. :-)

One can certainly argue that trying to combine an ABI description with a
hardware description is problematic as there can be more than one ABI.
However, there often isn't, and in the cases where there isn't, the
simplified porting and maintenance is worth it, in the author's opinion.
Furthermore, the current language doesn't embed ABI elements
with hardware description elements.  Careful segregation of such things
might ameliorate any problems.
 
@node Opcodes support
@section Opcodes support

Opcodes support comes in the form of machine generated opcode tables as
well as supporting routines.
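
Roughly speaking, each table entry pairs the assembler syntax with the
instruction's encoding.  The sketch below only conveys the general shape of
such a table; the field names and encodings are invented and do not match
the structures CGEN actually generates.

@example
#include <stdint.h>

struct insn_entry {
  const char *mnemonic;    /* assembler mnemonic, e.g. "add"       */
  const char *syntax;      /* operand syntax, e.g. "$rd,$rs1,$rs2" */
  uint32_t    value;       /* the fixed opcode bits                */
  uint32_t    mask;        /* which bits of the word are fixed     */
};

/* A hypothetical two-entry table.  The supporting routines assemble by
   filling operand fields into VALUE, and disassemble by finding the
   entry for which (word & mask) == value.  */
static const struct insn_entry insn_table[] = {
  { "add", "$rd,$rs1,$rs2", 0x10000000, 0xff000000 },
  { "sub", "$rd,$rs1,$rs2", 0x11000000, 0xff000000 },
};
@end example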
 
@node Simulator support
@section Simulator support

Simulator support comes in the form of a machine generated decoder/executor
as well as the structure that records CPU state information (i.e., registers).
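
A sketch of the shape of those two pieces follows.  It is illustrative only;
the generated code is ISA specific, considerably larger, and uses different
names.

@example
#include <stdint.h>

/* The structure recording CPU state.  */
struct cpu_state {
  uint32_t pc;
  uint32_t gr[16];          /* general registers */
};

/* Decode and execute one instruction word of a made-up ISA.  */
void
decode_and_execute (struct cpu_state *cpu, uint32_t insn)
{
  unsigned opcode = insn >> 24;
  unsigned rd  = (insn >> 16) & 0xf;
  unsigned rs1 = (insn >> 8)  & 0xf;
  unsigned rs2 = insn & 0xf;

  switch (opcode)
    {
    case 0x01:              /* add */
      cpu->gr[rd] = cpu->gr[rs1] + cpu->gr[rs2];
      break;
    case 0x02:              /* sub */
      cpu->gr[rd] = cpu->gr[rs1] - cpu->gr[rs2];
      break;
    }
  cpu->pc += 4;
}
@end example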
 
@node Testing support
@section Testing support

@menu
* Assembler/disassembler testing::
* Simulator testing::
@end menu

Inherent in the design is the ability to machine generate test cases both
for the assembler/disassembler and for the simulator.  Furthermore, it
is not unreasonable to add to the description file data specifically
intended to assist or guide the testing process.  What kinds of
additions will be needed is unknown at present.
 
@node Assembler/disassembler testing
@subsection Assembler/disassembler testing

The description of instructions and their fields contains to some extent
not only the syntax but also the possible values for each field.  For
example, in the specification of an immediate field, it is known what
the allowable range of values is.  Thus it is possible to machine
generate test cases for such instructions.  Obviously one wouldn't want
to test for each number that a number field can contain; however, one can
generate a representative set of any size.  Likewise with register
fields, mnemonic fields, etc.  A good starting point would be the edge
cases, the values at either end of the range of allowable values.
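
As an illustration, a generator that knows an immediate field holds a signed
8-bit value could emit something like the following for a hypothetical
@code{addi} instruction; the mnemonic, syntax, and comment convention are
all invented.

@example
#include <stdio.h>

/* Emit assembler test lines exercising the edge cases of a signed
   8-bit immediate field, plus a couple of interior values.  A comment
   on each line records the field value to help later manual review.  */
void
emit_addi_tests (FILE *out)
{
  static const int values[] = { -128, -1, 0, 1, 127 };
  size_t i;

  for (i = 0; i < sizeof values / sizeof values[0]; i++)
    fprintf (out, "\taddi $r1,%d\t; imm8 = %d\n", values[i], values[i]);
}
@end example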
 
When I first raised the possibility of machine generated test cases, the
first response I got was that this wouldn't be useful because the same
data was being used to generate both the program and the test cases.  An
error might be propagated to both and thus nullify the test.  For
example, if an opcode field was supposed to have the value 1 and the
description file had the value 2, then this error wouldn't be caught.
However, this assumes test cases are always generated during the testing run!
And it ignores the profound amount of typing that is saved by machine
generating test cases!  (I discount the argument that this kind of
exhaustive testing is unnecessary.)

One solution to the above problem is to not generate the test cases
during the testing run (which was implicit in the proposal, but perhaps
should have been explicit).  Another solution is to generate the
test cases during the test run but first verify them by some external
means before actually using them in any test.  Another solution is
to have some trust in the generated tests.  Yes, some bugs may be missed,
but given the quantity of testing that can be done, some bugs may still
be caught that would otherwise have been missed.  Plus it's all
machine-driven; minimal human interaction is required.

So how are machine generated test cases verified?  By machine, by hand,
and by time.  The test cases are checked into CVS and are not regenerated
without care.  Every time the test cases are regenerated, the diffs are
examined to ensure the bug triggering the regeneration has been fixed
and that no new bugs have been introduced.  In all likelihood once a
port is more or less done, regeneration of test cases would stop anyway,
and all further changes would be done manually.
 
``By machine'' means that, for example, in the case of ports with a native
assembler, one can run the test cases through the native assembler and use
that as a good first pass.

``By hand'' means one can go through each test case and verify it
manually.  This is what is done in the case of non-machine generated
test cases; the only difference is the perceived difference in quantity.
And in the case of machine generated test cases comments can be added to
each test to help with the manual verification (e.g. a comment can be
added that splits the instruction into its fields and shows their names
and values).

``By time'' means that this process needn't be done instantaneously.
This is no different than the non-machine generated case again except in
the perceived difference in quantity of test cases.

Note that no claim is made that manually generated test cases aren't
useful or needed.  The goal here is to enhance existing forms of testing,
not replace them.
 
@node Simulator testing
@subsection Simulator testing

Machine generation of simulator test cases is possible because the
semantics of each instruction is written in a way that is understandable
to the generator.  At the very least, knowledge of what the instructions
are is present!  Obviously there will be some instructions that can't
be adequately expressed in RTL and are thus not amenable to having a
test case machine generated.  There may even be some RTL'd
semantics that fall into this category.  It is believed, however, that
there will still be a large percentage of instructions amenable to
having test cases machine generated for them.  Such test cases can
certainly be hand generated, but it is believed that this is a large
amount of unnecessary typing that, for that very reason, typically
won't be done.
 
An example is the simple arithmetic instructions.  These take zero, one,
or more arguments and produce a result.  The description file contains
sufficient data to generate such an instruction; the hard part is in
providing the environment to set up the required inputs (e.g. loading
values into registers) and retrieve the output (e.g. retrieve a value
from a register).
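
For instance, a generator could wrap the instruction under test with code
that loads the inputs and checks the output, emitting something like the
following.  The assembler syntax and the pass/fail convention are invented;
a real generator would take both from the description file and the target's
testsuite support code.

@example
#include <stdio.h>

/* Emit one simulator test case: set up inputs, execute the
   instruction under test, and compare the result.  */
void
emit_add_test (FILE *out)
{
  fprintf (out,
           "\t; set up the required inputs\n"
           "\tli $r1,2\n"
           "\tli $r2,3\n"
           "\t; instruction under test\n"
           "\tadd $r3,$r1,$r2\n"
           "\t; retrieve and check the output\n"
           "\tli $r4,5\n"
           "\tbne $r3,$r4,fail\n"
           "\tb pass\n");
}
@end example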
 
Certainly at the very least all the administrivia for each test case can
be machine generated (i.e. a template file can be generated for each
instruction, leaving the programmer to fill in the details).

The strategies mentioned for assembler/disassembler machine-generated
test cases also apply here.
