OpenCores
URL https://opencores.org/ocsvn/pavr/pavr/trunk

/*!
\defgroup pavr_intro Introduction
\par Goal
This project implements an \b 8 \b bit \b controller that is compatible with
Atmel's \ref pavr_avrarch "AVR architecture", using \b VHDL (Very High Speed
Integrated Circuit Hardware Description Language). \n
The device built here is not a specific controller of the AVR family, but rather
a maximally featured AVR controller. It is configurable enough to be able to
simulate most AVR family controllers. \n
\b The \b goal is to obtain an AVR processor that is as powerful as possible (in
terms of MIPS), with a work budget of about 6 man-months. \n
\n
\par Approach
Atmel's AVR core is reasonably fast compared with the other 8 bit controllers on the
market (year 2002). Most instructions take one clock. The instruction set is
(almost) RISC. In real life applications, the average clocks per instruction
(CPI) is typically 1.2...1.7, depending on the application. CPI=1.4 is a good
average. The core has a short pipeline, with 2 stages (fetch and execute). With
Atmel's 0.5um technology, the core runs at 10...15 MHz. \n
\n
From the start, ways to improve the original core's performance were sought. \n
As the original core already executes most instructions in one clock, two
ideas come quickly to mind: a deeper pipeline and issuing more than one instruction
per clock (multi-issue). \n
A deeper pipeline is relatively straightforward. A clock speed increase of about
3...4x is expected from a 5 or 6 stage pipeline. However, the resulting average
CPI is expected to be slightly bigger than the original, mainly because of jumps,
branches, calls and returns. They require the pipeline to be flushed, at least
partially, thus some clocks are lost while refilling the pipeline. \n
The multi-issue approach was quickly rejected. The available time budget is
too small for implementing both a deep pipeline and multi-issuing. On the other
hand, multi-issue without a deeper pipeline wouldn't make much sense. \n
\n
\par Result
pAVR is a \b parameterizable and \b synthesizable VHDL design, AVR-compatible,
that has: \n
- 6 pipeline stages
- 1 instruction/clock for most instructions
- estimated clock frequency: \b ~50 \b MHz at \b 0.5 \b um, assuming that
   Atmel's core runs at 15 MHz at 0.5 um. \n
   3x Atmel original core's performance.
- estimated MIPS at 50 MHz: \b 28 \b MIPS (typical), \b 50 \b MIPS (peak) \n
   3x Atmel original core's performance. At 15 MHz, Atmel's core has
   10 MIPS typical, and 15 MIPS peak.
- CPI: 1.7 clocks/instruction (typical), 1 clock/instruction (peak) \n
   0.75x (typical), 1.00x (peak) Atmel original core's performance.
- pAVR architecture is computation-friendly rather than control-friendly.
   \ref pavr_pipeline_jumps "Jumps", \ref pavr_pipeline_branches "branches",
   \ref pavr_pipeline_skips "skips", \ref pavr_pipeline_calls "calls" and
   \ref pavr_pipeline_returns "returns" are relatively expensive in terms of
   clocks. A branch prediction scheme and a smarter return procedure might
   be considered as upgrades.

\n
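As a rough cross-check of the throughput figures in the list above, the sketch
below assumes the simple relation MIPS = clock frequency (in MHz) / CPI, using
the typical CPI values quoted on this page (1.4 for the original core, 1.7 for
pAVR). The function name is illustrative only and does not come from the pAVR
sources.

```python
# Rough cross-check of the throughput estimates.
# Assumption: MIPS = clock frequency in MHz / average clocks per instruction.

def mips(clock_mhz, cpi):
    """Return millions of instructions per second for a clock and a CPI."""
    return clock_mhz / cpi

pavr_typical = mips(50.0, 1.7)   # pAVR estimate: 50 MHz, typical CPI 1.7
pavr_peak = mips(50.0, 1.0)      # peak: one instruction per clock
avr_typical = mips(15.0, 1.4)    # original core: 15 MHz, typical CPI 1.4
avr_peak = mips(15.0, 1.0)

print(f"pAVR typical: {pavr_typical:.1f} MIPS, peak: {pavr_peak:.1f} MIPS")
print(f"AVR  typical: {avr_typical:.1f} MIPS, peak: {avr_peak:.1f} MIPS")
print(f"typical speedup: {pavr_typical / avr_typical:.2f}x")
```

With these inputs the relation gives somewhat more than the rounded 28 MIPS and
10 MIPS typical figures quoted above, so those figures presumably include some
rounding or a slightly different CPI.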
The \ref pavr_src "sources" structure is \b modularized. The sources are written
based on a set of common-sense \ref pavr_src_conv "conventions" (the process
splitting strategy, signal naming, etc). Thus, pAVR is quite an easily
\b maintainable design. \n
Extensive \ref pavr_test "testing" was carried out. \n
pAVR is to be synthesized and burned into an \ref pavr_fpga "FPGA". \n
\n
\par Project structure
This project is distributed in two forms: \b release and \b devel (development). \n
\n
The \b devel distribution contains
- pAVR documentation
- VHDL sources for pAVR and associated VHDL tests
- test programs
- some utilities (preprocessor, some useful scripts)

In short, the devel distribution contains everything needed to develop
   this project further. As a side note, this project was developed under Windows
   XP. Yet, all the main software tools used here have Linux counterparts (Doxygen,
   VHDL simulator, C compiler, TCL interpreter, text editor). \n
The documentation is generated via Doxygen. For those who don't know how to use
   this wonderful tool, please check www.doxygen.org . \n
The "doc" directory contains the sources of the documentation, along with
   some scripts for compiling the documentation, cleaning it up, or running
   (viewing) it. \n
The compilation result (HTML) is placed in the "doc/html" folder. The HTML
   documentation is further compiled into a .CHM (compressed HTML) file that is
   placed in the "doc/chm" folder. CHM is a very convenient file format: it provides
   nearly all the features of HTML, it is very small due to compression,
   and it is very handy (a single file instead of a bunch of files and folders).
   However, this file format is still Windows-bound. There are neither compilers
   nor viewers for Linux (but things might change soon...). \n
The "src" folder contains pAVR VHDL sources, VHDL tests and some Modelsim macro
   files. \n
The "test" folder contains the test programs (ASM and ANSI C) with which pAVR was
   tested. \n
The "tools" folder contains some utilities. The most important utility is a text
   preprocessor. XML-like tags are placed in the VHDL sources, inserted as
   comments. The preprocessor parses these sources and interprets the XML-like
   tags. For example, some tags isolate non-synthesizable code that can easily
   be removed when synthesizing pAVR. The preprocessor is also used to insert
   a common header into all VHDL sources. \n
The "tools" folder also contains some scripts that build devel or release packages. \n
\n
The \b release distribution contains only the documentation. However, all the VHDL
   sources are embedded into the documentation, and are thus easily accessible. \n
The release distribution comes in two flavors: HTML or CHM. My favorite is CHM,
   because it's much more compact. However, for viewing the documentation under
   Linux, HTML is still needed. \n
\n
This project contains a few sub-projects that must be edited/compiled/run
   independently (for example, generating the documentation, or compiling test
   sources). For this purpose, I use a TCL console with stdin/stdout/stderr, and
   a few buttons: edit/compile/run/clean. Each button launches a script with the
   same name as the button, placed in the same folder as the console script. The
   stdout/stderr of the scripts are captured on the TCL console. I use this
   "project manager" (the TCL console) the very same way for, let's say, compiling
   a C source or generating Doxygen documentation.
\n
\n
\n
    */
    /*!
\defgroup pavr_avrarch AVR architecture
\par AVR features
- Load-store \ref pavr_avris "RISC" machine
- Harvard architecture, with separate program/data buses
- 2 stage pipeline: fetch and execute
- Most instructions execute in 1 clock.
- Variable instruction word width: 16 or 32 bits. Most instructions are 16 bits
   wide
- Register File (RF) with 32 registers
- IO File (IOF) with 64 registers
- Loads and stores operate in the Unified Memory space. \n
   The Unified Memory (UM) is the space formed by concatenating the RF, IOF and the
   Data Memory (DM), in this order. Thus, the RF begins at address 0 in the UM, the
   IOF at address 32 and the DM at address 96.
- Register File mapped pointer registers X, Y, Z, 16 bits each, for indirectly
   addressing the Data Memory and the Program Memory (PM). \n
   Pointer registers have pre-decrement and post-increment capabilities.
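
The Unified Memory layout from the list above (RF at address 0, IOF at 32, DM
at 96) can be sketched as a small address decoder. This is only an illustration
of the address map; the function and constant names are hypothetical and are not
taken from the pAVR sources.

```python
# Sketch of the Unified Memory (UM) address map:
# RF at 0..31, IOF at 32..95, DM from 96 upward.
RF_SIZE = 32
IOF_SIZE = 64
IOF_BASE = RF_SIZE             # 32
DM_BASE = RF_SIZE + IOF_SIZE   # 96

def um_decode(addr):
    """Map a UM address to a (region, offset) pair."""
    if addr < 0:
        raise ValueError("UM addresses are non-negative")
    if addr < IOF_BASE:
        return ("RF", addr)
    if addr < DM_BASE:
        return ("IOF", addr - IOF_BASE)
    return ("DM", addr - DM_BASE)

for a in (0, 31, 32, 95, 96, 200):
    print(a, um_decode(a))
```

Because the region depends on the pointer value, the target of an indirect load
or store is only known at run time.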

\todo
Add some AVR kernel schematics. \n
Add some AVR general considerations.

\par Notes on AVR downsides
Among other 8 bit microcontrollers, the AVR architecture is relatively clean and
   fast. Of course, it is not perfect.
In the following, I will expand on some of the drawbacks of the AVR architecture. \n
\n
Pipeline-friendliness issues:
- \b The \b register \b file, \b IO \b file \b and \b data \b memory \b have
   \b a \b unified \b addressing \b space. \n
   The Register File, IO file and Data Memory are very different entities,
   from the point of view of the AVR instruction set. It's an obvious
   decision to physically implement them as different memory-like entities.
   Pipelining such a structure is straightforward. A simple and fast
   pipeline can be built naturally. Every memory-like entity can be
   assigned a fixed pipe stage during which it is accessed for writing or
   for reading, with no more than one such elementary operation needed
   during any instruction. \n
   \b However, the AVR architecture has a unified addressing space for Register
   File - IO file - Data Memory. Accessing this Unified Memory space can be
   done through indirect loads and stores, via dedicated pointer registers.
   Depending upon the contents of a pointer register, an access to the
   Register File or the IO file or the Data Memory is needed. This
   completely messes up the simple pipeline structure above, because
   instruction execution becomes \b data \b driven. As a result, for example, the
   Register File must now be accessed, let's say for reading, in more than
   one pipe stage. This is highly pipeline-destructive, because different
   instructions will compete for the same hardware resources. \n
   Arbitration/stall schemes are required. Also, new data hazards must be
   dealt with. All these are pretty complex, and come with a cost, in
   terms of both power consumption and speed. \n
   The unified address space does bring new addressing capabilities. However,
   they are unnatural and basically useless. Who will ever place the stack in
   the Register File or in the IO File? That would make some sense for low-end
   controllers that don't have Data Memory at all, and rely on a Register
   File mapped stack. However, the price paid for that is big. \n
   As a result, pAVR's loads and stores take 2 cycles. If the pointer registers
   had pointed only into the Data Memory space, loads and stores would
   have naturally taken a single clock.
- \b The \b Register \b File/IO \b file \b operand \b addresses \b don't
   \b have \b fixed \b positions \b in \b the \b instruction \b opcodes. \n
   Fixed positions would have allowed reducing the number of pipe stages from 6 to 5.
   As a result, a lower CPI would have been obtained, because of a smaller
   cycle penalty on the instructions that modify the instruction flow
   (branches, jumps, calls etc). Also, that would have meant lower power
   consumption because of fewer registers and less combinational logic.
- \b The \b instruction \b has \b variable \b width: \b 16 \b or \b 32 \b bits. \n
   That is not pipeline-friendly. \n
   Each 32 bit instruction could have easily been replaced by two 16 bit
   instructions.

\n
Instruction set orthogonality issues:
   - Pointer registers X, Y, Z have addressing capabilities that are different
      from each other.
   - Register File locations 0...15 have different addressing capabilities than
      RF locations 16...31.
   - IO locations 0 to 31 support more addressing modes than IO locations 32 to
      63.
   - There are instructions that work on 16 bit words (for example, 16 bit
      register-to-register moves). \n
      The existence of such instructions on an 8 bit RISC controller is questionable.
      That's not because such operations are not needed, but because the rise in
      complexity and irregularity is not justifiable. \n
      The cost/performance balance is negative for these instructions (we're still
      talking about a controller claimed to be RISC).
   - Opcodes 0x95C8 and 0x9004 do exactly the same thing (LPM). \n
      Other such examples might exist. \n
      The instruction bits could have been used more carefully.
   - CLR affects flags, while SER does not, even though they seem to be
      complementary instructions. \n
      This might be a design flaw in the original core \b or done on
      (a hidden) purpose by whoever designed the AVR core. By the way, if I
      remember well some ancient news, AVR was designed not by Atmel, but by a
      Scandinavian company that was acquired later by Atmel.
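
The LPM duplication mentioned in the list above can be illustrated with a small
decoder. This is a minimal sketch, assuming the bit patterns given in Atmel's
AVR instruction set manual for the three LPM forms; the function name is
hypothetical and the sketch is not how pAVR's instruction decoder is written.

```python
# Sketch: decode the LPM forms from a 16 bit opcode word.
# Encodings assumed from the AVR instruction set manual:
#   LPM          1001 0101 1100 1000   (destination R0 is implied)
#   LPM Rd, Z    1001 000d dddd 0100
#   LPM Rd, Z+   1001 000d dddd 0101

def decode_lpm(opcode):
    """Return (dest_register, post_increment), or None if not an LPM form."""
    if opcode == 0x95C8:
        return (0, False)
    if opcode & 0xFE0F == 0x9004:
        return ((opcode >> 4) & 0x1F, False)
    if opcode & 0xFE0F == 0x9005:
        return ((opcode >> 4) & 0x1F, True)
    return None

print(decode_lpm(0x95C8))
print(decode_lpm(0x9004))
```

Both opcodes decode to the same operation (load into R0 through Z, without
post-increment), which is the redundancy noted above.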

\n
\n
    */
    /*!
\defgroup pavr_avris AVR instruction set
    \htmlonly
<table>
<caption>AVR instruction set</caption>
<tr>
   <th>Arithmetic</th>
   <th>Bit &amp; Others</th>
   <th>Transfer</th>
   <th>Jump</th>
   <th>Branch</th>
   <th>Call</th>
</tr>
<tr valign="top">
   <td><pre>
ADD    Rd, Rr
ADC    Rd, Rr
ADIW   Rd+1:Rd, K6

SUB    Rd, Rr
SUBI   Rd, K8
SBC    Rd, Rr
SBCI   Rd, K8
SBIW   Rd+1:Rd, K6

INC    Rd
DEC    Rd

AND    Rd, Rr
ANDI   Rd, K8
OR     Rd, Rr
ORI    Rd, K8
EOR    Rd, Rr

COM    Rd
NEG    Rd
CP     Rd, Rr
CPC    Rd, Rr
CPI    Rd, K8
SWAP   Rd

LSR    Rd
ROR    Rd
ASR    Rd

MUL    Rd, Rr*
MULS   Rd, Rr
MULSU  Rd, Rr
FMUL   Rd, Rr
FMULS  Rd, Rr
FMULSU Rd, Rr
</pre></td>
   <td><pre>
BSET   s
BCLR   s
SBI    A, b
CBI    A, b
BST    Rd, b
BLD    Rd, b

NOP

BREAK**
SLEEP
WDR
</pre></td>
   <td><pre>
MOV    Rd, Rr
MOVW   Rd+1:Rd, Rr+1:Rr

IN     Rd, A
OUT    A, Rr

PUSH   Rr
POP    Rr

LDI    Rd, K8
LDS    Rd, K16

LD     Rd, X
LD     Rd, -X
LD     Rd, X+

LDD    Rd, Y+K6
LD     Rd, -Y
LD     Rd, Y+

LDD    Rd, Z+K6
LD     Rd, -Z
LD     Rd, Z+

STS    K16, Rr

ST     X, Rr
ST     -X, Rr
ST     X+, Rr

STD    Y+K6, Rr
ST     -Y, Rr
ST     Y+, Rr

STD    Z+K6, Rr
ST     -Z, Rr
ST     Z+, Rr

LPM
LPM    Rd, Z
LPM    Rd, Z+
ELPM
ELPM   Rd, Z
ELPM   Rd, Z+

SPM
</pre></td>
   <td><pre>
RJMP   K12
IJMP
EIJMP
JMP    K22
</pre></td>
   <td><pre>
CPSE   Rd, Rr

SBRC   Rr, b
SBRS   Rr, b

SBIC   A, b
SBIS   A, b

BRBC   s, K7
BRBS   s, K7
</pre></td>
   <td><pre>
RCALL  K12
ICALL
EICALL
CALL   K22

RET
RETI
</pre></td>
</tr>
</table>
    \endhtmlonly
\b * Multiplications are fully supported by the pipeline (in terms of timing,
wires and registers). However, the multiplication module itself is null-defined
in the ALU, and always returns zero for now. It will be defined and plugged into
the ALU in a future version of pAVR. \n
\b ** Italicized instructions are currently not implemented in pAVR. \n

\n
\n
    */
    /*!
\defgroup pavr_implementation Implementation
    */
    /*!
\defgroup pavr_control Pipeline structure
\ingroup pavr_implementation
\par Shift-like flow
pAVR has a pipeline with 6 stages:
- Stage 1: read Program Memory (PM)
- Stage 2: strobe Program Memory output into instruction register (INSTR)
- Stage 3: decode instruction and read Register File (RFRD)
- Stage 4: strobe Register File output (OPS)
- Stage 5: execution or Unified Memory access (ALU)
- Stage 6: write Register File (RFWR)

\n
\image html pavr_pipestruct_01.gif \n
\n
Each pipeline stage is pretty much an independent state machine. \n
\n
Basically, each pipeline stage receives values from the previous one, in a
   \b shift-like flow. Only the `terminal' registers contain data actually used;
   the previous ones are used just for synchronization. \n
For example, this is how a particular hardware resource request flows through
   pipeline stages s3, s4 until it is processed in s5: \n
\n
\image html pavr_pipestruct_02.gif \n
\n
\b Exceptions to this `normal' flow are the \b stall and \b flush actions, which
   can independently stall or reset to zero (force a nop into) any stage.
   Other exceptions occur when several registers in such a chain are actually used,
   not only the terminal one. \n
\n
Apart from the (main) pipeline stages above (stages s1-s6), there are a number
   of pipeline stages only needed by a few instructions (such as 16 bit
   arithmetic, some of the skips, returns): s61, s51, s52, s53 and s54. During
   these pipeline stages, the main stages are stalled. \n
\n
Stages s1, s2 are common to all instructions. They bring the instruction from
   Program Memory (PM) into the instruction register (instruction fetch stages). \n
During stage s3, the instruction just read from PM is decoded. That is, the
   following pipeline stages (s4, s5, s6, s61, s51, s52, s53, s54) are
   instructed what to do, by means of dedicated registers. \n
\n
At a given moment, a pipe stage can do one of the following actions:
- execute normally \n
   The registers in that stage are loaded with:
   - Values from the previous stage, if that stage is different from s1 or s2 or s3
   - Some particular values if that stage is s1 or s2 (those values are set by the
      Program Memory manager)
   - Values from the instruction decoder, if that stage is s3
- flush (execute nop) \n
   All registers in that stage are reset to zero.
- stall \n
   All registers in that stage are kept unchanged.

\n
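The three stage actions and the shift-like flow can be modelled in a few lines.
This is a behavioural sketch only, not the VHDL used in pAVR: all names are
hypothetical, each stage is reduced to a single register, and the choice that
flush takes priority over stall when both are asserted is an assumption of the
sketch.

```python
# Minimal model of pipeline stage registers with the three actions:
# execute normally, flush, stall. Names are illustrative only.

NOP = 0  # a flushed stage holds all zeros, which acts as a nop

def next_stage_value(current, incoming, stall, flush):
    """Return the value a stage register holds after the next clock edge."""
    if flush:
        return NOP       # flush: reset to zero (assumed to win over stall)
    if stall:
        return current   # stall: keep unchanged
    return incoming      # execute normally: load from the previous stage

def clock(stages, new_value, stall, flush):
    """Advance a chain of stage registers by one clock.

    stages: list of register values, youngest stage first.
    stall, flush: per-stage lists of booleans.
    """
    incoming = [new_value] + stages[:-1]
    return [next_stage_value(cur, inc, s, f)
            for cur, inc, s, f in zip(stages, incoming, stall, flush)]

chain = [NOP, NOP, NOP]            # registers for s3, s4, s5
no = [False, False, False]
chain = clock(chain, 7, no, no)    # request 7 is decoded into s3
chain = clock(chain, 0, no, no)    # it shifts to s4
chain = clock(chain, 0, no, no)    # and reaches s5, where it is processed
print(chain)
```

Only the last register in the chain holds data that is actually consumed; the
earlier ones merely delay the request until its stage comes up.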

\par Hardware resource managing
Pipeline stages can request access to hardware resources. Access to hardware
   resources is done via dedicated hardware resource managers (one manager
   per hardware resource; one VHDL process per manager). \n
\n
Main hardware resources:
   - Register File (RF)
   - Bypass Unit (BPU)
      - Bypass Register 0 (Bypass chain 0) (BPR0)
      - Bypass Register 1 (Bypass chain 1) (BPR1)
      - Bypass Register 2 (Bypass chain 2) (BPR2)
   - IO File (IOF)
   - Status Register (SREG)
   - Stack Pointer (SP)
   - Arithmetic and Logic Unit (ALU)
   - Data Access Control Unit (DACU)
   - Program Memory (PM)
   - Stall and Flush Unit (SFU)

\n
Only one such request can be received by a given resource at a time. If
   multiple accesses are requested from a resource, its access manager
   will assert an error during simulation; that would indicate a design bug. \n
The pipeline is built so that each resource is normally accessed during a
   fixed pipeline stage:
   - RF is normally read in s3 and written in s6.
   - IOF is normally read/written in s5.
   - DM is normally read/written in s5.
   - DACU is normally read/written in s5.
   - PM is normally read in s1.

However, exceptions can occur. For example, LPM instructions need to read
   PM in stage s5. Also, loads/stores must be able to read/write RF in stage s5. \n
Exceptions are handled at the hardware resource manager level. \n
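
The `one request at a time' rule can be sketched as follows. This is a
simulation-style model written for illustration, assuming a manager that simply
asserts when more than one requester is active in the same clock; the class and
requester names are hypothetical and do not correspond to pAVR's VHDL processes.

```python
# Sketch of a hardware resource manager that accepts at most one request
# per clock and flags simultaneous requests as a design bug.

class ResourceManager:
    def __init__(self, name):
        self.name = name

    def arbitrate(self, requests):
        """requests maps a requester name to True/False for this clock.

        Returns the single active requester, or None when idle.
        """
        active = [who for who, wants in requests.items() if wants]
        assert len(active) <= 1, (
            f"{self.name}: simultaneous requests from {active}")
        return active[0] if active else None

rf_read = ResourceManager("RF read port 1")
print(rf_read.arbitrate({"s3 decode": True, "s5 load": False}))
```

In such a model, avoiding the assertion is the job of whatever stalls the
competing stages before their requests reach the manager.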

\par Stall and Flush Unit
Because of the exceptions above, different pipeline stages can compete for a given
   hardware resource. A mechanism must be provided to handle hardware resource
   conflicts. The SFU implements this function, by arbitrating hardware resource
   requests. The SFU stalls some instructions (some pipeline stages), while
   allowing others to execute. \n
\n
Stall handling is done through two sets of signals:
   - SFU requests (SFU inputs)
      - stall requests
      - flush requests
      - branch requests
      - skip requests
      - nop requests
   - SFU control signals (SFU outputs). There is one pair of stall-flush control
      signals for each of the pipeline stages s1, s2, s3, s4, s5, s6.
      - stall control
      - flush control

\n
\image html pavr_hwres_sfu_01.gif
\n
Each instruction has an embedded stall behavior, which is decoded by the
instruction decoder. \n
Various instructions in the pipeline, in different execution phases, access the
SFU exactly the same way they access any other hardware resource, through SFU
access requests. \n
The SFU prioritizes stall/flush/branch/skip/nop requests and postpones younger
instructions until older instructions free the hardware resources (including
the SFU itself). The postponing is done through the stall-flush
controls, on a per-pipeline stage basis. \n
The `SFU rule': when a resource conflict appears, the older instruction wins. \n
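
The SFU rule can be sketched as a tiny arbiter. The sketch assumes that, in an
in-order pipeline, a higher stage number means an older instruction; the
function name is hypothetical, and the real SFU would also have to hold the
stages behind a stalled one, which is not modelled here.

```python
# Sketch of the SFU rule: when several stages request the same resource,
# the oldest instruction (the one furthest down the pipeline) wins and
# the younger requesters are stalled.

def sfu_arbitrate(requesting_stages):
    """requesting_stages: stage numbers (1..6) competing this clock.

    Returns (winner, stalled), where stalled is the sorted list of losers.
    """
    if not requesting_stages:
        return (None, [])
    winner = max(requesting_stages)  # higher stage number = older instruction
    stalled = sorted(s for s in requesting_stages if s != winner)
    return (winner, stalled)

print(sfu_arbitrate([3, 5]))
```

For example, when a load in s5 and a younger instruction in s3 both want the
Register File read port, the s5 request is granted and s3 is stalled.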
\n
Some instructions need to insert a nop \b before the instruction `wave front',
   to free hardware resources normally used by younger instructions. For
   example, loads must `steal' the Register File read port 1 from younger
   instructions. \n
Nops are inserted by stalling certain pipe stages and flushing other, or
   possibly the same, stages. \n
Other instructions need a nop \b after the instruction wave front, for the
   previous instruction to complete and free hardware resources. For example,
   stores must wait a clock, until the previous instruction frees the Register
   File write port. \n
The two situations differ considerably from the point of view of the control
   structure. In the second situation, the instruction is required to stall
   and flush itself, which creates additional problems. These problems are solved
   by introducing a dedicated nop-insertion state machine in stage s4, whose only
   purpose is to introduce at most one nop \b after any instruction. On the
   other hand, introducing nops \b before an instruction wave front is
   straightforward, as any instruction can stall/flush younger instructions by
   means of SFU requests. \n
\n
The specific SFU requests can be found \ref pavr_hwres_sfu "here".
\n

\par Shadowing
Let's consider the following situation: a load instruction reads the Data Memory
   during pipe stage s5. Suppose that on the next clock, an older instruction stalls s6,
   during which the Data Memory output was supposed to be written into the Register
   File. After another clock, the stall is removed, and s6 requests to write the
   Register File, but the Data Memory output has changed during the stall.
   Corrupted data would be written into the Register File. With the shadow protocol,
   the Data Memory output is saved during the stall. When the stall is removed, the
   Register File is written with the saved data. \n
\n
\b The \b shadow \b protocol \n
If a pipe stage is not permitted to place hardware resource requests, then mark
   every memory-like entity in that stage as having its output `shadowed', and
   write its associated shadow register with the corresponding data output.
   Else, mark it as `unshadowed'. \n
   As long as the memory-like entity is marked `shadowed', it will be read (by
   whatever entity needs that) from its associated shadow register, rather than
   directly from its data output. \n
In order to enable shadowing during multiple, successive stalls, shadow
   memory-like entities only if they aren't already shadowed. \n
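
The shadow protocol above can be modelled for a single memory-like entity as
follows. This is a behavioural sketch under stated assumptions: the class and
method names are hypothetical, and the timing (the reader samples the output
before the clock edge that updates the shadow flag) is a simplification chosen
for the sketch, not taken from the VHDL sources.

```python
# Sketch of the shadow protocol for one memory-like entity.

class ShadowedOutput:
    def __init__(self):
        self.shadow = None       # shadow register
        self.shadowed = False    # marker: read from the shadow register?

    def clock(self, mem_out, requests_enabled):
        """Update the shadow state for one clock.

        mem_out: current data output of the memory-like entity.
        requests_enabled: False while the stage may not place requests.
        """
        if not requests_enabled:
            if not self.shadowed:      # save only on the first stalled clock
                self.shadow = mem_out
                self.shadowed = True
        else:
            self.shadowed = False

    def read(self, mem_out):
        """Value seen by whatever entity reads this output."""
        return self.shadow if self.shadowed else mem_out

dm = ShadowedOutput()
dm.clock(0x42, requests_enabled=False)   # stall begins, output 0x42 is saved
dm.clock(0x99, requests_enabled=False)   # output changes during the stall
value_written = dm.read(0x99)            # stall removed: s6 reads saved data
dm.clock(0x99, requests_enabled=True)    # back to unshadowed
print(hex(value_written), hex(dm.read(0x99)))
```

The `only if not already shadowed' test is what keeps the first saved value
intact across several successive stalled clocks.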
    578
    \n
    579
    Basically, the condition that shadows a memory-like entity's output is `hardware
    580
       resources are disabled during that stage'. However, there are exceptions. For
    581
       example, LPM family instructions steal Program Memory access by stalling the
    582
       instruction that would normally be fetched that time. By stalling, hardware
    583
       resource requests become disabled in that pipe stage. Still, LPM family
    584
       instructions must be able to access directly Program Memory output. Here, the
    585
       PM must not be shadowed even though during its pipe stage s2 (during which PM
    586
       is normally accessed) all hardware requests are disabled by default. \n
    587
    Fortunately, there are only a few such exceptions (holes through the shadow
    protocol). Overall, the shadow protocol is still a good idea, as it permits natural
    and automatic handling of a number of registers placed in delicate areas. \n
    590
    \n
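As an illustration only, the shadow protocol can be modelled behaviorally in a few
lines of Python. This is a sketch under stated assumptions, not the pAVR VHDL: the
class name `ShadowedOutput`, the method `cycle`, and the choice that the shadow flag
is cleared at the end of the clock in which the stall is removed are all hypothetical. \n

```python
class ShadowedOutput:
    """Behavioral model of one memory-like output and its shadow register."""

    def __init__(self):
        self.shadow_reg = 0     # saved copy of the data output
        self.shadowed = False   # True while readers must use shadow_reg

    def cycle(self, data_out, requests_enabled):
        """Simulate one clock.

        data_out         -- raw output of the memory-like entity this clock
        requests_enabled -- False while the pipe stage is stalled
        Returns the value that consumers see during this clock.
        """
        # Consumers look at the state registered at the previous clock edge.
        seen = self.shadow_reg if self.shadowed else data_out

        # State update at the end of the clock (assumed timing).
        if not requests_enabled:
            # Capture only once, so successive stalls keep the first value.
            if not self.shadowed:
                self.shadow_reg = data_out
                self.shadowed = True
        else:
            self.shadowed = False
        return seen
```

With a two clock stall, a consumer of this model keeps seeing the value captured at
the first stall clock, including during the clock in which the stall is removed. \n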
    591
     
    592
    \todo
       - Branch prediction with hashed branch prediction table and 2 bit predictor.
       - Super-RAM interfacing to Program Memory. \n
          A super-RAM is a classic RAM with two supplemental lines: a mem_rq input
          and a mem_ack output. The device that writes/reads the super-RAM knows that
          it can place an access request when the memory signals that it is ready via
          mem_ack. Only then can it place an access request via mem_rq. \n
          A super-RAM is a super-class of classic RAM. That is, a super-RAM becomes
          a classic RAM if the RAM ignores mem_rq and continuously keeps mem_ack at 1. \n
          The super-RAM protocol is so flexible that, as an extreme example, it can
          serially (!) interface the Program Memory to the controller. That is, about
          2-3 wires instead of 38 wires, without needing to modify anything in the
          controller. Of course, that would come with a very large speed penalty, but
          it allows choosing the most advantageous compromise between the number of
          wires and speed. The only thing to be done is to add a serial to parallel
          converter that complies with the super-RAM protocol. \n
          After pAVR is made super-RAM compatible, it can still run from a regular
          RAM, as it runs now, by ignoring the two extra lines. Thus, nothing is
          removed, only added. No speed penalty should be paid. \n
          A simple way to add the super-RAM interface is to force nops into the
          pipeline as long as the serial-to-parallel converter works on an instruction
          word. \n
       - Modify stall handling so that no nops are required \b after the instruction
          wavefront. The instructions could take care of themselves. The idea is that
          a request to a hardware resource that is already in use by an older instruction
          could \b automatically generate a stall. \n
          That would:
             - generally simplify instruction handling
             - make average instruction execution slightly faster.
    625
     
    626
    \n
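The super-RAM handshake proposed in the list above can be sketched as a small Python
model. Everything here is hypothetical and only meant to make the idea concrete: the
class `SuperRam`, the `latency` parameter, and the assumption that the data output is
valid whenever mem_ack is 1 are not taken from the pAVR sources. \n

```python
class SuperRam:
    """RAM with a mem_rq input and a mem_ack output (behavioral sketch)."""

    def __init__(self, words, latency=1):
        self.words = list(words)
        self.latency = latency   # clocks per access; 1 behaves like classic RAM
        self.remaining = 0       # clocks left for the access in flight
        self.pending = 0         # address of the access in flight
        self.data_out = None

    @property
    def mem_ack(self):
        """1 (True) when the memory is ready to accept a request."""
        return self.remaining == 0

    def clock(self, mem_rq=False, addr=0):
        """Advance one clock, optionally placing a request."""
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.data_out = self.words[self.pending]
        elif mem_rq:
            self.pending = addr
            self.remaining = self.latency - 1
            if self.remaining == 0:
                self.data_out = self.words[addr]


def fetch_all(ram, addresses):
    """Fetch words in order; count the nops forced while the memory is busy."""
    fetched, nops = [], 0
    for addr in addresses:
        ram.clock(mem_rq=True, addr=addr)   # mem_ack is 1 here, request is legal
        while not ram.mem_ack:              # memory busy: force a nop
            ram.clock()
            nops += 1
        fetched.append(ram.data_out)
    return fetched, nops
```

With `latency=1` the model never deasserts mem_ack and no nops are inserted, which
corresponds to running from a regular RAM. A larger latency stands in for a slow
serial-to-parallel converter. \n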
    627
    \n
    628
    */
    629
     
    630
     
    631
     
    632
    /*!
    633
    \defgroup pavr_hwres Hardware resources
    634
    \ingroup pavr_implementation
    635
    */
    636
     
    637
     
    638
     
    639
    /*!
    640
    \defgroup pavr_hwres_rf Register File
    641
    \ingroup pavr_hwres
    642
    The Register File is a 3 port memory, with 2 read ports and 1 write port. \n
    643
    It has 32 locations, 8 bits each. \n
    644
    Separate read and write ports for the upper three 16 bit words are provided.
    645
    The upper three 16 bit words are the pointer registers X (at byte addresses
    27:26), Y (29:28) and Z (31:30). \n
    647
    The RF is placed at the beginning of the Unified Memory space. \n
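The layout described above can be sketched in Python as follows. This is only a
behavioral illustration: the class `RegisterFile` and its method names are
hypothetical, and the real Register File is a VHDL entity with separate port
signals. \n

```python
class RegisterFile:
    """32 x 8 bit register file with 16 bit access to the pointers X, Y, Z."""

    # Byte address of the low half of each pointer register (high half is +1).
    POINTER_LOW = {"X": 26, "Y": 28, "Z": 30}

    def __init__(self):
        self.regs = [0] * 32

    def read(self, addr1, addr2):
        """Two read ports: return both operands in the same clock."""
        return self.regs[addr1], self.regs[addr2]

    def write(self, addr, value):
        """Single 8 bit write port."""
        self.regs[addr] = value & 0xFF

    def read_pointer(self, name):
        """Dedicated 16 bit read port of a pointer register."""
        low = self.POINTER_LOW[name]
        return (self.regs[low + 1] << 8) | self.regs[low]

    def write_pointer(self, name, value):
        """Dedicated 16 bit write port of a pointer register."""
        low = self.POINTER_LOW[name]
        self.regs[low] = value & 0xFF
        self.regs[low + 1] = (value >> 8) & 0xFF
```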
    648
    \n
    649
    \image html pavr_hwres_rf_01.gif
    650
    \n
    651
    */
    652
     
    653
     
    654
     
    655
    /*!
    656
    \defgroup pavr_hwres_rf_rd1 Read port 1
    657
    \ingroup pavr_hwres_rf
    658
    \par Register File read port 1 connectivity
    659
    \n
    660
    \image html pavr_hwres_rf_rd1_01.gif
    661
    \n
    662
    \par Requests to RF read port 1
    663
       - pavr_s3_rfrd1_rq \n
    664
          Most ALU-requiring instructions need to read an operand from RF read port 1
    665
          in the same clock as the instruction is decoded (here, "to read" = "to
    666
          strobe the read input"). Activate the read signal if necessary, via RF read
    667
          port 1 manager. \n
    668
       - pavr_s5_dacu_rfrd1_rq \n
    669
          \anchor dacu_rq
    670
          \b Note \b 1: This is a somewhat `misplaced' RF read port 1 request. To keep
    671
          the controller compatible with the AVR architecture, loads and stores must
    672
          operate in the Unified Memory space, that includes Register File, IO File
    673
          and Data Memory. Thus, it is possible for a LOAD to actually transfer, for
    674
          example, data from RF to RF, rather than from DM to RF (depending on the
    675
          addresses involved). \n
    676
          The DACU manager decides which physical device has to be used, and places
          the corresponding requests to the appropriate hardware resource manager.
          This request is such a call. \n
    679
          \b Note \b 2: The same situation happens with `misplaced' RF writes in
    680
          stores. The stores read from RF and can actually write any of RF, IOF or DM.
    681
    \n
    682
    */
    683
     
    684
     
    685
     
    686
    /*!
    687
    \defgroup pavr_hwres_rf_rd2 Read port 2
    688
    \ingroup pavr_hwres_rf
    689
    \par Register File read port 2 connectivity
    690
    \n
    691
    \image html pavr_hwres_rf_rd2_01.gif
    692
    \n
    693
    \par Requests to RF read port 2
    694
       - pavr_s3_rfrd2_rq \n
    695
       Needed by two-operand instructions (most ALU instructions, moves).
    696
    \n
    697
    */
    698
     
    699
     
    700
     
    701
    /*!
    702
    \defgroup pavr_hwres_rf_wr Write port
    703
    \ingroup pavr_hwres_rf
    704
    \par Register File write port connectivity
    705
    \n
    706
    \image html pavr_hwres_rf_wr_01.gif
    707
    \n
    708
    \par Requests to RF write port
    709
       - pavr_s6_aluoutlo8_rfwr_rq \n
    710
          Request to write the lower 8 bits of the ALU result into the Register File. \n
    711
       - pavr_s61_aluouthi8_rfwr_rq \n
    712
          Request to write the higher 8 bits of the ALU result into RF. \n
    713
       - pavr_s6_iof_rfwr_rq \n
    714
          Request to write IOF data out into RF. \n
    715
          Needed by IN, BLD.
    716
       - pavr_s6_dacu_rfwr_rq \n
    717
          Request to write Unified Memory data out (DACU data out) into RF. \n
    718
          Needed by loads and POP.
    719
       - pavr_s6_pm_rfwr_rq \n
    720
          Request to write Program Memory data out into RF. \n
    721
          Needed by LPM, ELPM.
    722
       - pavr_s5_dacu_rfwr_rq \n
    723
          Request to write RF out into RF. \n
    724
          Needed by stores and PUSH. \n
    725
          See \ref dacu_rq "Note 2".
    726
    \n
    727
    */
    728
     
    729
     
    730
     
    731
    /*!
    732
    \defgroup pavr_hwres_rf_xwr X port
    733
    \ingroup pavr_hwres_rf
    734
    \par X port connectivity
    735
    \n
    736
    \image html pavr_hwres_rf_xwr_01.gif
    737
    \n
    738
    This is a read and write port. \n
    739
    The contents of the X register are permanently available for reading, under the
    740
    name `pavr_rf_x'. \n
    741
    The X write port consists of a data in (pavr_rf_x_di) and a write strobe
    742
    (pavr_rf_x_wr).
    743
    \par Requests to X write port
    744
       - pavr_s5_ldstincrampx_xwr_rq \n
    745
          Increment X. \n
    746
          If the controller has more than 64KB memory, then increment RAMPX:X (24
    747
          bits) rather than X (16 bits). \n
    748
          Needed by loads and stores with postincrement. \n
    749
       - pavr_s5_ldstdecrampx_xwr_rq \n
    750
          Decrement X. \n
    751
          If more than 64KB memory, then decrement RAMPX:X rather than X. \n
    752
          Needed by loads and stores with predecrement. \n
    753
    \n
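The two requests above differ only in the direction of the step and in the width of
the value that is stepped. A minimal Python sketch, assuming an 8 bit RAMPX (the text
only states that RAMPX:X is 24 bits wide) and using the hypothetical helper name
`step_pointer`: \n

```python
def step_pointer(ramp, pointer, delta, more_than_64k):
    """Return the updated (ramp, pointer) pair after adding delta (+1 or -1).

    ramp          -- RAMPX value (assumed 8 bits)
    pointer       -- X value (16 bits)
    more_than_64k -- True if the controller has more than 64KB of memory
    """
    if more_than_64k:
        # Step the concatenation RAMPX:X and wrap at 24 bits.
        full = (((ramp & 0xFF) << 16) | (pointer & 0xFFFF)) + delta
        full &= 0xFFFFFF
        return full >> 16, full & 0xFFFF
    # Step X alone, wrap at 16 bits and leave RAMPX untouched.
    return ramp, (pointer + delta) & 0xFFFF
```

The same helper would apply to the Y and Z pointers with RAMPY and RAMPZ. \n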
    754
    */
    755
     
    756
     
    757
     
    758
    /*!
    759
    \defgroup pavr_hwres_rf_ywr Y port
    760
    \ingroup pavr_hwres_rf
    761
    \par Y port connectivity
    762
    \n
    763
    \image html pavr_hwres_rf_ywr_01.gif
    764
    \n
    765
    This is a read and write port. \n
    766
    \par Requests to Y write port
    767
       - pavr_s5_ldstincrampy_ywr_rq \n
    768
          Increment Y or RAMPY:Y. \n
    769
          Needed by loads and stores with postincrement. \n
    770
       - pavr_s5_ldstdecrampy_ywr_rq \n
    771
          Decrement Y or RAMPY:Y. \n
    772
          Needed by loads and stores with predecrement. \n
    773
    \n
    774
    */
    775
     
    776
     
    777
     
    778
    /*!
    779
    \defgroup pavr_hwres_rf_zwr Z port
    780
    \ingroup pavr_hwres_rf
    781
    \par Z port connectivity
    782
    \n
    783
    \image html pavr_hwres_rf_zwr_01.gif
    784
    \n
    785
    This is a read and write port. \n
    786
    \par Requests to Z write port
    787
       - pavr_s5_ldstincrampz_zwr_rq \n
    788
          Increment Z or RAMPZ:Z. \n
    789
          Needed by loads and stores with postincrement. \n
    790
       - pavr_s5_ldstdecrampz_zwr_rq \n
    791
          Decrement Z or RAMPZ:Z. \n
    792
          Needed by loads and stores with predecrement. \n
    793
       - pavr_s5_lpminc_zwr_rq \n
    794
          Increment Z. \n
    795
          Needed by LPM with postincrement. \n
    796
       - pavr_s5_elpmincrampz_zwr_rq \n
    797
          Increment RAMPZ:Z. \n
    798
          Needed by ELPM with postincrement. \n
    799
    \n
    800
    */
    801
     
    802
     
    803
     
    804
    /*!
    805
    \defgroup pavr_hwres_bpu Bypass Unit
    806
    \ingroup pavr_hwres
    807
    \par General considerations
    808
    The Bypass Unit (BPU) is a FIFO-like temporary storage area, that keeps data to
    809
    be written into the Register File. \n
    810
    If an instruction computes a value that must be written into the Register File
    811
    (RF) (an ALU instruction, for example) it first writes the BPU, and then (or at
    812
    the same time) actually writes the RF. \n
    813
    If the following instructions need an operand from the RF, at the same
    814
    address where the previous result should have been written into the RF, they will
    815
    actually read that operand from the BPU rather than from RF. \n
    816
    This way, `read before write' pipeline hazards are avoided. \n
    817
    \n
    818
    The specific situations where BPU is needed are:
    819
       - when reading Register File operand(s). \n
    820
          Reading Register File operands is done through the BPU.
    821
       - when reading pointer registers. \n
    822
          Reading pointer registers is done through the BPU.
    823
     
    824
    \par Details
    825
    The algorithm for using the BPU:
       - the instruction that wants to write a result into the RF first writes the
          BPU with 3 data fields:
             - the result itself
             - the result's address in the RF
             - a flag that marks this BPU entry as having valid data (a so-called
                `active' flag)
       - next instruction(s) that need an operand from RF read it through
          a dedicated function (combinational logic), that does the following:
             - checks all BPU entries and sees which ones are active (hold meaningful
                data).
             - compares the operand's address against the addresses in all active BPU
                entries.
             - if a single address matches, gets the data in that BPU entry rather than
                data from the RF.
             - if multiple addresses match, gets the data in the most recent BPU entry.
                Even though it's possible that 2 matches happen at simultaneous BPU
                entries, this situation should never occur; it would indicate a design
                bug. This illegal situation would assert an error during simulation.
             - if no address matches, gets data from the RF (as if the BPU did not
                exist).
    852
    \n
    853
    The maximum delay between a write and a read from the RF is 4 clocks. Thus, the
    BPU FIFO-like structure has a depth of 4. \n
    On the other hand, the BPU must accept 3 one-byte operands at a time (it must
    have 3 write ports). The most BPU demanding instructions are stores with
    pre(post) decrement(increment). Both the one byte data and a 2 byte pointer register
    must be written into the BPU, as well as into the RF. The 3 bytes are
    simultaneously written into so-called `BPU chains' or `BPU registers' (BPU
    chains 0, 1, 2; or BPU registers 0, 1, 2; or BPR0, BPR1, BPR2). \n
    861
    \n
    862
    The BPU has 3x4 entries, each consisting of:
    863
       - an 8 bit data field
    864
       - a 5 bit address field
    865
       - a flag that marks the entry as active or inactive
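
The BPU read algorithm and the 3x4 entry layout described above can be sketched as a
behavioral Python model. It is an illustration under assumptions, not the pAVR
implementation: the class `BypassUnit` is hypothetical, every chain is assumed to
shift by one position on every clock, and stalls are ignored. \n

```python
class BypassUnit:
    """Behavioral sketch of the BPU: 3 chains, each a 4 deep shift register."""

    CHAINS = 3
    DEPTH = 4
    EMPTY = (False, 0, 0)   # (active flag, 5 bit RF address, 8 bit data)

    def __init__(self):
        # chains[c][0] is the most recent entry of chain c.
        self.chains = [[self.EMPTY] * self.DEPTH for _ in range(self.CHAINS)]

    def clock(self, writes=None):
        """Advance one clock. `writes` maps chain index -> (address, data)."""
        writes = writes or {}
        for c in range(self.CHAINS):
            if c in writes:
                address, data = writes[c]
                newest = (True, address & 0x1F, data & 0xFF)
            else:
                newest = self.EMPTY
            # Shift the chain; the oldest entry falls off the end.
            self.chains[c] = [newest] + self.chains[c][:-1]

    def read(self, address, register_file):
        """Read an RF operand through the BPU."""
        for age in range(self.DEPTH):   # age 0 = most recent entries
            hits = [chain[age] for chain in self.chains
                    if chain[age][0] and chain[age][1] == address]
            if len(hits) > 1:
                # Two matches at simultaneous entries indicate a design bug.
                raise AssertionError("simultaneous BPU matches")
            if hits:
                return hits[0][2]
        # No active entry matches: behave as if the BPU did not exist.
        return register_file[address]
```

Scanning from the most recent position to the oldest one is what makes the most
recent match win when several entries hold the same address. \n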
    866
     
    867
    \par Accessing BPU:
    868
    \n
    869
    \image html pavr_hwres_bpu_01.gif
    870
    \n
    871
    \n
    872
    */
    873
     
    874
     
    875
     
    876
    /*!
    877
    \defgroup pavr_hwres_bpr0 Bypass chain 0
    878
    \ingroup pavr_hwres_bpu
    879
    \par Bypass chain 0 (BPR0) write port connectivity
    880
    \n
    881
    \image html pavr_hwres_bpr0_01.gif
    882
    \n
    883
    \par Requests to BPR0 write port
    884
       - pavr_s5_alu_bpr0wr_rq \n
    885
          Needed by regular ALU instructions. \n
    886
       - pavr_s6_iof_bpr0wr_rq \n
    887
          Needed by instructions that read the IO File (IN, BLD).
    888
       - pavr_s6_daculd_bpr0wr_rq \n
    889
          Needed by loads.
    890
       - pavr_s5_dacust_bpr0wr_rq \n
    891
          Needed by stores.
    892
       - pavr_s6_pmdo_bpr0wr_rq \n
    893
          Needed by LPM family instructions.
    894
    \n
    895
    */
    896
     
    897
     
    898
     
    899
    /*!
    900
    \defgroup pavr_hwres_bpr1 Bypass chain 1
    901
    \ingroup pavr_hwres_bpu
    902
    \par Bypass chain 1 (BPR1) write port connectivity
    903
    \n
    904
    \image html pavr_hwres_bpr1_01.gif
    905
    \n
    906
    \par Requests to BPR1 write port
    907
       - pavr_s5_alu_bpr1wr_rq \n
    908
          Needed by regular ALU instructions that have a 16 bit result (ADIW, SBIW,
    909
          MUL, MULS, MULSU, FMUL, FMULS, FMULSU, MOVW).
    910
       - pavr_s5_dacux_bpr12wr_rq \n
    911
          Needed by loads and stores with pre(post) decrement(increment). \n
    912
          Lower byte of X pointer will be written into BPR1.
    913
       - pavr_s5_dacuy_bpr12wr_rq \n
    914
          Needed by loads and stores with pre(post) decrement(increment). \n
    915
          Lower byte of Y pointer will be written into BPR1.
    916
       - pavr_s5_dacuz_bpr12wr_rq \n
    917
          Needed by loads and stores with pre(post) decrement(increment). \n
    918
          Lower byte of Z pointer will be written into BPR1.
    919
    */
    920
     
    921
     
    922
     
    923
    /*!
    924
    \defgroup pavr_hwres_bpr2 Bypass chain 2
    925
    \ingroup pavr_hwres_bpu
    926
    \par Bypass chain 2 (BPR2) write port connectivity
    927
    \n
    928
    \image html pavr_hwres_bpr2_01.gif
    929
    \n
    930
    \par Requests to BPR2 write port
    931
       - pavr_s5_dacux_bpr12wr_rq \n
    932
          Needed by loads and stores with pre(post) decrement(increment). \n
    933
          Higher byte of X pointer will be written into BPR2.
    934
       - pavr_s5_dacuy_bpr12wr_rq \n
    935
          Needed by loads and stores with pre(post) decrement(increment). \n
    936
          Higher byte of Y pointer will be written into BPR2.
    937
       - pavr_s5_dacuz_bpr12wr_rq \n
    938
          Needed by loads and stores with pre(post) decrement(increment). \n
    939
          Higher byte of Z pointer will be written into BPR2.
    940
    */
    941
     
    942
     
    943
     
    944
    /*!
    945
    \defgroup pavr_hwres_iof IO File
    946
    \ingroup pavr_hwres
    947
    The IO File is composed of a set of discrete registers, that are grouped into a
    948
    memory-like entity. The IO File has a general write/read port that is
    949
    byte-oriented, and separate read and write ports for each register in the IO
    950
    File. \n
    951
    \n
    952
    \image html pavr_hwres_iof_01.gif
    953
    \n
    954
    Each IO File register is assigned a unique address in the IO space. That address
    955
    is defined in the constants definition file
    956
    (`pavr-constants.vhd'). \n
    957
    The IO space is placed in the Unified Memory just above the RF, that is, starting
    958
    with address 32. \n
    959
    The IO addressing space range is 0...63 (Unified Memory addresses 32...95). \n
    960
    Undefined IO registers will read an undefined value. \n
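The address ranges above imply a simple decode of a Unified Memory address into a
physical device. The following Python sketch illustrates it. The function name
`decode_um_address` is hypothetical, and the choice to report Data Memory addresses
relative to the first location above the IO space is an assumption. \n

```python
RF_SIZE = 32    # Register File: Unified Memory addresses 0...31
IOF_SIZE = 64   # IO File: Unified Memory addresses 32...95


def decode_um_address(um_address):
    """Return (device, local address) for a Unified Memory address."""
    if um_address < RF_SIZE:
        return "RF", um_address
    if um_address < RF_SIZE + IOF_SIZE:
        return "IOF", um_address - RF_SIZE
    # Assumed: Data Memory starts right above the IO space.
    return "DM", um_address - (RF_SIZE + IOF_SIZE)
```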
    961
    \n
    962
    */
    963
     
    964
     
    965
     
    966
    /*!
    967
    \defgroup pavr_hwres_iof_gen General IO port
    968
    \ingroup pavr_hwres_iof
    969
    \par General IO File port connectivity
    970
    \n
    971
    \image html pavr_hwres_iof_gen_01.gif
    972
    \n
    973
    The general IO File port is somewhat more elaborate than a simple read/write
    974
    port. It can read bytes from IO registers to output and write bytes from input to
    975
    IO registers. Also, it can do some bit processing: load bits (from T flag in SREG to
    976
    output), store bits (from input to T bit in SREG), set IO bits, clear IO bits. \n
    977
    An opcode has to be provided to specify one of the actions that this port is
    978
    capable of. \n
    979
    \n
    980
    The following \b opcodes are implemented for the IO File general port:
    981
       - read byte (needed by instructions IN, SBIC, SBIS)
    982
       - write byte (OUT)
    983
       - clear bit (CBI)
    984
       - set bit (SBI)
    985
       - load bit (BLD)
    986
       - store bit (BST)
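
The six opcodes can be illustrated with a small Python model of the port. This is a
sketch with several assumptions: the opcode names, the function
`iof_general_port` and the IO address 0x3F for SREG (the usual AVR value) are not
taken from `pavr-constants.vhd`, and the `load bit' opcode is assumed to return the
input byte with the selected bit replaced by T. \n

```python
SREG_ADDR = 0x3F   # assumed IO address of SREG
T_FLAG = 6         # T is SREG(6)


def iof_general_port(iof, opcode, addr=0, data_in=0, bit=0):
    """Apply one opcode to the IO File list `iof`; return the port output or None."""
    mask = 1 << bit
    if opcode == "rdbyte":
        return iof[addr]
    if opcode == "wrbyte":
        iof[addr] = data_in & 0xFF
    elif opcode == "clrbit":
        iof[addr] &= ~mask & 0xFF
    elif opcode == "setbit":
        iof[addr] |= mask
    elif opcode == "ldbit":
        # Load bit: copy the T flag into the selected bit of the output.
        t = (iof[SREG_ADDR] >> T_FLAG) & 1
        return (data_in & ~mask & 0xFF) | (t << bit)
    elif opcode == "stbit":
        # Store bit: copy the selected input bit into the T flag.
        t = (data_in >> bit) & 1
        iof[SREG_ADDR] = (iof[SREG_ADDR] & ~(1 << T_FLAG) & 0xFF) | (t << T_FLAG)
    else:
        raise ValueError(opcode)
    return None
```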
    987
     
    988
    \par Requests to this port
    989
       - pavr_s5_iof_rq \n
    990
          Needed by instructions that manipulate IO File in stage s5: CBI, SBI, SBIC,
    991
          SBIS, BSET, BCLR, IN, OUT, BLD, BST.
    992
       - pavr_s6_iof_rq \n
    993
          Needed by instructions that manipulate IO File in stage s6: CBI, SBI, BSET,
    994
          BCLR.
    995
       - pavr_s5_dacu_iof_rq \n
    996
          Needed by loads and stores that are decoded by DACU as accessing IO File. \n
    997
    \n
    998
    */
    999
     
    1000
     
    1001
     
    1002
    /*!
    1003
    \defgroup pavr_hwres_iof_sregwr SREG port
    1004
    \ingroup pavr_hwres_iof
    1005
    \par SREG port connectivity
    1006
    \n
    1007
    \image html pavr_hwres_iof_sreg_01.gif
    1008
    \n
    1009
    \par Requests to this port
    1010
       - pavr_s5_alu_sregwr_rq \n
    1011
          This signals that an instruction that uses the ALU wants to update the
    1012
          arithmetic flags. \n
    1013
          Flags I (general interrupt enable, SREG(7)) and T (transfer bit, SREG(6))
    1014
          are left unchanged. \n
    1015
       -  pavr_s5_setiflag_sregwr_rq \n
    1016
          This sets the I flag. \n
    1017
          Only RETI instruction needs this.
    1018
       - pavr_s5_clriflag_sregwr_rq \n
    1019
          This clears the I flag. \n
    1020
          No instruction explicitly requests this. \n
    1021
          This is only requested when an interrupt is acknowledged (during the
    1022
          consequent implicit CALL). \n
    1023
    \n
    1024
    */
    1025
     
    1026
     
    1027
     
    1028
    /*!
    1029
    \defgroup pavr_hwres_iof_spwr SP port
    1030
    \ingroup pavr_hwres_iof
    1031
    \par SP port connectivity
    1032
    \n
    1033
    \image html pavr_hwres_iof_sp_01.gif
    1034
    \n
    1035
    This is the stack pointer. \n
    1036
    It is 16 bits wide, being composed of two 8 bit registers, SPL and SPH. \n
    1037
    The stack can reside anywhere in the Unified Memory space. That is, anywhere in
    1038
    the RF, IOF or DM. It can even begin, for example, in RF and continue in IOF.
    1039
    However, placing the stack in the IOF is likely to be a programming error,
    as the IOF registers have dedicated functions. Quasi-random values from the stack
    written into the IOF could, for example, unpredictably trigger any
    interrupt and, in general, cause unpredictable behavior of the controller. \n
    1043
     
    1044
    \par Requests to this port
    1045
       - pavr_s5_inc_spwr_rq \n
          Increment SP (SPH & SPL) by 1. \n
          Needed by POP.
       - pavr_s5_dec_spwr_rq \n
          Decrement SP by 1. \n
          Needed by PUSH.
       - pavr_s5_calldec_spwr_rq \n
          Decrement SP by 1. \n
          Needed by RCALL, ICALL, EICALL, CALL, interrupt implicit CALL.
       - pavr_s51_calldec_spwr_rq \n
          Decrement SP by 1. \n
          Needed by RCALL, ICALL, EICALL, CALL, interrupt implicit CALL.
       - pavr_s52_calldec_spwr_rq \n
          Decrement SP by 1. \n
          Needed by RCALL, ICALL, EICALL, CALL, interrupt implicit CALL.
       - pavr_s5_retinc2_spwr_rq \n
          Increment SP by 2. \n
          Needed by RET, RETI.
       - pavr_s51_retinc_spwr_rq \n
          Increment SP by 1. \n
          Needed by RET, RETI. \n
    1066
    \n
    1067
    */
    1068
     
    1069
     
    1070
     
    1071
    /*!
    1072
    \defgroup pavr_hwres_iof_rampxwr RAMPX port
    1073
    \ingroup pavr_hwres_iof
    1074
    \par RAMPX port connectivity
    1075
    \n
    1076
    \image html pavr_hwres_iof_rampx_01.gif
    1077
    \n
    1078
    \par Requests to this port
    1079
       - pavr_s5_ldstincrampx_xwr_rq \n
          Needed by loads and stores with postincrement. \n
          Only modify RAMPX if the controller has more than 64 KB of Data Memory.
       - pavr_s5_ldstdecrampx_xwr_rq \n
          Needed by loads and stores with predecrement. \n
          Only modify RAMPX if the controller has more than 64 KB of Data Memory. \n
    1085
    \n
    1086
    */
    1087
     
    1088
     
    1089
     
    1090
    /*!
    1091
    \defgroup pavr_hwres_iof_rampywr RAMPY port
    1092
    \ingroup pavr_hwres_iof
    1093
    \par RAMPY port connectivity
    1094
    \n
    1095
    \image html pavr_hwres_iof_rampy_01.gif
    1096
    \n
    1097
    \par Requests to this port
    1098
       - pavr_s5_ldstincrampy_ywr_rq \n
          Needed by loads and stores with postincrement. \n
          Only modify RAMPY if the controller has more than 64 KB of Data Memory.
       - pavr_s5_ldstdecrampy_ywr_rq \n
          Needed by loads and stores with predecrement. \n
          Only modify RAMPY if the controller has more than 64 KB of Data Memory. \n
    1104
    \n
    1105
    */
    1106
     
    1107
     
    1108
     
    1109
    /*!
    1110
    \defgroup pavr_hwres_iof_rampzwr RAMPZ port
    1111
    \ingroup pavr_hwres_iof
    1112
    \par RAMPZ port connectivity
    1113
    \n
    1114
    \image html pavr_hwres_iof_rampz_01.gif
    1115
    \n
    1116
    \par Requests to this port
    1117
       - pavr_s5_ldstincrampz_zwr_rq \n
          Needed by loads and stores with postincrement. \n
          Only modify RAMPZ if the controller has more than 64 KB of Data Memory.
       - pavr_s5_ldstdecrampz_zwr_rq \n
          Needed by loads and stores with predecrement. \n
          Only modify RAMPZ if the controller has more than 64 KB of Data Memory. \n
    1123
    \n
    1124
    */
    1125
     
    1126
     
    1127
     
    1128
    /*!
    1129
    \defgroup pavr_hwres_iof_rampdwr RAMPD port
    1130
    \ingroup pavr_hwres_iof
    1131
    \par RAMPD port connectivity
    1132
    \n
    1133
    \image html pavr_hwres_iof_rampd_01.gif
    1134
    \n
    1135
    This is a trivial read-only port. \n
    1136
    \n
    1137
    The register RAMPD is used in controllers with more than 64KB of Data Memory,
    1138
    to access the whole Data Memory space. \n
    1139
    RAMPD is used by instructions LDS (LoaD direct from data Space) and STS (STore
    1140
    direct to data Space). In order to get to the desired Data Memory space address,
    1141
    these instructions concatenate RAMPD with a 16 bit constant from the instruction
    1142
    word (RAMPD:k16). \n
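The concatenation RAMPD:k16 can be written out as a one-line computation. The Python
sketch below is only illustrative: the function name `lds_sts_address` is
hypothetical and RAMPD is assumed to be 8 bits wide. \n

```python
def lds_sts_address(rampd, k16, more_than_64k):
    """Data Memory space address used by LDS and STS.

    rampd         -- RAMPD value (assumed 8 bits)
    k16           -- 16 bit constant from the instruction word
    more_than_64k -- True if the controller has more than 64KB of Data Memory
    """
    if more_than_64k:
        # RAMPD supplies the address bits above bit 15.
        return ((rampd & 0xFF) << 16) | (k16 & 0xFFFF)
    # RAMPD is not used in smaller controllers.
    return k16 & 0xFFFF
```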
    1143
    \n
    1144
    In controllers with less than 64KB of Data Memory, this register is not used. \n
    1145
    The RAMPD register can be written only through the IOF general read and write port. \n
    1146
    No instruction explicitly requests to write this register. \n
    1147
    \n
    1148
    */
    1149
     
    1150
     
    1151
     
    1152
    /*!
    1153
    \defgroup pavr_hwres_iof_eindwr EIND port
    1154
    \ingroup pavr_hwres_iof
    1155
    \par EIND port connectivity
    1156
    \n
    1157
    \image html pavr_hwres_iof_eind_01.gif
    1158
    \n
    1159
    This is a trivial read-only port. \n
    1160
    \n
    1161
    The register EIND is used in controllers with more than 64K words of Program
    1162
    Memory, to access the whole program space. \n
    1163
    EIND is used by instructions EICALL (Extended Indirect CALL) and EIJMP (Extended
    1164
    Indirect JuMP). In order to get to the desired program space address, these
    1165
    instructions concatenate EIND with the Z register (EIND:Z). \n
    1166
    \n
    1167
    In controllers with less than 64K words of Program Memory, this register is not
    1168
    used. \n
    1169
    The EIND register can be written only through the IOF general read and write
    1170
    port. \n
    1171
    No instruction explicitly requests to write this register. \n
    1172
    \n
    1173
    */
    1174
     
    1175
     
    1176
     
    1177
    /*!
    1178
    \defgroup pavr_hwres_iof_perif Peripherals
    1179
    \ingroup pavr_hwres_iof
    1180
    Peripherals are only of secondary importance for this project. \n
    1181
    However, an \ref pavr_hwres_iof_perif_pa  "IO port", an
    1182
    \ref pavr_hwres_iof_perif_int0 "external interrupt" and an
    1183
    \ref pavr_hwres_iof_perif_t0 "8 bit timer" are implemented, to properly test the
    1184
    interrupt system. \n
    1185
    Peripherals have been designed to be \b decoupled from the kernel. They are easily
    1186
    upgradable, without needing to touch the kernel. \n
    1187
    \n
    1188
    */
    1189
     
    1190
     
    1191
     
    1192
    /*!
    1193
    \defgroup pavr_hwres_iof_perif_pa Port A
    1194
    \ingroup pavr_hwres_iof_perif
    1195
    \par Port A structure
    1196
    Port A offers 8 bidirectional general purpose IO lines. \n
    1197
    Lines 0 and 1 also have alternate functions:
       - line 0 can be used as \ref pavr_hwres_iof_perif_int0 "external interrupt 0" input
       - line 1 can be used as \ref pavr_hwres_iof_perif_t0 "timer 0" clock input.
    1202
    \n
    1203
    Port A is managed through 3 IO File locations: \b PORTA, \b DDRA and \b PINA. \n
    1204
    \b DDRA sets each pin's direction: DDRA(i)=0 means that line i is input,
    1205
    DDRA(i)=1 means that line i is output. \n
    1206
    When writing a value to the port, that value goes into \b PORTA. If DDRA configures
    1207
    the corresponding lines as outputs, the contents of PORTA will be available on
    1208
    external pins. However, if DDRA configures the lines as inputs (DDRA(i)=0), then:
       - if PORTA(i)=0, the line i is `pure' input (High Z). \n
       - if PORTA(i)=1, the line i is an input weakly pulled high. \n
    1213
    \b PINA reads the physical value of external lines, rather than PORTA. \n
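The interplay of DDRA, PORTA and PINA for a single line can be summarized by the
following Python sketch. The function `port_a_line` and the mode names are
hypothetical, and contention between an output and an external driver is not
modelled. \n

```python
def port_a_line(ddr, port, external=None):
    """Model one Port A line; return (mode, value read through PINA).

    ddr      -- DDRA(i): 0 = input, 1 = output
    port     -- PORTA(i)
    external -- level driven from outside (0 or 1), or None if undriven
    A PINA value of None stands for an undefined (floating) level.
    """
    if ddr == 1:
        # Output: the contents of PORTA appear on the external pin.
        return "output", port
    mode = "pull-up" if port == 1 else "high-z"
    if external is not None:
        # An external driver overrides the weak pull-up.
        return mode, external
    # Undriven input: the pull-up yields 1, a pure input floats.
    return mode, 1 if port == 1 else None
```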
    1214
    \par Port A schematics
    1215
    \n
    1216
    \image html pavr_hwres_iof_perif_pa_01.gif
    1217
    \n
    1218
    \n
    1219
    */
    1220
     
    1221
     
    1222
    /*!
    1223
    \defgroup pavr_hwres_iof_perif_int0 External interrupt 0
    1224
    \ingroup pavr_hwres_iof_perif
    1225
    \par Features
    1226
    External interrupt 0 is physically mapped on the line 0 (bit 0) of
    1227
    \ref pavr_hwres_iof_perif_pa "port A". \n
    1228
    \n
    1229
    Its associated interrupt flag resides in the IO File register GIFR (General
    1230
    Interrupt Flags Register): \n
    1231
    \n
    1232
    \image html pavr_hwres_iof_perif_int0_01.gif
    1233
    \n
    1234
    External interrupt 0 is enabled/disabled by setting/clearing bit 6 in GIMSK
    1235
    (General Interrupt Mask) register: \n
    1236
    \n
    1237
    \image html pavr_hwres_iof_perif_int0_02.gif
    1238
    \n
    1239
    If enabled, it can trigger an interrupt on high-to-low transition, low-to-high
    1240
    transition, or on a low level of the interrupt 0 input. This behavior is defined
    1241
    by 2 bits in the MCUCR (Microcontroller Control) register: \n
    1242
    \n
    1243
    \image html pavr_hwres_iof_perif_int0_03.gif
    1244
    \n
    1245
    */
    1246
     
    1247
     
    1248
    /*!
    1249
    \defgroup pavr_hwres_iof_perif_t0 Timer 0
    1250
    \ingroup pavr_hwres_iof_perif
    1251
    \par Features
    1252
    The IO File register that holds the current count is TCNT0. \n
    1253
    Its behavior is controlled by a set of other IO File registers:
    1254
       - TIFR (Timer Interrupt Flag Register) holds the Timer 0 interrupt flag: \n
    1255
          \n
    1256
          \image html pavr_hwres_iof_perif_t0_02.gif
    1257
          \n
    1258
       - TIMSK (Timer Interrupt Mask) contains the flag that enables/disables Timer 0
    1259
          interrupt: \n
    1260
          \n
    1261
          \image html pavr_hwres_iof_perif_t0_03.gif
    1262
          \n
    1263
       - TCCR0 (Timer 0 Control Register) defines the prescaling source of
    1264
          Timer 0. \n
    1265
          When the external input pin is selected, the Timer 0 clock source will be line 1
    1266
          of \ref pavr_hwres_iof_perif_pa "port A": \n
    1267
          \n
    1268
          \image html pavr_hwres_iof_perif_t0_01.gif
    1269
          \n
    1270
    \n
    1271
    */
    1272
     
    1273
     
    1274
     
    1275
     
    1276
    /*!
    1277
    \defgroup pavr_hwres_alu ALU
    1278
    \ingroup pavr_hwres
    1279
    \par ALU connectivity:
    1280
    \n
    1281
    \image html pavr_hwres_alu_01.gif
    1282
    \n
    1283
    \ref alu_pipe_ref_01 "Here" it can be seen how the ALU plugs into the pipeline. \n
    1284
    \n
    1285
    The ALU is a 100% \b combinational device. \n
    1286
    It accepts 2 operands:
       - a 16 bit operand \n
          This is taken through the Bypass Unit.
       - an 8 bit operand \n
          This is taken through the Bypass Unit.
    1293
    The ALU output is 16 bits wide. \n
    1294
     
    1295
\par ALU opcodes:
   - NOP \n
   - OP1 \n
      Transfers operand 1 directly to the ALU output. \n
   - OP2 \n
      Transfers operand 2 directly to the lower 8 bits of the ALU output. \n
   -  ADD8 \n
      ADC8 \n
      Adds, with carry, the lower 8 bits of operand 1 and operand 2. \n
      SUB8 \n
      SBC8 \n
   -  AND8 \n
      EOR8 \n
      OR8 \n
   -  INC8 \n
      DEC8 \n
   -  COM8 \n
      NEG8 \n
      SWAP8 \n
   -  LSR8 \n
      ASR8 \n
      ROR8 \n
   -  ADD16 \n
      Adds, without carry, operand 1 and operand 2 sign extended to 16 bits. \n
      SUB16 \n
   -  MUL8 \n
      MULS8 \n
      MULSU8 \n
      FMUL8 \n
      FMULS8 \n
          FMULSU8 \n
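
A behavioral sketch of a few of the opcodes above, written in Python rather than VHDL.
It only models the data path described in the text (16 bit operand 1, 8 bit operand 2,
16 bit output). Flags are not modeled, and zeroing the upper output byte for the 8 bit
opcodes is an assumption. The function names are hypothetical.

```python
def sign_extend8(value):
    """Interpret the lower 8 bits of value as a signed number."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value

def alu(opcode, op1, op2, carry_in=0):
    """Return the 16 bit ALU output for a small subset of the opcodes."""
    op1 &= 0xFFFF   # operand 1 is 16 bits wide
    op2 &= 0xFF     # operand 2 is 8 bits wide
    if opcode == "OP1":
        return op1
    if opcode == "OP2":
        return op2
    if opcode == "ADD8":
        return ((op1 & 0xFF) + op2) & 0xFF
    if opcode == "ADC8":
        return ((op1 & 0xFF) + op2 + carry_in) & 0xFF
    if opcode == "ADD16":
        return (op1 + sign_extend8(op2)) & 0xFFFF
    raise ValueError("opcode not covered by this sketch: " + opcode)
```

For instance, `alu("ADD16", 0x1000, 0xFF)` adds -1 to operand 1, because operand 2 is
sign extended before the addition.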

\par ALU flags:
- H (half carry)
- S (sign)
- V (two's complement overflow)
- N (negative)
- Z (zero)
- C (carry)
*/




/*!
\defgroup pavr_hwres_dacu DACU
\ingroup pavr_hwres
\par Overview
The Data Address Calculation Unit (DACU) offers unified read and write access over the
concatenated RF, IOF and DM space, that is, over the Unified Memory (UM) space. \n
Loads and stores operate in the UM space. They use the DACU to translate
the Unified Memory address into an RF, IOF or DM address. \n
The DACU takes requests to read or write into UM space, translates the UM address
into an RF, IOF or DM address, and transparently places requests to read or write
the specific hardware resource (RF, IOF or DM) that corresponds to the given UM
    address. \n
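
The address translation described above can be sketched as follows. The region sizes
(32 registers followed by 64 IO File locations) are an assumption taken from the classic
AVR data space layout; the actual sizes in pAVR may differ. The function name is
hypothetical.

```python
# Assumed region sizes (classic AVR data space), not taken from pAVR sources.
RF_SIZE = 32    # Register File locations
IOF_SIZE = 64   # IO File locations

def translate_um_address(um_address):
    """Map a Unified Memory address to a (resource, local address) pair."""
    if um_address < RF_SIZE:
        return ("RF", um_address)
    if um_address < RF_SIZE + IOF_SIZE:
        return ("IOF", um_address - RF_SIZE)
    return ("DM", um_address - RF_SIZE - IOF_SIZE)
```

A load from UM address 0x20 thus becomes a read request to IOF location 0, without the
requesting instruction knowing which resource is behind the address.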

\par Reading DACU
\n
\image html pavr_hwres_dacu_01.gif
\n
DACU read \b requests:
   - pavr_s5_x_dacurd_rq \n
      Needed by loads from the address given by the X pointer register.
   - pavr_s5_y_dacurd_rq \n
      Needed by loads from the address given by the Y pointer register.
   - pavr_s5_z_dacurd_rq \n
      Needed by loads from the address given by the Z pointer register.
   - pavr_s5_sp_dacurd_rq \n
      Needed by the POP instruction.
   - pavr_s5_k16_dacurd_rq \n
      Needed by the LDS instruction. \n
      If the controller has more than 64KB of Data Memory, the Unified Memory
      address is built by concatenating RAMPD with the 16 bit constant.
   - pavr_s5_pchi8_dacurd_rq \n
      The higher 8 bits of the PC are loaded from the stack. \n
      Needed by RET and RETI instructions.
   - pavr_s51_pcmid8_dacurd_rq \n
      The middle 8 bits of the PC are loaded from the stack. \n
      Needed by RET and RETI instructions.
   - pavr_s52_pclo8_dacurd_rq \n
      The lower 8 bits of the PC are loaded from the stack. \n
      Needed by RET and RETI instructions.

    \n
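
The RAMPD concatenation mentioned for the LDS request can be sketched as below. Treating
RAMPD as an 8 bit register that supplies the address bits above bit 15 is an assumption;
the function name and the boolean parameter are hypothetical.

```python
def lds_um_address(rampd, k16, data_memory_exceeds_64kb):
    """Build the Unified Memory address used by a direct load (LDS)."""
    k16 &= 0xFFFF
    if not data_memory_exceeds_64kb:
        return k16                      # the 16 bit constant is the whole address
    return ((rampd & 0xFF) << 16) | k16  # RAMPD supplies the upper address bits
```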
As a response to read requests, the DACU places read \b requests to RF, IOF or DM:
   - pavr_s5_dacu_rfrd1_rq
   - pavr_s5_dacu_iof_rq
   - pavr_s5_dacu_dmrd_rq

\par Writing DACU
\n
\image html pavr_hwres_dacu_02.gif
\n
DACU write \b requests:
   - pavr_s5_x_dacuwr_rq \n
      Needed by stores to the address given by the X pointer register.
   - pavr_s5_y_dacuwr_rq \n
      Needed by stores to the address given by the Y pointer register.
   - pavr_s5_z_dacuwr_rq \n
      Needed by stores to the address given by the Z pointer register.
   - pavr_s5_sp_dacuwr_rq \n
      Needed by the PUSH instruction.
   - pavr_s5_k16_dacuwr_rq \n
      Needed by the STS instruction. \n
      If the controller has more than 64KB of Data Memory, the Unified Memory
      address is built by concatenating RAMPD with the 16 bit constant.
   - pavr_s5_pclo8_dacuwr_rq \n
      The lower 8 bits of the PC are stored on the stack. \n
      Needed by CALL family instructions (CALL, RCALL, ICALL, EICALL, implicit
      interrupt CALL).
   - pavr_s51_pcmid8_dacuwr_rq \n
      The middle 8 bits of the PC are stored on the stack. \n
      Needed by CALL family instructions.
   - pavr_s52_pchi8_dacuwr_rq \n
      The higher 8 bits of the PC are stored on the stack. \n
      Needed by CALL family instructions.

\n
As a response to write requests, the DACU places write \b requests to RF, IOF or DM,
and to the BPU:
   - pavr_s5_dacu_rfwr_rq
   - pavr_s5_dacu_iof_rq
   - pavr_s5_dacu_dmwr_rq
   - pavr_s5_dacust_bpr0wr_rq

\n
*/



/*!
\defgroup pavr_hwres_dm Data Memory
\ingroup pavr_hwres
\par Data Memory connectivity
\n
\image html pavr_hwres_dm_01.gif
\n
The Data Memory is a single port RAM. \n
That port provides both read and write DM accesses. \n
The DM is organized in bytes, and its length is set by a constant in the
constants definition file (`pavr-constants.vhd'). \n
\n
\par Requests to DM
Requests to access DM come only from the DACU: \n
   - pavr_s5_dacu_dmrd_rq \n
   - pavr_s5_dacu_dmwr_rq \n
\n
*/



/*!
\defgroup pavr_hwres_pm Program Memory
\ingroup pavr_hwres
\par PM handling
\n
\image html pavr_hwres_pm_01.gif
\n
The Program Memory is a single port RAM. \n
That port provides read-only access. Support for the SPM (Store
Program Memory) instruction is currently not provided. \n
The PM is organized in 16 bit words, and its length is set by a constant in the
constants definition file (`pavr-constants.vhd'). \n
\n
Apart from controlling the Program Memory, the PM manager also controls the Program
Counter. \n
Some PM access requests need to modify the PC, others don't. The only PM requests
that don't modify the PC are the loads from PM (LPM and ELPM instructions). The
other requests correspond to instructions that modify the instruction
flow, and thus the PC (jumps, branches, calls and returns). \n

\par Program Counter handling
At a given time, the pipeline can process more than one instruction. Up to 6
instructions can be processed simultaneously. Obviously, each of these
instructions has its own address in the PM. \n
One may ask: how is the Program Counter defined, as long as two or more instructions
are executed simultaneously? Whose address is considered to be the Program Counter? \n
The answer is: the Program Counter is in fact composed of a set of registers. Each
instruction in the pipeline has an associated Program Counter that follows it
while flowing through the pipeline. Implementation details can be found in the
description of \ref pavr_pipeline_jumps "jumps", \ref pavr_pipeline_branches "branches",
\ref pavr_pipeline_skips "skips", \ref pavr_pipeline_calls "calls" and
\ref pavr_pipeline_returns "returns". \n
As an example, when a relative jump computes the target address, it considers its
own Program Counter rather than the address of the instruction being fetched at that
moment from the PM. \n
The instructions that modify the instruction flow (jumps, branches, skips, calls
and returns) must be able to manipulate the program counters associated with
pipeline stages s1, s2 and s3. However, this is done not directly, but via the
Program Memory manager. The PM manager centralizes all instruction flow access
requests (jump requests, branch requests, etc) and takes care of the program
    counters in an organized and manageable manner. \n
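
Why a relative jump must use its own Program Counter can be illustrated with a short
sketch. It follows the AVR rule that RJMP loads the PC with PC + k + 1, where k is the
12 bit signed offset. The example addresses are hypothetical, and so is the assumption
that the per-instruction Program Counter holds the address of the jump itself.

```python
PC_MASK = 0x3FFFFF  # 22 bit Program Memory address space

def rjmp_target(own_pc, k12):
    """Target of a relative jump located at own_pc, with a 12 bit signed offset."""
    k12 &= 0xFFF
    offset = k12 - 0x1000 if k12 & 0x800 else k12   # sign extend from 12 bits
    return (own_pc + 1 + offset) & PC_MASK

own_pc = 0x0100     # address of the RJMP itself (hypothetical)
fetch_pc = 0x0103   # address being fetched while the RJMP is deeper in the pipeline
correct = rjmp_target(own_pc, 0xFFF)   # offset -1: the jump targets itself
wrong = rjmp_target(fetch_pc, 0xFFF)   # using the fetch address lands elsewhere
```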

\par Requests to PM
   - pavr_s5_lpm_pm_rq \n
      Needed by the LPM instruction. \n
      This request doesn't modify the instruction flow. \n
   - pavr_s5_elpm_pm_rq \n
      Needed by the ELPM instruction. \n
      This request doesn't modify the instruction flow. \n
   - pavr_s4_z_pm_rq \n
      Needed by ICALL and IJMP. \n
   - pavr_s4_zeind_pm_rq \n
      Needed by EICALL and EIJMP. \n
   - pavr_s4_k22abs_pm_rq \n
      Needed by CALL and JMP. \n
      To get to the jump address, the 16 bit instruction constant is concatenated
      with a 6 bit constant previously read from the instruction opcode as well.
   - pavr_s4_k12rel_pm_rq \n
      Needed by RCALL and RJMP. \n
      Note that pavr_s4_pc is a pipeline register that holds the Program Memory
      address of the instruction executing in pipeline stage s4. \n
      Because the relative jump actually occurs in stage s4, pavr_s4_pc is needed
      rather than the current Program Counter (pavr_pc).
   - pavr_s6_branch_pm_rq \n
      Needed by branch instructions (BRBC and BRBS). \n
   - pavr_s6_skip_pm_rq \n
      Needed by some skip instructions (CPSE, SBRC and SBRS). \n
   - pavr_s61_skip_pm_rq \n
      Needed by some skip instructions (SBIC and SBIS). \n
   - pavr_s4_k22int_pm_rq \n
      Needed by the implicit interrupt CALL. \n
   - pavr_s54_ret_pm_rq \n
      Needed by RET and RETI. \n
    \n
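
The concatenation used by the absolute request (pavr_s4_k22abs_pm_rq) can be sketched
as below. The bit positions follow the AVR encoding of JMP and CALL, where the first
instruction word carries 6 address bits (bits 8..4 and bit 0) and the second word
carries the lower 16 bits. The function name is hypothetical.

```python
def k22_from_opcode(first_word, second_word):
    """Rebuild the 22 bit absolute address of a JMP or CALL from its two words."""
    # Bits 8..4 of the first word are address bits 21..17, bit 0 is address bit 16.
    high6 = ((first_word >> 3) & 0x3E) | (first_word & 0x1)
    return (high6 << 16) | (second_word & 0xFFFF)
```

For example, the word pair 0x940C, 0x0000 (a JMP to address 0) yields 0, while
0x940D, 0x1234 yields 0x11234.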
*/



/*!
\defgroup pavr_hwres_sfu Stall and Flush Unit
\ingroup pavr_hwres
The pipeline controls its own stall and flush status, through specific stall and
flush-related request signals. These requests are sent to the Stall and Flush
Unit (SFU). The output of the SFU is a set of signals that directly control
pipeline stages (a pair of stall and flush control signals for each stage): \n
\n
\image html pavr_hwres_sfu_01.gif
\n
\par Requests to SFU
   - stall requests \n
      The SFU stalls \b all younger stages. However, stalling alone spawns
      the current instruction into 2 instances, one of which must
      be killed (flushed). The younger instance is killed (the
      previous stage is flushed). \n
      Thus, a nop is introduced in the pipeline \b before the instruction
      wavefront. \n
      If more than one stage requests a stall at the same time, the older
      one has priority (the younger one will be stalled along with the
      others). Only after that is the younger one's stall acknowledged,
      by means of the appropriate stall and flush control signals. \n
      Stall \b requests:
      - pavr_s3_stall_rq \n
      - pavr_s5_stall_rq \n
      - pavr_s6_stall_rq
   - flush requests \n
      The SFU simply flushes the requested stage. \n
      More than one flush could be acknowledged at the same time, without
      competition. However, all flush requests happen to target the
      same pipeline stage, s2. \n
      Flush \b requests:
      - pavr_s3_flush_s2_rq \n
      - pavr_s4_flush_s2_rq \n
      - pavr_s4_ret_flush_s2_rq \n
      - pavr_s5_ret_flush_s2_rq \n
      - pavr_s51_ret_flush_s2_rq \n
      - pavr_s52_ret_flush_s2_rq \n
      - pavr_s53_ret_flush_s2_rq \n
      - pavr_s54_ret_flush_s2_rq \n
      - pavr_s55_ret_flush_s2_rq \n
   - branch requests \n
      The SFU flushes stages s2...s5, because the corresponding instructions were
      already uselessly fetched, and requests the PC to be loaded with the branch
      relative jump address. \n
      Branch \b requests:
      - pavr_s6_branch_rq \n
   - skip requests \n
      The SFU treats skips as branches that have the relative jump address equal to
      0, 1 or 2, depending on the skip condition and on the next instruction's length
      (16/32 bits). \n
      Skip \b requests:
      - pavr_s6_skip_rq \n
      - pavr_s61_skip_rq
   - nop requests \n
      The SFU stalls all younger instructions. The current instruction is
      spawned into 2 instances. The older instance is killed (the very
      same stage that requested the nop is flushed). \n
      Thus, a nop is introduced in the pipeline \b after the instruction
      wavefront. \n
      In order to do that, a micro-state machine is needed outside the
      pipeline, because otherwise that stage would indefinitely stall
      itself. \n
      Nop \b requests: \n
          - pavr_s4_nop_rq \n
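
The priority rule for simultaneous stall requests can be sketched as follows. Only the
arbitration is modeled (the oldest requesting stage wins and all younger stages are
stalled); the accompanying flush is left out. The function name and the return format
are hypothetical.

```python
STALL_SOURCES = (3, 5, 6)   # stages that can request a stall (s3, s5, s6)

def arbitrate_stall(requesting_stages):
    """Return (winning stage, list of stalled stages) for a set of stall requests."""
    active = [stage for stage in STALL_SOURCES if stage in requesting_stages]
    if not active:
        return None, []
    winner = max(active)                  # the oldest stage is the deepest one
    stalled = list(range(1, winner))      # all younger stages are stalled
    return winner, stalled
```

When s3 and s5 request a stall at the same time, `arbitrate_stall({3, 5})` selects s5,
and s3 is among the stalled stages; its own request is served only afterwards.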
\par SFU control signals
Each main pipeline stage (s1-s6) has 2 kinds of control signals, generated by
the SFU:
   - stall control \n
      All registers in this stage are instructed to remain unchanged. \n
      All possible requests to hardware resources (such as RF, IOF, BPU,
      DACU, SREG, etc) are reset (to 0).
   - flush control \n
      All registers in this stage are reset (to 0), to a most "benign"
      state (a nop). Also, all requests to hardware resources are
      reset.

\n
Each main pipeline stage has an associated flag that determines whether or not
that stage has the right to access hardware resources. These flags are also
managed by the SFU. \n
Hardware resources enabling flags:
   - pavr_s1_hwrq_en
   - pavr_s2_hwrq_en
   - pavr_s3_hwrq_en
   - pavr_s4_hwrq_en
   - pavr_s5_hwrq_en
   - pavr_s6_hwrq_en
*/



/*!
\defgroup pavr_pipeline Pipeline details
\ingroup pavr_implementation
*/



/*!
\defgroup pavr_pipeline_alu ALU
\ingroup pavr_pipeline
\par ALU description
The ALU is not a potentially conflicting resource, as it is fully controlled by
pipeline stage s5. \n
\n
There are two ALU operands. The first operand is taken either from RF read port 1,
if it's an 8 bit operand, or from RF read port 1 (lower 8 bits) and RF
read port 2 (higher 8 bits), if it's a 16 bit operand. The second operand is taken
either from RF read port 2 or directly from the instruction opcode; it is
always 8 bits wide. \n
Both operands are fed to the ALU through the Bypass Unit. \n
All ALU-requiring instructions write their result into the Bypass Unit. \n
Details about the ALU hardware resource (connectivity, ALU opcodes) can be found
\ref pavr_hwres_alu "here". \n
Instructions that make use of the ALU-related pipeline registers:
   - ADD, ADC, ADIW
   - SUB, SUBI, SBC, SBCI, SBIW
   - INC, DEC
   - AND, ANDI
   - OR, ORI, EOR
   - COM, NEG, CP, CPC, CPI, SWAP
   - LSR, ROR, ASR
   - MUL, MULS, MULSU
   - FMUL, FMULS, FMULSU
   - MOV, MOVW

\par Plugging the ALU into the pipeline
The pipeline registers related to ALU access are presented in the picture below. \n
From this picture, the instructions' timing can also be easily figured out. \n
\anchor alu_pipe_ref_01
\n
\image html pavr_pipe_alu_01.gif
\n
*/


/*!
\defgroup pavr_pipeline_iof IOF access
\ingroup pavr_pipeline
\par A few details
The IO File is accessed during stages s5 and/or s6. \n
As presented \ref pavr_hwres_iof_gen "here", the IO File can do more than
byte-oriented read-write operations. It can also do bit processing. \n
The following data is provided to the IOF, for each pipeline stage in which IOF
access is required:
   - byte address
   - bit address
   - opcode
   - byte data in

\par Accessing the IOF
The main pipeline registers that implement IOF accessing instructions are presented
here: \n
\n
\image html pavr_pipe_iof_01.gif
\n
*/


/*!
\defgroup pavr_pipeline_dacu DACU access
\ingroup pavr_pipeline
\par A few details
The Data Address Calculation Unit (DACU) handles the Unified Memory, by mapping
Unified Memory addresses into Register File, IO File or Data Memory addresses. \n
It also transparently places access requests to RF, IOF or DM, in response to UM
access requests. \n
More details on DACU requests can be found \ref pavr_hwres_dacu "here". \n
\par Plugging the DACU into the pipeline
\n
\image html pavr_pipe_dacu_01.gif
\n
*/


/*!
\defgroup pavr_pipeline_jumps Jumps
\ingroup pavr_pipeline
\par A few details
There are 4 jump instructions:
   - RJMP (relative jump) \n
      The jump address is obtained by adding a 12 bit signed offset, taken from
      the instruction word, to the current Program Counter.
   - IJMP (indirect jump) \n
      The jump address is read from the Z pointer register. \n
      The jump destination resides in the lower 64 Kwords of Program Memory. \n
   - EIJMP (extended indirect jump) \n
      The jump address is read from EIND:Z (higher 6 bits from the EIND register in the IOF,
      and lower 16 bits from the Z pointer in the RF). \n
      This jump accesses the whole 22 bit addressing space of the Program Memory. \n
   - JMP (long jump) \n
      The jump address is read from two consecutive instruction words. \n
      This jump accesses the whole 22 bit addressing space of the Program Memory. \n

\n
When a jump is detected in the pipeline, the next two instructions (that were already
uselessly fetched from the Program Memory) are flushed. Then, the Program Memory
manager is asked for permission to access the Program Memory and to modify the
instruction flow (modify the Program Counter). \n
After that, unless it gets flushed or stalled by an older instruction, the jump
instruction will configure the pipeline to fetch from the new PM address. \n
\n
    RJMP and JMP take 3 clocks, while IJMP and EIJMP take 4 clocks. \n
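
The two indirect jump targets and the clock counts above can be summarized in a short
sketch. The function names and the clock table are hypothetical helpers; the address
composition follows the description of IJMP and EIJMP given above.

```python
def ijmp_target(z):
    """IJMP: the 16 bit Z pointer addresses the lower 64 Kwords of Program Memory."""
    return z & 0xFFFF

def eijmp_target(eind, z):
    """EIJMP: 6 bits from EIND and 16 bits from Z form a 22 bit address."""
    return ((eind & 0x3F) << 16) | (z & 0xFFFF)

# Clocks taken by each jump, as stated in the text.
JUMP_CLOCKS = {"RJMP": 3, "JMP": 3, "IJMP": 4, "EIJMP": 4}
```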

\par Jump state machine
\n
\image html pavr_pipe_jumps_01.gif
\n
*/


/*!
\defgroup pavr_pipeline_branches Branches
\ingroup pavr_pipeline
\par A few details
Branches make a 7 bit relative jump conditional on the value of a bit in the Status
Register. \n
If the branch condition is not met, no further action is taken. However, if the
branch condition is evaluated as true, then all previous stages are flushed and
a branch is requested from the Stall and Flush Unit. The SFU, in turn, asks the PM
manager for permission to access the Program Memory and modify the program flow. \n
Branches take place in stage s6. \n
\n
    Not taken branches take 2 clocks, while taken branches take 4 clocks. \n
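
A behavioral sketch of a branch decision is given below. The field positions follow the
AVR encodings of BRBS (1111 00kk kkkk ksss) and BRBC (1111 01kk kkkk ksss); the clock
counts are the ones stated above. The function name and the returned tuple are
hypothetical, and the branch is assumed to use its own Program Counter.

```python
def branch_outcome(own_pc, opcode, sreg):
    """Return (next PC, clocks) for a BRBS/BRBC located at own_pc."""
    offset = (opcode >> 3) & 0x7F
    if offset & 0x40:
        offset -= 0x80                        # sign extend the 7 bit offset
    flag = (sreg >> (opcode & 0x7)) & 1       # selected Status Register bit
    branch_if_set = (opcode & 0x0400) == 0    # bit 10: 0 for BRBS, 1 for BRBC
    taken = (flag == 1) if branch_if_set else (flag == 0)
    if taken:
        return own_pc + 1 + offset, 4
    return own_pc + 1, 2
```

For example, opcode 0xF7F1 is a BRBC on bit 1 (the Z flag) with offset -2: with Z
cleared the branch at address 0x10 goes to 0x0F in 4 clocks, otherwise execution
continues at 0x11 after 2 clocks.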

\par Branch state machine
\n
\image html pavr_pipe_branches_01.gif
\n
*/


/*!
\defgroup pavr_pipeline_skips Skips
\ingroup pavr_pipeline
\par A few details
Skips are implemented as branches that have the relative target address equal to
0, 1 or 2, depending on the skip condition and on whether the following
instruction has 16 or 32 bits. \n
There are two kinds of skips: one category that makes the skip request in stage
s6 (the same as branches), and one that requests skip in s61. The first category
includes instructions CPSE (Compare registers and skip if equal), SBRC and SBRS
(skip if bit in register is cleared/set). The second category includes SBIC, SBIS
(Skip if bit in IO register is cleared/set). \n
\n
CPSE, SBRC and SBRS take 2 clocks if not taken, and 4 clocks if taken. \n
    SBIC and SBIS take 3 clocks if not taken, and 5 clocks if taken. \n
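
The relative target address used by skips, together with the clock counts above, can be
sketched as follows. The function names and the table layout are hypothetical; the
values are the ones given in the text.

```python
def skip_offset(condition_met, next_is_32_bit):
    """Relative target address of a skip: 0, 1 or 2 instruction words."""
    if not condition_met:
        return 0
    return 2 if next_is_32_bit else 1

# (clocks if not taken, clocks if taken), keyed by the stage that requests the skip.
SKIP_CLOCKS = {"s6": (2, 4), "s61": (3, 5)}

def skip_clocks(instruction, condition_met):
    """Clocks taken by a skip instruction, per the figures quoted in the text."""
    stage = "s61" if instruction in ("SBIC", "SBIS") else "s6"
    return SKIP_CLOCKS[stage][1 if condition_met else 0]
```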

\par Skip state machine
\n
\image html pavr_pipe_skips_01.gif
\n
*/


/*!
\defgroup pavr_pipeline_calls Calls
\ingroup pavr_pipeline
\par A few details
There are 4 call instructions, analogous to the \ref pavr_pipeline_jumps "jump"
instructions:
   - RCALL (relative call) \n
      The call address is obtained by adding a 12 bit signed offset, taken from
      the instruction word, to the current Program Counter.
   - ICALL (indirect call) \n
      The call address is read from the Z pointer register. \n
      The destination resides in the lower 64 Kwords of Program Memory. \n
   - EICALL (extended indirect call) \n
      The call address is read from EIND:Z (higher 6 bits from the EIND register in the IOF,
      and lower 16 bits from the Z pointer in the RF). \n
      This call accesses the whole 22 bit addressing space of the Program Memory. \n
   - CALL (far call) \n
      The call address is read from two consecutive instruction words. \n
      This call accesses the whole 22 bit addressing space of the Program Memory. \n

\n
Apart from these, there is another kind of call, automatically inserted into the
pipeline when an interrupt is processed. In addition to what regular calls do, the
implicit interrupt call also clears the general interrupt flag (flag I in the
Status Register). This way, nested interrupts are disabled by default. However,
they can be enabled explicitly. This behavior is questionable, but is implemented
for the sake of AVR compatibility. \n
After an interrupt generates an implicit call, further interrupts are disabled for
4 clocks. This way, at least one instruction of the called subroutine will be
executed. Only after that can another interrupt change the instruction flow. \n
\n
All calls take 4 clocks. \n

\par Call state machine
\n
\image html pavr_pipe_calls_01.gif
\n
*/


/*!
\defgroup pavr_pipeline_returns Returns
\ingroup pavr_pipeline
\par A few details
There are two kinds of returns:
   - RET \n
      Return from subroutine. \n
      The Program Counter is loaded with the return address (22 bits wide) read
      from the stack, and the Stack Pointer is incremented by 3.
   - RETI \n
      The same as RET, but in addition sets the general interrupt flag (flag I in
      the Status Register).

\n
Returns are the slowest instructions in the pAVR implementation of the AVR
instruction set. They take 9 clocks. \n
The first 2 clocks are spent waiting for the previous instructions to write the
Unified Memory. During the next 5 clocks, the Program Counter is read from the Unified
Memory. In a future version, this part might take only 4 clocks. Finally,
another 2 clocks are spent while bringing the target instruction into the
    instruction register. \n
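
The stack traffic of a call followed by a return, and the 9 clock budget above, can be
sketched as follows. The byte order (low, middle, high on the way in; high, middle, low
on the way out) mirrors the DACU write and read requests. The post-decrement and
pre-increment of the Stack Pointer are assumed from the usual AVR stack convention; the
function names and the dictionary used as memory are hypothetical.

```python
def push_return_address(memory, sp, pc):
    """Store a 22 bit PC on the stack as 3 bytes: low, middle, high."""
    for shift in (0, 8, 16):
        memory[sp] = (pc >> shift) & 0xFF
        sp -= 1                      # post-decrement (assumed AVR convention)
    return sp

def pop_return_address(memory, sp):
    """Read back the 3 bytes (high, middle, low) and rebuild the PC."""
    pc = 0
    for shift in (16, 8, 0):
        sp += 1                      # pre-increment (assumed AVR convention)
        pc |= memory[sp] << shift
    return pc, sp

# Clock budget of a return, as broken down in the text.
RET_CLOCKS = 2 + 5 + 2
```

After the pop, the Stack Pointer is back where it was before the call, that is, 3
locations above its value at the moment of the return.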
\n
\image html pavr_pipe_returns_01.gif
\n
*/



/*!
\defgroup pavr_pipeline_int Interrupts
\ingroup pavr_pipeline
\par General
The Interrupt System can forcibly place calls into pipeline stage s3, as a
result of specific IO activity. \n
\n
\par Implementation
The core of the Interrupt System is the Interrupt Manager module. It prioritizes
the interrupt sources, checks if interrupts are enabled and if the pipeline is
ready to process interrupts, and finally sends interrupt requests to the
pipeline, together with the associated interrupt vector and other pipeline
control signals. \n
\n
\image html pavr_hwres_int_01.gif
\n
The pipeline acknowledges interrupt requests by forcing the Instruction Decoder
to decode a call instruction, with the absolute jump address given by the
Interrupt Manager. The next 2 instructions, which were already uselessly fetched, are
flushed. \n
\n
The interrupt vectors are parameterized, and can be placed anywhere in the
Program Memory. \n
Every interrupt has a parameterized priority. \n
In the present implementation, up to 32 interrupt sources are handled. \n
2 interrupt sources are implemented:
\ref pavr_hwres_iof_perif_int0 "external interrupt 0" and
\ref pavr_hwres_iof_perif_t0 "timer 0" interrupt. \n
\n
Because the Interrupt Manager shares much with the IO File, it is not
built as a separate entity, but rather embedded into the IO File. The Interrupt
Manager might be implemented as a separate entity in a future version of pAVR. \n
\n
The interrupt latency is 5 clocks (1 clock needed by the interrupt manager and
4 clocks needed by the implicit call).
    \n
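
The prioritization step of the Interrupt Manager can be sketched as below. The bit mask
representation of pending and enabled sources, the rule that a smaller priority value
wins, and the function name are all assumptions made for illustration; the text only
states that up to 32 sources are handled and that priorities are parameterized.

```python
MAX_SOURCES = 32

def select_interrupt(pending, enabled, priorities, global_enable=True):
    """Return the index of the interrupt source to serve, or None."""
    if not global_enable:
        return None
    candidates = [source for source in range(MAX_SOURCES)
                  if (pending >> source) & 1 and (enabled >> source) & 1]
    if not candidates:
        return None
    # Assumption: the smallest priority value is the most urgent one.
    return min(candidates, key=lambda source: priorities[source])

# Latency as stated in the text: 1 clock in the manager plus the 4 clock implicit call.
INTERRUPT_LATENCY = 1 + 4
```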
*/


/*!
\defgroup pavr_pipeline_others Others
\ingroup pavr_pipeline
\par LPM/ELPM state machine
This is how the LPM/ELPM state machine plugs into the pipeline: \n
\n
\image html pavr_pipe_others_01.gif
\n
*/



/*!
\defgroup pavr_test Testing
\par Testing strategy
When testing a certain entity, the following \b testing \b strategy was adopted:
   - embed that entity into a larger one that also includes all other
      ingredients needed for a real-life simulation of the tested entity. Typical
      such `other ingredients' are RAMs and multiplexers.
   - run custom VHDL tests that test as much as possible of the functionality
      of the device under test. Extreme cases are the first situations to be tested.

\n
Two kinds of tests were conducted on pAVR:
   - every module of pAVR was separately tested as described in the testing
      strategy above.
   - pAVR as a whole was tested as described in the testing strategy above.

\n

\par Testing pAVR modules
Each pAVR module was separately tested. \n
The particular tests carried out are presented below, grouped by the entities
under test:
   - \b utilities defined in `std_util.vhd' \n
      The associated test file is `test_std_util.vhd'. \n
      The utilities defined in `std_util.vhd' are:
      - type conversion routines often used throughout the other source files
         in this project
      - basic arithmetic functions
      - sign and zero-extend functions \n
         Both are tested in `test_std_util.vhd'. \n
         Extreme cases and typical cases are considered. \n
      - vector comparison function \n
         Tested in `test_std_util.vhd'. \n
         Extreme cases and typical cases are considered. \n
   - \b ALU \n
      The associated tests are defined in `test_pavr_alu.vhd'. They consist of
      checking the ALU output and flags output for all ALU opcodes, one by one,
      for all of these situations:
      - carry in = 0
      - carry in = 1
      - additions generate overflow
      - subtractions generate overflow
      .
      There are 26 ALU opcodes to be checked for each situation.
   - \b Register \b File \n
      The associated tests are defined in `test_pavr_register_file.vhd'. \n
      The following tests are done:
      - read all ports, one at a time
         - read port 1 (RFRD1)
         - read port 2 (RFRD2)
         - write port (RFWR)
         - write pointer register X (RFXWR)
         - write pointer register Y (RFYWR)
         - write pointer register Z (RFZWR)
      - combined RFRD1, RFRD2, RFWR \n
         They should work simultaneously.
      - combined RFXWR, RFYWR, RFZWR \n
         They should work simultaneously.
      - combined RFRD1, RFRD2, RFWR, RFXWR, RFYWR, RFZWR \n
         That is, all RF ports are accessed simultaneously. They should
         do their job. \n
         However, note that the pointer registers are accessible for writing
    1979
                by their own ports but also by the RF write port. Writing them via
    1980
                pointer register write ports overwrites writing via general write
    1981
                port. Even though concurrent writing could happen in a perfectly legal
    1982
                AVR implementation, AVR's behavior is unpredictible (what write port
    1983
                has priority). We have chosen for pAVR the priority as mentioned above.
    1984
          
    1985
       
  • \b IO \b File \n
  • 1986
          The associated tests are defined in `test_pavr_io_file.vhd'. \n
    1987
          The following tests are performed on the IOF:
    1988
          
    1989
             
  • test the IOF general write/read/bit processing port. \n
  • 1990
                Test all opcodes that this port is capable of:
    1991
                
    1992
                   
  • wrbyte
  • 1993
                   
  • rdbyte
  • 1994
                   
  • clrbit
  • 1995
                   
  • setbit
  • 1996
                   
  • stbit
  • 1997
                   
  • ldbit
  • 1998
                
    1999
             
  • test the IOF port A. \n
  • 2000
                Port A is intended to offer to pAVR pin-level IO connectivity with the
    2001
                outside world. \n
    2002
                Test reading from and writing to Port A. \n
    2003
                Test that Port A pins correctly take the appropriate logic values
    2004
                (high, low, high Z or weak high).
    2005
             
  • test Timer 0.
  • 2006
                
    2007
                   
  • test Timer 0 prescaler.
  • 2008
                   
  • test Timer 0 overflow.
  • 2009
                   
  • test Timer 0 interrupt.
  • 2010
                
    2011
             
  • test External Interrupt 0. \n
  • 2012
                Test if each possible configuration (activation on low level, rising
    2013
                edge or falling edge) correctly triggers External Interrupt 0 flag.
    2014
          
    2015
       
  • \b Data \b Memory \n
  • 2016
          The tests defined in `test_pavr_dm.vhd' are simple read-write confirmations
    2017
          that the Data Memory does its job.
    2018
    2019
     
    2020
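The Register File write priority described above (pointer register write ports
win over the general write port) can be illustrated with a small behavioral
model. This is only a Python sketch, not the VHDL source: the class and method
names are hypothetical, and the pointer mapping (X = r27:r26, Y = r29:r28,
Z = r31:r30) is the usual AVR one.

```python
class RegisterFileModel:
    """Behavioral sketch of a 32 x 8 bit register file with pointer write ports."""

    # Low-byte addresses of the pointer registers (usual AVR mapping).
    POINTER_BASE = {"X": 26, "Y": 28, "Z": 30}

    def __init__(self):
        self.regs = [0] * 32

    def clock(self, rfwr=None, ptr_writes=None):
        """Apply one clock edge.

        rfwr:       optional (address, byte) pair for the general write port.
        ptr_writes: optional dict such as {"X": 0x1234} for the pointer ports.
        """
        if rfwr is not None:
            addr, value = rfwr
            self.regs[addr] = value & 0xFF
        # Pointer ports are applied last, so they override the general port.
        for name, word in (ptr_writes or {}).items():
            base = self.POINTER_BASE[name]
            self.regs[base] = word & 0xFF
            self.regs[base + 1] = (word >> 8) & 0xFF

    def read(self, addr):
        return self.regs[addr]


rf = RegisterFileModel()
# Concurrent write to r26 through both the general port and the X pointer port.
rf.clock(rfwr=(26, 0xAA), ptr_writes={"X": 0x1234})
print(hex(rf.read(26)), hex(rf.read(27)))
```

In this model the byte written through RFWR to r26 is lost, because the X
pointer port writes the same register in the same clock.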
\par Testing the pAVR entity
pAVR as a whole was tested by building an upper entity that embeds a pAVR, its
Program Memory and some multiplexers. Those multiplexers give Program
Memory control either to the test entity (for properly setting up the Program
Memory contents) or to pAVR (while pAVR is being monitored as it executes
instructions from the Program Memory). \n
\n
The binary file that will be executed by pAVR during the test is automatically
loaded into the Program Memory using an ANSI C utility, TagScan. The test entity
has a number of tags spread over the source code, as comments. The TagScan utility
reads the binary file to be loaded, scans the test file, and inserts VHDL
statements into the properly tagged places. These statements load the Program
Memory using its own write port. This way of initializing the Program Memory
seems more general (and surely more interesting) than using file IO VHDL
functions. \n
The TagScan utility is also used for other purposes, for example for
inserting a certain header in all source files. It is heavily used as
a general preprocessor. \n
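The tag-based loading described above can be sketched as follows. This is not
the TagScan source (which is written in ANSI C); it is a minimal Python
illustration. The tag syntax, the `pm_write' procedure name and the
little-endian word order are all assumptions made for the example.

```python
def words_from_binary(data):
    """Group raw bytes into 16-bit program words (little-endian is an assumption)."""
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data) - 1, 2)]


def pm_write_statements(words):
    """Emit one VHDL-like statement per word; `pm_write` is a hypothetical procedure."""
    return [f'   pm_write({addr}, x"{word:04X}");' for addr, word in enumerate(words)]


def expand_tag(source_lines, tag, generated):
    """Copy the source, inserting the generated lines after each tagged comment."""
    marker = f"-- <{tag}>"  # hypothetical tag syntax, written as a VHDL comment
    out = []
    for line in source_lines:
        out.append(line)
        if line.strip() == marker:
            out.extend(generated)
    return out


binary = bytes([0x0C, 0x94, 0x34, 0x12])
test_file = ["begin", "   -- <LOAD_PM>", "   wait;", "end process;"]
statements = pm_write_statements(words_from_binary(binary))
for line in expand_tag(test_file, "LOAD_PM", statements):
    print(line)
```

Because the tag is an ordinary comment, the untouched test file stays valid
VHDL, and the same mechanism can insert any other generated text.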
\n
Testing pAVR as a whole actually means designing and running binaries that put
pAVR in extreme situations. \n
The following tests are done:
\htmlonly
<ul>
   <li>Interrupts<br>
      This exercises pAVR interrupt handling.<br>
      All interrupts are tested.<br>
      The associated peripherals (Port A, Timer 0 and External Interrupt 0) are
      put in a variety of conditions.<br>
      Results:<br>
      tbd
   </li>
   <li>General test<br>
      This is a hand-written assembler source that is meant to be assembled and
      run on pAVR.<br>
      It exercises each of the pAVR instructions, one by one.<br>
      It tries to put pAVR in the most difficult situations, for each instruction. For
      example, it exercises:
      <ul>
         <li>concurrent stalls</li>
         <li>stalls combined with 32 bit instructions</li>
         <li>stalls combined with instructions that change the instruction flow</li>
         <li>control hazard candidates (stress the Program Memory Manager and
            the Stall and Flush Unit)</li>
         <li>data hazard candidates (stress the Bypass Unit)</li>
      </ul>
      Results:<br>
      Passed OK. The verification consisted of checking each instruction, each
      intermediate result and each relevant intermediate internal state.
      <table>
         <tr><th>Assembler</th><th>Clocks</th><th>Instructions</th><th>CPI</th></tr>
         <tr><td>avrasm32, by Atmel</td><td>667</td><td>361</td><td>1.85</td></tr>
      </table>
   </li>
   <li>Sieve<br>
      Sieve of Eratosthenes; finds the first 100 prime numbers.<br>
      Written in ANSI C.<br>
      Results:
      <table>
         <tr><th>Compiler</th><th>Clocks</th><th>Instructions</th><th>CPI</th></tr>
         <tr><td>avr-gcc, O0</td><td>12170</td><td>8851</td><td>1.37</td></tr>
         <tr><td>avr-gcc, O3</td><td>11946</td><td>8824</td><td>1.35</td></tr>
      </table>
   </li>
   <li>TagScan<br>
      Exercises string manipulating routines.<br>
      Written in ANSI C.<br>
      Results:
      <table>
         <tr><th>Compiler</th><th>Clocks</th><th>Instructions</th><th>CPI</th></tr>
         <tr><td>tbd</td><td>tbd</td><td>tbd</td><td>tbd</td></tr>
      </table>
   </li>
   <li>C compiler<br>
      Written in ANSI C.<br>
      Results:
      <table>
         <tr><th>Compiler</th><th>Clocks</th><th>Instructions</th><th>CPI</th></tr>
         <tr><td>tbd</td><td>tbd</td><td>tbd</td><td>tbd</td></tr>
      </table>
   </li>
   <li>Waves<br>
      Simulates waves on the surface of a liquid.<br>
      Written in ANSI C.<br>
      Uses floating point numbers (observation: the code generated by avr-gcc seems
      to take about 200 pAVR clocks per floating point operation).<br>
      A mesh of only 5x5 points is considered, and only 5 iterations
      are done. Bigger values make the simulation unacceptably long on
      the available computer.<br>
      <br>
      Checking the result is done by converting the array of 25 floats
      into a scaled array of 25 chars, copying these chars from Data
      Memory (by hand), constructing a 3D image of the result, and
      comparing it to a reference 3D image.<br>
      <br>
      Results:<br>
      Passed OK. As expected, the chars array to be tested exactly matches
      the reference array.
      <table>
         <tr><th>Compiler</th><th>Clocks</th><th>Instructions</th><th>CPI</th></tr>
         <tr><td>avr-gcc</td><td>209,175</td><td>122,236</td><td>1.71</td></tr>
      </table>
   </li>
</ul>
\endhtmlonly
    \n
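The CPI figures reported in the result tables above are simply clocks divided
by executed instructions. The short Python check below recomputes them from
the reported clock and instruction counts; the dictionary labels are
informal names chosen for this example.

```python
# (clocks, instructions) as reported in the result tables above.
results = {
    "General test (avrasm32)": (667, 361),
    "Sieve (avr-gcc, O0)": (12170, 8851),
    "Sieve (avr-gcc, O3)": (11946, 8824),
    "Waves (avr-gcc)": (209175, 122236),
}


def cpi(clocks, instructions):
    """Average clocks per instruction."""
    return clocks / instructions


for name, (clocks, instructions) in results.items():
    print(f"{name}: CPI = {cpi(clocks, instructions):.2f}")
```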
*/



/*!
\defgroup pavr_test_bugs Bugs
\ingroup pavr_test
\par Errata to Atmel's AVR documentation:
The corrected versions of some paragraphs from Atmel's documentation are shown
below. \n
Original, wrong terms are struck through, while corrected terms are in bold: \n
- The following text can be found throughout the references. \n
   \n
   \htmlonly
   "...<br>
   RAMPD<br>
   Register concatenated with the <strike>Z register</strike> <b>instruction word</b>
   enabling direct addressing of the whole data space on MCUs with more than 64K
   bytes data space.<br>
   EIND<br>
   Register concatenated with the <strike>instruction word</strike> <b>Z register</b>
   enabling indirect jump and call to the whole program space on MCUs with more
   than 64K <strike>bytes</strike> <b>words</b> program space.<br>
   ..."
   \endhtmlonly
- In the `AVR Instruction Set' document, page 60:
   \n
   \htmlonly
   "...<br>
   V: Rd7 * <strike>/Rd7</strike> <b>/Rr7</b> * /R7 + /Rd7 * Rr7 * R7<br>
   ..."
   \endhtmlonly
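The corrected V formula can be cross-checked exhaustively. The Python sketch
below assumes that the formula belongs to a plain 8 bit subtraction
R = Rd - Rr (the erratum above does not name the instruction), and compares
the formula with a direct signed-range test. The function names are
hypothetical.

```python
def bit7(x):
    return (x >> 7) & 1


def v_corrected(rd, rr):
    """V = Rd7 AND (NOT Rr7) AND (NOT R7)  OR  (NOT Rd7) AND Rr7 AND R7."""
    r = (rd - rr) & 0xFF
    rd7, rr7, r7 = bit7(rd), bit7(rr), bit7(r)
    return (rd7 & (1 - rr7) & (1 - r7)) | ((1 - rd7) & rr7 & r7)


def v_misprinted(rd, rr):
    """Same formula with (NOT Rd7) in place of (NOT Rr7), as in the misprint."""
    r = (rd - rr) & 0xFF
    rd7, rr7, r7 = bit7(rd), bit7(rr), bit7(r)
    return (rd7 & (1 - rd7) & (1 - r7)) | ((1 - rd7) & rr7 & r7)


def v_reference(rd, rr):
    """Signed overflow: the true difference does not fit in -128...127."""
    def signed(x):
        return x - 256 if x & 0x80 else x
    diff = signed(rd) - signed(rr)
    return int(diff < -128 or diff > 127)


pairs = [(a, b) for a in range(256) for b in range(256)]
print(sum(v_corrected(a, b) != v_reference(a, b) for a, b in pairs))
print(any(v_misprinted(a, b) != v_reference(a, b) for a, b in pairs))
```

In the misprinted form the first product term contains both Rd7 and its
negation, so it can never be 1, and overflows such as 0x80 - 0x01 are missed.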

\par Atmel's AVRStudio simulator bugs
- bug001
   - \b symptom: the NEG instruction computes the H flag via a formula other than
      the one given in the AVR Instruction Set (H=R3+Rd3). \n
      Whether the bug is in the simulator or in the document remains to be seen. \n
      Versions 3.53 and 4.04 of AVRStudio behave the same (weird) way. \n
      Example: initially having SREG=0x01 and R10=0xD9, NEG R10 sets SREG to 0x01
      instead of 0x21. \n
      The AVRStudio formula for H seems to be R3*(not Rd3) rather than R3+Rd3.
- bug002
   - \b symptom: when trying to set/reset port A pins, there is a 1 clock delay
      between the moment PORTA receives the bits and the moment PINA gets updated.
      Those events should have been simultaneous (of course, port A direction was
      considered already configured as output, by setting DDRA(i)=1).
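The NEG example from bug001 above can be replayed with a few lines of Python.
The sketch computes the flags of NEG with either H formula: the documented one
(H=R3+Rd3) or the one AVRStudio seems to use (R3*(not Rd3)). The function name
and the `h_formula' parameter are hypothetical; the SREG bit order
(I T H S V N Z C, from bit 7 down to bit 0) is the standard AVR layout.

```python
def neg_flags(rd, sreg, h_formula="documented"):
    """Return (result, new SREG) for NEG Rd; I and T flags are left unchanged."""
    r = (0 - rd) & 0xFF
    r3 = (r >> 3) & 1
    rd3 = (rd >> 3) & 1
    if h_formula == "documented":
        h = r3 | rd3            # H = R3 + Rd3
    else:
        h = r3 & (1 - rd3)      # H = R3 * (not Rd3), as AVRStudio seems to do
    n = (r >> 7) & 1
    v = int(r == 0x80)
    s = n ^ v
    z = int(r == 0)
    c = int(r != 0)
    new_sreg = (sreg & 0xC0) | (h << 5) | (s << 4) | (v << 3) | (n << 2) | (z << 1) | c
    return r, new_sreg


for formula in ("documented", "observed"):
    result, sreg = neg_flags(0xD9, 0x01, formula)
    print(formula, hex(result), hex(sreg))
```

With R10=0xD9 and SREG=0x01, the documented formula yields SREG=0x21, while
the alternative formula yields 0x01, matching the behavior reported above.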
     
\b pAVR \b bugs \b history
\par 28-31 July 2002
   - The Program Memory and Program Counter are handled in different places, even
      though they share much functionality. Moreover, the Program Counter doesn't
      have an explicit manager associated with it. This makes PM and PC quite
      difficult to maintain. \n
      Reorganized PM and PC handling. Now they are handled by a common manager,
      the PM manager. \n
   - Every test runs smoothly so far.
     
\par 27 July 2002
   - The Stall and Flush Unit and Shadow Manager are difficult to maintain because
      of too many rules and exceptions. \n
      Reorganized the SFU so that its behavior follows only one rule, the so-called
      `SFU rule': older hardware resource requests have priority over younger ones. \n
      Reorganized the Shadow Manager so that its behavior accurately implements
      the shadow protocol. However, a few exceptions still exist (such as LPM
      Program Memory handling or CPSE RF handling).
   - *** Modelsim 5.3 behaves strangely again. \n
      It asserts hardware manager warnings, but when the local conditions are
      investigated, the situation is perfectly legal. It seems that at a moment
      when one signal has a 0-1 transition and another one has a 1-0 transition,
      there is a `small' (theoretically 0) amount of time during which both signals
      are considered 1, and that transient triggers the warning. That shouldn't
      happen; it seems to be a Modelsim bug. \n
      However, trying to reproduce that behavior was unsuccessful. It only appears
      sometimes; the rule by which it appears is well hidden. \n
      For now, it's best to ignore these warnings during simulation. However, it
      means that those assertions don't fulfill their purpose.
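The `SFU rule' stated above (older requests have priority over younger ones)
can be pictured as a simple arbitration. The Python sketch below is a loose
illustration, not the Stall and Flush Unit itself: it assumes that a higher
pipe stage number holds an older instruction, and the function names are
hypothetical.

```python
def serve_order(requests):
    """Order in which simultaneous requests are served, oldest first.

    requests maps a pipe stage number to True if that stage places a request.
    A higher stage number is assumed to hold an older instruction.
    """
    return sorted((stage for stage, rq in requests.items() if rq), reverse=True)


def grant(requests):
    """Stage whose request is granted in this clock, or None if nobody asks."""
    order = serve_order(requests)
    return order[0] if order else None


# Simultaneous requests from s5 and s6: s6 wins, s5 is delayed, not dropped.
print(grant({5: True, 6: True}), serve_order({5: True, 6: True}))
```

The younger request stays in the queue and is served once the older one
completes; it is only delayed.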
     
\par 25 July 2002
   - bug023
      - \b symptom: IJMP and EIJMP don't jump where they are supposed to, if the
         instruction before them modifies the Z pointer.
      - \b remedy: IJMP and EIJMP actually jump even before the BPU gets updated
         by the previous instruction. As they use the Register File mapped Z
         pointer for finding the target address, they need to be held back for a
         clock (the Z pointer is modified in stage s5). \n
         Just request a nop in pipe stage s4. Now IJMP and EIJMP take 4 clocks
         (RJMP and JMP still take 3).
      - \b status: corrected
   - bug024
      - \b symptom: loads don't work any more (!). They (sometimes) get garbage.
      - \b remedy: when correcting bug 021, the shadow protocol was applied to
         all devices that could use it. That was wrong. The Data Address Calculation
         Unit must not use the shadow protocol, because it gets RF/IOF/DM
         exclusivity by means of stalling, and it must be granted access to these
         resources, even during stalls. \n
         When trying to read from Unified Memory, loads got data from shadow
         registers, not directly from the RF/IOF/DM data out.
      - \b status: corrected DACU
   - bug025
      - \b symptom: JMP gets corrupted if the previous instruction is a load.
      - \b remedy: JMP is a 32 bit instruction. The second word (a 16 bit constant)
         can get flushed by a previous instruction's s5 stall. \n
         Flushes of s2 requested in s3 and s4 are more delicate than other flushes.
         They can interfere with stalls requested by older instructions. They must
         be stallable because older instructions might want that. If a flush of s2
         is requested in s3 or s4 and older instructions require a stall, don't
         blindly flush s2, but rather do nothing and wait for the stall to end.
         Only after that acknowledge the flush. \n
      - \b status: corrected
   - bug026
      - \b symptom: CPSE doesn't skip the following instruction, when it should.
      - \b remedy: the skip condition was picked as `not zero flag', instead of
         `zero flag'.
      - \b status: corrected
   - bug027
      - \b symptom: SBIC and SBIS don't do their job.
      - \b remedy: IOF read access was simply not requested.
      - \b status: corrected the Instruction Decoder by placing an IOF request
         in pipe stage s5, for SBIC and SBIS
   - bug028
      - \b symptom: RCALL doesn't work.
      - \b remedy:
         - the 12 bit relative offset wasn't initialized in the Instruction
            Decoder. Just do that (cut&paste the corresponding code line from RJMP,
            as the relative jump address is placed in the same bits in the
            instruction code).
         - the return address was correct for CALL but one greater than needed
            for RCALL. Actually, CALL and RCALL need \b different return addresses,
            as CALL has 32 bits and RCALL only 16. \n
            Modification: now, the current instruction's PC is \b conditionally
            incremented in pipe stage s4. A new set of wires and registers was
            introduced so that CALL can request to increment its return address.
            RCALL doesn't need to do that.
      - \b status: corrected the Instruction Decoder, so that CALL requests to
         increment its return address.
   - note: all instructions seem to work.
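The two points of bug028 above (the 12 bit relative offset and the different
return addresses of CALL and RCALL) can be written down as a short Python
sketch. Addresses are word addresses. The function names are hypothetical,
Program Counter wrap-around is ignored, and the offset field position
(the low 12 bits of the instruction word) follows the standard AVR encoding
of RJMP and RCALL.

```python
def return_address(pc, instruction):
    """Address of the instruction that follows a CALL or RCALL located at pc."""
    size_in_words = {"CALL": 2, "RCALL": 1}[instruction]
    return pc + size_in_words


def relative_offset(opcode):
    """Sign-extended 12 bit offset k taken from the low bits of RJMP or RCALL."""
    k = opcode & 0x0FFF
    return k - 0x1000 if k & 0x0800 else k


def rcall_target(pc, opcode):
    """Target of a relative call: PC + k + 1."""
    return pc + relative_offset(opcode) + 1


print(hex(return_address(0x100, "CALL")), hex(return_address(0x100, "RCALL")))
print(hex(rcall_target(0x10, 0xD005)))
```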
     
\par 24 July 2002
   - bug020
      - \b symptom: loads placed immediately after stores that modify their
         pointer get garbage.
      - \b remedy: loads and stores can modify their data pointer. However, the
         Bypass Unit must also be updated, because the pointer registers are
         placed in the Register File. The BPU wasn't updated.
      - \b status: corrected
      - \b note: the modularity of the design (separate hardware managers, small
         set of conventions regarding signal naming, grouping similar-function
         code) paid off. This bug required an intervention spread out over half
         a megabyte of code. The Data Address Calculation Unit and the Bypass Unit
         were modified, new wires and registers were defined, and some of them
         were renamed.
   - bug021
      - \b symptom: stores that modify their pointer make the following
         instruction unable to update the Bypass Unit. Moreover, the BPU is
         written with garbage.
      - \b remedy: stores and the instruction after them can require to
         write the BPU simultaneously. That's because these stores make intensive
         use of the BPU and eat all its write resources. They write 3 bytes: 2 of
         them in s5 (the modified pointer) and 1 in s6 (the data to be written
         into the Register File). The one written in s6 can be simultaneous with
         the following instruction's s5 write BPU request. \n
         To correct this bug, there are 2 options:
         -# add a stall in pipe stage s5 for all stores. That is, stores
            will take 3 clocks.
         -# increase the BPU width from 2 chains to 3 chains and modify the way
            stores make use of the Bypass Unit (write all that has to be written,
            3 bytes, in the same pipe stage, s5). This is more attractive
            because stores still need only 2 clocks. However, the Bypass Unit
            continues to grow (from the initial depth/width of 2/2 to the present 4/3).
         .
         Option 2 was chosen. \n
         The Unified Memory architecture favored this bug. Stores
         must be able to write the Register File and, consequently, write their
         data into the BPU along with the pointer they have modified.
      - \b status: corrected
   - bug022
      - \b symptom: LPM always returns 0.
      - \b remedy: multiple bug:
         - LPM stalled s2, then read the s2 status. Seeing it `busy', it gave up
            reading what it needed and maintained pavr_pm_addr_int at its present value.
            The Program Memory Manager needs to be instructed to forcibly grant LPM
            instructions access to s2, even if it is stalled. Also, the shadow protocol
            must be bypassed.
         - LPM didn't update the BPU.
         - pointer registers were used directly in a few hardware managers, not via
            the BPU. This enables subtle read before write hazards (they went
            unnoticed until now).
      - \b status: corrected
   - note: \n
         LD Rd, -X; LD Rd, X; LD Rd, X+; \n
         LD Rd, -Y; LD Rd, Y; LD Rd, Y+; LDD Rd, Y+q; \n
         LD Rd, -Z; LD Rd, Z; LD Rd, Z+; LDD Rd, Z+q; \n
         ST -X, Rr; ST X, Rr; ST X+, Rr; \n
         ST -Y, Rr; ST Y, Rr; ST Y+, Rr; STD Y+q, Rr; \n
         ST -Z, Rr; ST Z, Rr; ST Z+, Rr; STD Z+q, Rr; \n
         LPM; LPM Rd, Z; LPM Rd, Z+ \n
         seem to work.
     
\par 23 July 2002
   - bug016
      - \b symptom: read before write data hazards
      - \b remedy: the BLD instruction didn't update the BPU.
      - \b status: corrected
   - bug017
      - \b symptom: BLD doesn't modify the target register.
      - \b remedy: while processing the pavr_s5_iof_rq IOF request, the IOF Manager
         set the IOF bit address to zero instead of pavr_s5_iof_bitaddr. Correct that.
      - \b status: corrected
   - bug018
      - \b symptom: even though they work fine separately, POP, PUSH and MOVW one
         after another (in various combinations) don't.
      - \b remedy:
         This is a triple (!) bug:
         - MOVW requires a stall in s6 while POP requires a stall in s5. The
            two stalls are simultaneous. \n
            The Stall and Flush Unit doesn't properly handle multiple stalls. \n
            Modify the SFU so that the oldest stall doesn't kill the younger one(s),
            but only delays it (them).
         - The SP was incremented during a stall, and after the stall the DACU
            received a wrong pointer (the new SP). \n
            All hardware resources must be stallable. Presently they are not.
         - The instruction after MOVW, PUSH is skipped. The PM data out shadow
            register doesn't do its job. \n
            The shadow registers are updated every clock. That's not right. \n
            Update them only if they don't already hold meaningful data (check
            the corresponding `shadow_active' flag). Otherwise, during
            successive stalls they get corrupted.
         .
         This was a tough one.
      - \b status: corrected
   - bug019
      - \b symptom: the sequence \n
         LDI R17, 0xC3 \n
         ST  Z+, R17 \n
         results in storing garbage into memory.
      - \b remedy: the nop requests (placed by ST) increase the needed BPU depth
         by one. Thus, the BPU depth must be increased from 3 to 4.
      - \b status: corrected
   - note:
      - CBI, SBI, BST, BLD, MOVW, IN, OUT, PUSH, POP, LDS, STS seem to work.
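The shadow register fix from bug018 above (capture only when the
`shadow_active' flag is not already set) can be illustrated by a toy model.
This Python sketch is a strong simplification of the shadow protocol and makes
its own assumptions: one memory data out, one shadow register, and a `step'
call per clock. All names except `shadow_active' (modelled by the `active'
attribute) are hypothetical.

```python
class ShadowRegister:
    """Holds a memory output across a run of stalled clocks."""

    def __init__(self, update_every_clock=False):
        self.value = None
        self.active = False
        # True reproduces the faulty behavior (shadow rewritten on every clock).
        self.update_every_clock = update_every_clock

    def step(self, data_out, stall):
        """Advance one clock; return the data the pipeline sees in this clock."""
        seen = self.value if self.active else data_out
        if stall:
            if self.update_every_clock or not self.active:
                self.value = data_out
            self.active = True
        else:
            self.active = False
        return seen


def run(shadow, trace):
    return [shadow.step(data, stall) for data, stall in trace]


# Two successive stalled clocks; the memory output changes during the stall.
trace = [("A", True), ("B", True), ("C", False), ("D", False)]
print(run(ShadowRegister(), trace))
print(run(ShadowRegister(update_every_clock=True), trace))
```

In the corrected model the value captured at the start of the stall ("A")
survives until the stall ends; when the shadow is rewritten on every clock,
the pipeline ends up seeing "B" instead.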
     
\par 22 July 2002
   - bug011
      - \b symptom: DEC does in fact INC
      - \b remedy: ALU operand 2 is selected as -1 in pipe stage s5, and then the
         DEC-related code does out=op1-op2, which results in out=op1+1. \n
         Just make the ALU treat INC and DEC the same way (that is, out=op1+op2).
      - \b status: corrected
   - bug012
      - \b symptom: BPU doesn't do its job.
      - \b remedy: stupid and time-costly bug, generated by a (too) quick cut and
         paste in the BPU code.
      - \b status: corrected
      - \b note: Modelsim PE/Plus 5.3a_p1 has a cache problem. After correcting
         this bug, the same results came out after recompiling and restarting the
         simulation. It was enough to close Modelsim and open the project again
         for things to go fine. It's not the first time Modelsim behaves this way.
   - bug013
      - \b symptom: Z flag is computed wrongly for ALU opcodes that need 8 bit
         subtraction with carry.
      - \b remedy: Z=Z*oldZ
      - \b status: corrected
   - bug014
      - \b symptom: Z flag is computed wrongly for all ALU opcodes (!).
      - \b remedy: instead of and-ing the negated bits of the output, the Z output
         was computed by and-ing the output's bits.
      - \b status: corrected
   - bug015
      - \b symptom: read before write data hazards related to the IN instruction
      - \b remedy: IN doesn't write the Bypass Unit. Do that. Nasty one, requiring
         new wires and registers.
      - \b status: corrected
      - \b note: the shadow manager was completed. Quite a lot of code, hopefully
         with no new bugs.
   - notes:
      - MOV, INC, DEC, AND, ANDI, OR, ORI, EOR, COM, NEG, CP, CPC, CPI, SWAP, LSR,
         ROR, ASR, multiplications (timing-only), BCLR, BSET seem to work.
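The ALU fixes from bug011, bug013 and bug014 above are easy to state in code.
The Python sketch below is a behavioral illustration with hypothetical
function names, not the pAVR ALU: INC and DEC share one adder path
(out=op1+op2, with op2 equal to +1 or -1), the Z flag is the AND of the negated
output bits, and for subtraction with carry the new Z is additionally AND-ed
with the old Z.

```python
def alu_inc_dec(op1, op2):
    """Shared INC/DEC path: op2 is 0x01 for INC and 0xFF (that is, -1) for DEC."""
    return (op1 + op2) & 0xFF


def z_flag(result):
    """AND of the negated output bits: 1 only when all 8 bits are 0."""
    z = 1
    for i in range(8):
        z &= 1 - ((result >> i) & 1)
    return z


def z_flag_buggy(result):
    """bug014: AND of the output bits themselves (1 only for 0xFF)."""
    z = 1
    for i in range(8):
        z &= (result >> i) & 1
    return z


def z_flag_with_carry(result, old_z):
    """bug013: for subtraction with carry, Z = Z * oldZ."""
    return z_flag(result) & old_z


print(alu_inc_dec(5, 0x01), alu_inc_dec(5, 0xFF))
print(z_flag(0x00), z_flag_buggy(0x00), z_flag_with_carry(0x00, 0))
```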
     
\par 21 July 2002
   - bug008
      - \b symptom: read before write data hazards.
      - \b remedy: the Bypass depth was increased from 2 to 3. Design bug.\n
         *** To update the documentation!
      - \b status: corrected.
   - bug009
      - \b symptom: the 16 bit arithmetic instructions write only the lower byte of
         the result in the Register File if the next few instructions aren't
         nops.
      - \b remedy: 16 bit arithmetic instructions stalled s6. While s6 was stalled,
         the Bypass flushed a value that was needed later. A signal was needed
         that can stall the BPU. Now, the stall s6 requests also stall the BPU.\n
         Pretty tricky design bug.\n
         *** To update the documentation!\n
      - \b status: corrected.
   - bug010
      - \b symptom: stalls needed by 16 bit arithmetic instructions cause the
         instruction placed 4 clocks later to be replaced by a nop
      - \b remedy: shadow registers were assigned, but never used. PM data out
         (and consequently, the instruction register) read a nop instead of the
         correct data that was read during the stall. Now the pipeline uses the
         shadow registers related to PM data out.\n
         *** The other shadow registers (related to DM, RF, IOF and DACU data out)
         are still unused!\n
         *** To update the documentation with shadow-related issues!
      - \b status: corrected.\n
   - note:
      - ADD, ADC, ADIW, SUB, SUBI, SBC, SBIW seem to work.
     
    2486
    \par 15 July 2002
    2487
       - bug004
    2488
          - \b remedy: reporting this bug was a bug. The Register File works fine. This
    2489
             bug report was generated by modifying X register (RF addr 27:26) and
    2490
             expecting that RF bulk data (RF addr 0...25) to be modified, which won't
    2491
             happen.
    2492
          - \b status: ok.
    2493
       - bug005
    2494
          - \b remedy: DACU data out was duplicated, with 2 different names: pavr_dacu_do
    2495
             and pavr_s6_dacudo. pavr_dacu_do was only writen, and pavr_s6_dacudo was
    2496
             only read. When RET tried to read the return address from DACU, it got
    2497
             garbage, because it read DACU data out from pavr_s6_dacudo, that was not
    2498
             assigned any value.\n
    2499
             Cut out pavr_s6_dacudo. DACU data out is now unique, for both read an
    2500
             write (that is, pavr_dacu_do). Also, the documentation was updated.
    2501
          - \b status: corrected.
    2502
       - bug006
    2503
          - \b symptom: CALL doesn't work.
    2504
          - \b remedy: in the SP Manager, pavr_s5_calldec_spwr_rq was written twice, and
    2505
             pavr_s52_calldec_spwr_rq wasn't written at all, because of a less careful
    2506
             cut-and-paste. As a result, during CALL, PC's lsByte was not stored.
    2507
          - \b status: corrected
    2508
       - bug007
    2509
          - \b symptom: ALU flags are not defined.
    2510
          - \b remedy: the ALU flags input was not connected to SREG (zero-level assignment)
    2511
          - \b status: corrected
    2512
       - notes:
    2513
          - RET, CALL seem to work.
    2514
          - pAVR runs its first complete program (12 instructions).
    2515
     
    2516
    \par 13 July 2002
    2517
       - bug003
    2518
          - \b symptom: RET is a mess
    2519
          - \b remedy: during nop requests, stall must have higher priority than flush in
    2520
             s2. The Stall Manager (the nop request-related lines) must take care of
    2521
             that.
    2522
          - \b status: corrected
    2523
       - bug004
    2524
          - \b symptom: RF seems to be unable to write registers other than pointer
    2525
             registers.
    2526
          - \b status: NOT corrected!
    2527
       - bug005
    2528
          - \b symptom: RET is still a mess.
    2529
          - \b status: NOT corrected!
    2530
       - bugs pool: 004, 005
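
The priority rule from bug003 (stall wins over flush in s2) can be illustrated with a small sketch. The names below (`stall_rq`, `flush_rq`, `pm_do`, `s2_instr`) are hypothetical, and the sketch assumes that a stall holds the stage register while a flush replaces its contents with a nop; pAVR's Stall Manager is more elaborate than this.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch of "stall has higher priority than flush" for one stage.
entity s2_priority_sketch is
   port(
      clk      : in  std_logic;
      stall_rq : in  std_logic;                       -- hold the stage
      flush_rq : in  std_logic;                       -- replace the stage contents with a nop
      pm_do    : in  std_logic_vector(15 downto 0);   -- next instruction word
      s2_instr : out std_logic_vector(15 downto 0)
   );
end entity s2_priority_sketch;

architecture sketch of s2_priority_sketch is
   -- The AVR NOP opcode is all zeros.
   constant C_NOP   : std_logic_vector(15 downto 0) := x"0000";
   signal instr_reg : std_logic_vector(15 downto 0) := C_NOP;
begin
   process(clk)
   begin
      if rising_edge(clk) then
         if stall_rq = '1' then
            -- Checked first: a stalled stage keeps its contents,
            -- even if a flush is requested in the same clock.
            instr_reg <= instr_reg;
         elsif flush_rq = '1' then
            instr_reg <= C_NOP;
         else
            instr_reg <= pm_do;
         end if;
      end if;
   end process;

   s2_instr <= instr_reg;
end architecture sketch;
```

Swapping the first two branches gives the opposite priority, which is the kind of ordering mistake the remedy describes.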
    2531
     
    2532
    \par 27 June 2002
    2533
       - bug001
    2534
          - \b symptom: read before write data hazards. Hmm, this kind of bug shouldn't
    2535
             have occurred.
    2536
          - \b remedy: LDI didn't update BPU0. Just do that.
    2537
          - \b status: corrected.
    2538
       - bug002
    2539
          - \b symptom: while reading the code, something was smelling bad.
    2540
          - \b remedy: the code that computes the branch/skip conditions was not written
    2541
             at all.
    2542
          - \b status: corrected.
    2543
       - notes:
    2544
          - The controller has successfully executed its first instruction (a RJMP)!
    2545
             However, it was the only one...\n
    2546
          - The kernel seems to be easy to debug thanks to its regular structure.
    2547
          - RJMP, LDI, NOP seem to work.
    2548
     
    2549
    \n
    2550
    */
    2551
     
    2552
     
    2553
     
    2554
    /*!
    2555
    \defgroup pavr_fpga FPGA prototyping
    2556
    \ingroup pavr_test
    2557
    No FPGAs were burned so far. \n
    2558
    \n
    2559
    \n
    2560
    \n
    2561
    */
    2562
     
    2563
     
    2564
    /*!
    2565
    \defgroup pavr_src Sources
    2566
    \par Sources
    2567
    The source package contains the following files, in the compiling order:
    2568
    -  std_util.vhd 
    2569
       - Type conversion routines often used throughout the other source files in this
    2570
          project
    2571
       - Basic arithmetic functions
    2572
       - Sign and zero-extend functions
    2573
       - Vector comparison function
    2574
    -  pavr_util.vhd 
    2575
       - Bypass Unit access function
    2576
       - Interrupt arbiter function
    2577
    -  pavr_constants.vhd 
    2578
       - Constants needed by pAVR
    2579
       - When customizing pAVR, look here and modify (seek-and-destroy).
    2580
    -  pavr_data_mem.vhd 
    2581
    -  pavr_alu.vhd 
    2582
    -  pavr_register_file.vhd 
    2583
    -  pavr_io_file.vhd 
    2584
    -  pavr_control.vhd 
    2585
       - pAVR pipeline (pAVR kernel)
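
As an illustration of the kind of helpers listed for std_util.vhd, here is a minimal sketch of zero-extend and sign-extend functions. The package name and function signatures are assumptions made for this example; they are not taken from the actual std_util.vhd.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package; names and signatures are illustrative only.
package ext_sketch is
   function zero_extend(v : std_logic_vector; width : natural) return std_logic_vector;
   function sign_extend(v : std_logic_vector; width : natural) return std_logic_vector;
end package ext_sketch;

package body ext_sketch is

   -- Pad the vector with '0' on the left, up to `width' bits.
   function zero_extend(v : std_logic_vector; width : natural) return std_logic_vector is
      variable r : std_logic_vector(width - 1 downto 0) := (others => '0');
   begin
      assert width >= v'length report "zero_extend: width too small" severity failure;
      r(v'length - 1 downto 0) := v;
      return r;
   end function zero_extend;

   -- Pad the vector with copies of its most significant bit, up to `width' bits.
   function sign_extend(v : std_logic_vector; width : natural) return std_logic_vector is
      -- Normalize the index range so that vn'high is always the leftmost bit.
      alias vn   : std_logic_vector(v'length - 1 downto 0) is v;
      variable r : std_logic_vector(width - 1 downto 0);
   begin
      assert width >= v'length report "sign_extend: width too small" severity failure;
      r := (others => vn(vn'high));
      r(vn'length - 1 downto 0) := vn;
      return r;
   end function sign_extend;

end package body ext_sketch;
```

A typical use would be widening the signed 12 bit offset of RJMP to the width of the program counter, for example `sign_extend(offset12, 16)`, where `offset12` is a hypothetical signal name.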
    2586
     
    2587
    \par Test sources
    2588
    The test sources in this package implement all the tests presented
    2589
    \ref pavr_test "above". \n
    2590
    The test source package contains the following files:
    2591
    -  test_pavr_alu.vhd \n
       Tests the ALU.
    -  test_pavr_control_interrupts.vhd \n
       This test is yet to be done.
    -  test_std_util.vhd \n
       Tests the utilities defined in `std_util.vhd'.
    -  test_pavr_data_mem.vhd \n
       Tests the Data Memory.
    -  test_pavr_register_file.vhd \n
       Tests the Register File.
    -  test_pavr_io_file.vhd \n
       Tests the IO File.
    -  test_pavr_constants.vhd \n
       Defines constants needed by the main test entity.
    -  test_pavr_util.vhd \n
       Defines utilities needed by the main test entity.
    -  test_pavr_pm.vhd \n
       Defines the Program Memory that is needed by the main test entity.
    -  test_pavr.vhd \n
       Defines the main test entity. \n
       Tests pAVR as a whole.
    2613
     
    2614
    \anchor pavr_src_conv
    2615
    \par Conventions used when writing the VHDL sources
    2616
    The terminology used reflects the data flow. \n
    2617
    For example, `pavr_s4_s6_rfwr_addr1' is assigned in s3 (by the instruction decoder),
    2618
    shifts into `pavr_s5_s6_rfwr_addr1', which finally shifts into
    2619
    `pavr_s6_rfwr_addr1' (terminal register). Only this one carries information
    2620
    actually used by hardware resource managers. This particular one signals
    2621
    an access request to the Register File write port manager. \n
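
The naming convention above can be pictured as a chain of registers clocked together. The sketch below is an assumption-laden illustration: the entity, the `next_s4_addr` input, the reset and the 5 bit width are hypothetical; only the three `pavr_..._rfwr_addr1' names come from the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch of a value travelling from s4 to its terminal register in s6.
entity rfwr_addr_chain_sketch is
   port(
      clk           : in  std_logic;
      res           : in  std_logic;                      -- asynchronous reset, active high (assumption)
      next_s4_addr  : in  std_logic_vector(4 downto 0);   -- computed in s3 by the instruction decoder
      s6_rfwr_addr1 : out std_logic_vector(4 downto 0)    -- what the RF write port manager sees
   );
end entity rfwr_addr_chain_sketch;

architecture sketch of rfwr_addr_chain_sketch is
   -- The name reads as: lives in stage sX, is consumed in stage s6.
   signal pavr_s4_s6_rfwr_addr1 : std_logic_vector(4 downto 0);
   signal pavr_s5_s6_rfwr_addr1 : std_logic_vector(4 downto 0);
   signal pavr_s6_rfwr_addr1    : std_logic_vector(4 downto 0);   -- terminal register
begin
   process(clk, res)
   begin
      if res = '1' then
         pavr_s4_s6_rfwr_addr1 <= (others => '0');
         pavr_s5_s6_rfwr_addr1 <= (others => '0');
         pavr_s6_rfwr_addr1    <= (others => '0');
      elsif rising_edge(clk) then
         -- One shift per clock, following the instruction down the pipeline.
         pavr_s4_s6_rfwr_addr1 <= next_s4_addr;
         pavr_s5_s6_rfwr_addr1 <= pavr_s4_s6_rfwr_addr1;
         pavr_s6_rfwr_addr1    <= pavr_s5_s6_rfwr_addr1;
      end if;
   end process;

   -- Only the terminal register is handed to the resource manager.
   s6_rfwr_addr1 <= pavr_s6_rfwr_addr1;
end architecture sketch;
```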
    2622
    \n
    2623
    Process splitting strategy:
    -  requests to hardware resources are managed by dedicated processes, one
       VHDL process per hardware resource.
    -  a main asynchronous process (instruction decoder) computes values that
       initialize the pipeline in s3.
    -  a main synchronous process assigns new values to pipeline registers.
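
A skeleton of this three-way split might look as follows. It is a sketch under assumptions, not the structure of pavr_control.vhd: the entity, the signal names and the single "register file write request" are hypothetical, and the decoder recognizes only LDI (opcode bits 15...12 = "1110") for brevity.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical skeleton of the process splitting strategy.
entity split_sketch is
   port(
      clk     : in  std_logic;
      res     : in  std_logic;                       -- asynchronous reset, active high (assumption)
      instr   : in  std_logic_vector(15 downto 0);   -- instruction being decoded in s3
      rfwr_en : out std_logic                        -- Register File write enable
   );
end entity split_sketch;

architecture sketch of split_sketch is
   signal next_s4_rfwr_rq : std_logic;
   signal s4_rfwr_rq      : std_logic;
   signal s5_rfwr_rq      : std_logic;
   signal s6_rfwr_rq      : std_logic;
begin

   -- 1. Main asynchronous (combinational) process: the instruction decoder.
   --    It only computes the values that will enter the pipeline.
   idecoder: process(instr)
   begin
      next_s4_rfwr_rq <= '0';
      if instr(15 downto 12) = "1110" then   -- LDI writes the Register File
         next_s4_rfwr_rq <= '1';
      end if;
   end process idecoder;

   -- 2. Main synchronous process: the only place where pipeline registers change.
   sync: process(clk, res)
   begin
      if res = '1' then
         s4_rfwr_rq <= '0';
         s5_rfwr_rq <= '0';
         s6_rfwr_rq <= '0';
      elsif rising_edge(clk) then
         s4_rfwr_rq <= next_s4_rfwr_rq;
         s5_rfwr_rq <= s4_rfwr_rq;
         s6_rfwr_rq <= s5_rfwr_rq;
      end if;
   end process sync;

   -- 3. One manager process per hardware resource; here, the RF write port.
   --    It looks only at terminal (s6) requests.
   rfwr_manager: process(s6_rfwr_rq)
   begin
      rfwr_en <= s6_rfwr_rq;
   end process rfwr_manager;

end architecture sketch;
```

Keeping all register updates in one clocked process and all resource arbitration in per-resource processes is what makes the regular, easy to debug structure mentioned in the test notes plausible.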
    2631
    \todo
    2632
    Replace `next_...' signals family with a (pretty wide) state decoder.
    2633
     
    2634
    \par Licensing
    2635
    Please read the \ref pavr_about "licensing terms".
    2636
    \n
    2637
    */
    2638
     
    2639
     
    2640
     
    2641
     
    2642
    /*!
    2643
    \defgroup pavr_ref References
    2644
    \par References
    2645
    Most of the documentation needed for this project was found on Atmel's website,
    2646
    http://www.atmel.com. While working on this project (2002 Q1, Q2), it was
    2647
    available in PDF format, free for downloading. \n
    2648
    \n
    2649
    The specific documents that were used are:
    -  "AVR Instruction Set", Atmel Corporation
    -  Datasheets for the controllers:
       - ATtiny28 series
       - AT90S2313
       - AT90S8535
       - ATmega8 series
       - ATmega103 series
    2661
    \n
    2662
    While designing pAVR's pipeline, I found many interesting ideas in the book
    2663
       "Computer architecture - a quantitative approach", by J. Hennessy and D.
    2664
       Patterson. If you are a processor designer, then this book is for you. \n
    2665
     
    2666
    \par Errata
    2667
    A few \ref pavr_test_bugs "bugs" have been found in Atmel's documents.
    2668
    \n
    2669
    \n
    2670
    \n
    2671
    */
    2672
     
    2673
     
    2674
     
    2675
    /*!
    2676
    \defgroup pavr_thoughts Some final thoughts
    2677
    \par Instead of conclusion...
    2678
    It's relatively easy to design a fast 8 bit controller. All that has to be done
    2679
       is to follow the path well known from the big brothers, the 32 bit
    2680
       controllers. The short story is: analyze what "typical programs" mean,
    2681
       imagine a simple and fast instruction set, and implement it into a deep
    2682
       pipeline (by the way, on these topics, I recommend "Computer
    2683
       architecture - a quantitative approach", by J. Hennessy and D. Patterson). \n
    2684
    \n
    2685
    Then, why are the 8 bit controllers currently on the market so slow? The
    2686
       instruction set, CPI, max frequency for current 8 bit ucs are bad. In fact,
    2687
       they are so bad, that we must consider other factors than pure uc design to
    2688
       explain that. My guess is that market issues destructively interfere here. How
    2689
       is that, this could be another project's goal... \n
    2690
    \n
    2691
    \n
    2692
    */
    2693
     
    2694
     
    2695
     
    2696
    /*!
    2697
    \defgroup pavr_about About ...
    2698
    \par Project
    2699
    \b pAVR (pipelined AVR) is an 8 bit RISC controller, compatible with Atmel's
    2700
    AVR core, but about 3x faster in terms of both clock frequency and MIPS. \n
    2701
    The increase in speed comes from a relatively deep pipeline.
    2702
    \par Version
    2703
    0.32
    2704
    \par Date
    2705
    2002 August 07
    2706
    \par Author
    2707
    Doru Cuturela, doruu@yahoo.com \n
    2708
    \par Licensing
    2709
    This program is free software; you can redistribute it and/or modify
    2710
    it under the terms of the GNU General Public License as published by
    2711
    the Free Software Foundation; either version 2 of the License, or
    2712
    (at your option) any later version. \n
    2713
    This program is distributed in the hope that it will be useful,
    2714
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    2715
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    2716
    GNU General Public License for more details. \n
    2717
    You should have received a copy of the GNU General Public License
    2718
    along with this program; if not, write to the Free Software
    2719
    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
    2720
     
    2721
    \par Note
    2722
    The design effort for this project was about 6 months (2002, Feb-Aug), one
    2723
    man working. \n
    2724
    \n
    2725
    */
