Newsletter June 2012
About Verification of Microprocessors, part 1
Verifying a Microprocessor
SoCs of today are very large systems. A modern multicore SoC typically consists of processor clusters (CPUs, DSPs, ASIPs, dedicated hardware accelerators), multiple-level data memory and program memory systems (L1, L2, L3 caches and memories), on-chip interconnections (buses, crossbars, bridges), DMA controllers, IOs, and various system peripherals.
The design of such a SoC is challenging. Typically it is not the actual implementation, but the verification, that requires most of the project time. Depending on the complexity of the SoC and what building blocks are available to start with, verification tends to take 50 % to 90 % of the project time, so verification is on the critical path in the project time plan. As a consequence, several different methods for verification have been developed over time, such as simulation, hardware emulation, formal verification, and assertion-based verification.
The microprocessor core is a central SoC-component. Processor cores come in different flavours. From a system development point of view there are three broad formats for processor cores to choose from.
The first format is the already built processor core. This core is specified, designed, and verified by some microprocessor vendor. The core is typically available as a hard macro and delivered together with the tools necessary for SoC integration and software development. For the customer this approach means the least degree of freedom, but possibly also the least amount of work. The customer cannot choose any of the processor features. On the other hand, the customer does not have to develop any parts of the core or the software tools.
The second format is the so-called re-configurable processor core. This core is built around an already specified base core, which is configurable (to some extent) and extensible. The extensibility means that the customer can augment the base core with application-specific instructions supported by application-specific hardware (additional data pipelines, ports, registers etc.). The design environment typically includes automatic generation of the tools for software development. For the customer this approach means a fairly high degree of freedom, but also some amount of work. The customer can choose the features of the processor extensions. On the other hand, the customer has to develop the extended parts of the core.
The third format is the customer-proprietary processor core. This core is specified, designed, and verified by the customer. For the customer this approach means the highest degree of freedom, but also the largest amount of work. The customer can choose all of the processor features. On the other hand, the customer has to develop all parts of the core and the software tools.
Since the re-configurable core and the customer-proprietary core both include development work done by the customer, the customer also needs to verify the core.
A microprocessor may be verified by executing (lots of) test programs in a simulation test bench. For this verification the concept of constrained random verification is of particular interest. This verification methodology suggests that two models of a microprocessor (the RTL-model being developed and a reference C-model) execute the same large set of random test programs, and then the actual resulting behaviour (of the RTL-model) is compared to the expected behaviour (of the C-model). The purpose is to verify each individual assembly instruction of the instruction set, with the help of automatically created random test programs.
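The comparison flow can be sketched in a few lines of Python. The two model functions below are hypothetical single-instruction stand-ins (not real simulator interfaces), just to show the lockstep compare-against-reference idea:

```python
import random

# Toy stand-ins for the two models; names and behaviour are hypothetical.
def ref_add(a, b):
    """Reference C-model behaviour: 8-bit addition with wrap-around."""
    return (a + b) & 0xFF

def dut_add(a, b):
    """RTL-model behaviour, as observed in simulation (here identical)."""
    return (a + b) & 0xFF

random.seed(42)                       # reproducible stimuli
mismatches = []
for _ in range(10_000):               # the method lives on large test volumes
    a, b = random.randrange(256), random.randrange(256)
    if dut_add(a, b) != ref_add(a, b):
        mismatches.append((a, b))     # each mismatch is a bug candidate

print(f"{len(mismatches)} mismatches in 10000 runs")  # prints "0 mismatches in 10000 runs"
```

In a real flow the reference model runs the compiled test program natively while the RTL model runs it in an HDL simulator; only the end-of-run comparison of architectural state differs in scale, not in principle.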
Directed verification is done by directed test cases.
A directed test case is an assembly program with the purpose of testing one particular instruction, or a small group of related instructions. A self-checking test program consists of four sections: preparation of stimuli input data, preparation of the expected result, test of the instruction (generating the actual result), and comparison of the actual and expected results.
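As an illustration of the four sections (in Python rather than assembly, with a hypothetical 8-bit add standing in for the instruction under test):

```python
# Stand-in for the instruction under test: 8-bit add with wrap-around.
def add8(a, b):
    return (a + b) & 0xFF

# 1. Preparation of stimuli input data
r1, r2 = 0xF0, 0x20

# 2. Preparation of expected result (0xF0 + 0x20 = 0x110, wraps to 0x10)
expected = 0x10

# 3. Test of the instruction, generating the actual result
actual = add8(r1, r2)

# 4. Comparison of actual and expected result (the self-checking part)
assert actual == expected, f"add8 failed: got {actual:#04x}"
print("PASS")
```

In an assembly test the same four sections appear as register setup, a hand-computed expected value, the instruction itself, and a compare-and-branch to a pass/fail label.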
The advantage of directed testing is that it is a fairly quick and easy method for the very first verification of an instruction. It gives early simulation feedback, so that the most embarrassing bugs can be fixed. This type of verification is sometimes called ‘happy testing’ or ‘smoke testing’.
The disadvantage of directed testing is that it is a very slow and tedious method for exhaustive verification. Even for a simple instruction such as add (addition), the number of combinations of input arguments (registers and immediates) can be large. The result can hit various conditions (overflow, negative/positive, zero, carry). Also, the execution of the instruction may depend on additional control information stored in configuration registers (rounding, shifting, saturation etc.). Directed testing means (very) small test volumes.
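A quick back-of-the-envelope count shows why. Take a hypothetical 3-operand add on a machine with 32 registers of 32 bits each (numbers chosen purely for illustration):

```python
registers = 32            # hypothetical register-file size
value_space = 2 ** 32     # values one 32-bit source register can hold

operand_selections = registers ** 3    # choices of rd, ra, rb
data_combinations = value_space ** 2   # value pairs for the two sources

print(operand_selections)   # 32768 ways to pick the registers alone
print(data_combinations)    # 18446744073709551616 (~1.8e19) input value pairs
```

Even before counting flags and configuration-register settings, covering a meaningful fraction of these combinations with hand-written tests is hopeless.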
Random verification is done by random test cases.
A random test case is an assembly program with the purpose of testing any instruction, or any sequence of instructions. The instructions are (within certain constraints) randomly selected by a random program generator. The generator obviously does not produce ‘meaningful’ programs, but that is not the point.
The first advantage of random testing is that the random generator can in a very short time create a large number of test programs, each program having an arbitrary (however reasonable) size. This means enormous test volumes.
The second advantage of random testing is that the random generator will, over time, produce odd, strange, and unusual instruction sequences that few human programmers would ever think of. Very often these are the tests that hit corner cases and extreme cases, and they can reveal nasty bugs that otherwise would have been found very late, or not at all.
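A minimal constrained random generator can be sketched in a few lines of Python. The instruction set, the operand format, and the single constraint shown are all hypothetical:

```python
import random

OPCODES = ["add", "sub", "and", "or", "xor", "load", "store", "jmp"]

def random_instruction(rng, allow_branch=True):
    # Constraint hook: optionally exclude branches from the candidate set.
    ops = OPCODES if allow_branch else [op for op in OPCODES if op != "jmp"]
    op = rng.choice(ops)
    if op == "jmp":
        return f"jmp label_{rng.randrange(8)}"
    rd, ra, rb = rng.randrange(16), rng.randrange(16), rng.randrange(16)
    return f"{op} r{rd}, r{ra}, r{rb}"

def random_program(seed, length=20):
    rng = random.Random(seed)   # the seed makes every failing test reproducible
    # Constraint: never end on a branch, so the program always terminates.
    return [random_instruction(rng, allow_branch=(i < length - 1))
            for i in range(length)]

for line in random_program(seed=1, length=5):
    print(line)
```

Seeding is the important detail: when a random program exposes a bug, the seed is all that needs to be logged to regenerate the exact same program for debugging and regression.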
A disadvantage of random testing is that without any kind of measurement we cannot be sure of what has been tested and what has not (yet) been tested. Therefore a random test environment also needs to measure test coverage.
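Coverage collection can be sketched the same way. The toy model below records which operand-sign/carry-out combinations of an 8-bit add have been exercised; the choice of bins is arbitrary and purely illustrative:

```python
import itertools
import random

coverage = set()   # (a_negative, b_negative, carry_out) bins hit so far

def sample(a, b):
    """Record the functional-coverage bin exercised by one 8-bit add."""
    carry = (a + b) > 0xFF
    coverage.add((a >= 0x80, b >= 0x80, carry))

rng = random.Random(0)
for _ in range(1000):
    sample(rng.randrange(256), rng.randrange(256))

all_bins = set(itertools.product([False, True], repeat=3))
missing = all_bins - coverage   # the holes tell us what still needs stimulus
print(f"hit {len(coverage)} of {len(all_bins)} bins, missing: {sorted(missing)}")
```

Note that one bin, (False, False, True), is unreachable by construction: two operands below 0x80 can never produce a carry. Spotting such dead bins is itself part of reviewing a coverage model.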
End of part 1...
Krister Karlsson, ORSoC
Energy Micro offers free samples
The Norwegian microcontroller manufacturer Energy Micro is taking the unusual step of giving away free samples of their low-power Arm microcontrollers in the Gecko family. The offer is valid worldwide; in return you have to provide details of your project and allow Energy Micro to contact you.
The circuits that can be ordered are the Tiny Gecko, Gecko, Leopard Gecko and Giant Gecko. You get two samples, and you can only order twice in a 30-day period. All models covered by the offer are marked with a small "SAMPLE" icon on the company's website. Clicking the icon places two copies of the circuit in the shopping cart, after which you can proceed to the checkout and complete the information about yourself.
Customers outside the U.S. will be contacted by the distributor Mouser and may have to answer some additional questions to ensure that all export requirements are met.
Normally, you have the samples within three working days and it costs you nothing. Energy Micro even pays for the shipping.
The Gecko family is based on the Arm Cortex-M architecture and consists of more than 240 variants. The circuits consume as little as 150 µA/MHz in active mode while consumption drops to as little as 1 µA in standby.
Published by Elektroniktidningen at link
Update from OC-Team
This topic gives you an update on what has been "cooking" in the OpenCores community during the last month.
This month's activities:
- Running very smoothly
Our message to the community:
- Help us improve the community, please provide feedback
- And have a nice summer vacation :-)
Marcus Erlandsson, ORSoC
Here you will see interesting new projects that have reached the first stage of development.
CIC Decimation Filter
This is a structural model for cascaded integrator comb (CIC) decimation filters. The filter consists of integrator, downsampler and comb stages. Each block is developed in a behavioral manner, while the top level is developed in a structural, hierarchical manner. A test bench is included for each single block as well as for the top-level entity.
Development status: Stable
Jun 8, 2012: Uploading
AltOr32 - Alternative Lightweight OpenRisc CPU
Alternative OpenRisc32 is a cut-down, lightweight implementation of the OpenRisc 1000 (ORBIS32) instruction set.
The project aims to provide a simple, easy-to-follow implementation of an OpenRisc CPU that will fit in lower-end FPGAs.
- Instruction set simulator created & functional
- Initial non-pipelined Verilog version added.
- Basic implementation with UART & Timer fits in XC3S250E (around 59% of slices used).
- A simple simulator for OpenRisc instructions, where only the essentials have been implemented.
- Compiles under Linux (make) or Windows (VS2003+).
- Able to execute OpenRisc 1000 (ORBIS32) code compiled with the following options:
-msoft-div -msoft-float -msoft-mul -mno-ror -mno-cmov -mno-sext
The project contains a Verilator cycle-accurate model of the CPU which can execute the same code as the simulator. Waveforms can be output and viewed in GTKWave.
Development status: Planning
Jun 16, 2012: Added GadgetFactory Papilio One (XC3S250E) example project
Jun 15, 2012: Project description updated
Jun 10, 2012: Added simple simulator for ALTOR32
Jun 10, 2012: Initial description.
Natalius 8 bit RISC
Natalius is a compact, capable and fully embedded 8-bit RISC processor core described 100% in Verilog. It occupies about 268 slices, 124 FFs and 503 4-input LUTs in a Xilinx Spartan3E1600 (around 1.67% of the slices). Natalius comes with an assembler that runs in any Python console.
The instruction memory is implemented in two Xilinx BlockRAM memories. It stores 2048 instructions, each 16 bits wide (2048x16). Each instruction takes three clock cycles to execute.
1. 8 Bit ALU
2. 8x8 Register File
3. 2048x16 Instruction Memory
4. 32x8 Ram Memory
5. 16x11 Stack Memory
6. Three clock cycles per instruction
7. Carry and Zero flags
8. No operation Instruction (nop)
9. 8 bit Address Port (up to 256 peripherals)
10. LDI, LDM, STM (Memory Access Instructions)
11. CMP, ADD, ADI, SUB (Arithmetic Instructions)
12. AND, OOR, XOR, NOP, SL0, SL1, SR0, SR1, RRL, RRR (Logical Instructions)
13. JMP, JPZ, JNZ, JPC, JNC, CSR, RET, CSZ, CNZ, CSC, CNC (Flow Control Instructions)
Development status: Beta
Jun 4, 2012: features
Jun 3, 2012: ver 1
The core supplies post-processing for a video signal. It reduces the color width while dithering the image to keep the impression of more colors than really exist. This reduces banding effects and enhances the quality for the viewer.
The method used is "Sierra Lite".
The core is configurable (at compile/synthesis time) in:
- resolution
- input color width
- output color width
It uses very few resources.
Common Full HD dithering (1920*1080 @ 60 Hz, 6 bit from an 8 bit source), as used with many LCD displays, is possible on a Cyclone II:
- 120 LE
- 8kbit Memory (2 M4K Blocks)
- timing met (~125 MHz required, ~140 MHz possible)
Tested in simulation:
bmp read -> processing -> written back to bmp
Tested on hardware using Altera/Terasic DE1:
- Cyclone 2
- 640 * 480 @ 60 Hz
- Reduction from 8 bit per color to 3 bit per color
Development status: Stable
Jun 21, 2012: fixed spelling mistake
Jun 11, 2012: instructions and code added
Jun 11, 2012: initial creation
Johan Rilegård, ORSoC