Newsletter November 2010
OpenCores - Now More Than 100 000 registered users
This month we celebrate OpenCores passing an incredible 100 000 registered users!
OpenCores continues to grow with impressive speed and has now passed 100 000 registered users. Over the past three years the number of users has grown from 20 000 to today's 100 000 registered users, and the growth continues with approximately 3000 new users each month.
OpenCores is the world's largest site/community for the development of hardware IP blocks as open source. OpenCores is owned and operated by ORSoC - a Swedish design house specializing in FPGA / ASIC development with a focus on SoC designs based on technology from OpenCores.
Use of blocks from OpenCores in SoC designs has become very popular in recent years. ORSoC has developed several systems in FPGAs and ASICs, which are mostly based on IP blocks from OpenCores. The latest product is intended for space and will be launched in satellites in 2011.
The main driver for product developers to use open source IP in their FPGA/ASIC designs is to ensure total control over their own designs. This means unlimited freedom to change target technology, and it makes it possible to keep clear boundaries between hardware and software, so that the software is not affected by hardware changes. Another important factor in the success of OpenCores is that excellent tool chains and debugging and verification capabilities are available for the technology.
We see the sharp increase in the number of users as clear proof of the popularity and usability of the technology. In 2011, we intend to bring additional services and improvements to OpenCores, which will lead to even higher growth.
Johan Rilegård, ORSoC
Open source at Electronica
The team from ORSoC and OpenCores visited Electronica in Munich.
As usual it was a great trade fair, crowded with people from the electronics industry who are there to find out more about new products, the latest technologies, updated tools, etc. They all have one thing in common… they generate loads of new business.
Electronica is a great opportunity for ORSoC to meet with companies interested in open source HW technology, and the OpenRISC 1200 platform.
We knew for sure that there was great interest in this technology, but we never imagined the interest was this BIG.
During our three days at Electronica we had about 40 really good meetings with serious discussions about future projects. A large share of these will start within the next 6 months, and it looks like quite a few of them will use the OpenRISC platform as a foundation for their product/platform.
Apart from that, we also had about the same number of meetings/discussions with people and companies that showed great interest in the OpenCores technology and will likely use it in their future electronic designs, even if these companies have not come as far in their plans as the first 40.
It is of course extremely fun to see that there is such great interest in these unique and well designed IPs. Our work to promote this technology will continue at full speed, and we will increase our design team to be able to handle all requests for design services related to the OpenRISC 1200 platform.
The near future will be very interesting… and we are already looking forward to the next Electronica.
Johan Rilegård, ORSoC
TCE project: Co-design of application-specific processors with compiler support
TTA-based Codesign Environment (TCE) is an application-specific instruction-set processor (ASIP) design toolset that has been developed at Tampere University of Technology in several research projects since 2003.
The use case for application-specific processors
Especially in embedded devices, so-called general-purpose "off-the-shelf" processors are often not optimal for the application at hand. The readily available processors might be too large in chip area, might consume too much power, might be too expensive (for a mass product) or might not run the set of programs fast enough. In order to tackle the performance problem, a common design choice is to run parts of the application in software while speeding up the performance-critical functions with a custom hand-tailored hardware accelerator or a co-processor implemented as an application-specific integrated circuit (ASIC).
The design, implementation and verification of hardware accelerators costs money and lengthens the time-to-market of the designed device. In addition, a fixed-function hardware accelerator designed with a hardware description language (HDL) such as VHDL or Verilog has the problem of being "carved in stone", thus not providing the programmability that enables late bug fixes and in-the-field updates to the supported set of functions. Field-Programmable Gate Arrays (FPGAs) allow reconfiguring the implemented hardware logic in the field. However, as an FPGA is only a hardware design implementation technique, the "non-recurring engineering cost" of designing the hardware logic in an HDL is still there.
Application-specific processors sit in the "design space" between off-the-shelf processors, such as ARM products or Texas Instruments DSPs, where functionality is described fully in software by the designer, and custom fixed-function hardware accelerators, where functionality is described fully in a hardware description language by the designer. In the case of ASIPs, the engineer is able to design both software and hardware at the same time (co-design) and is free to choose the level of application specialization applied to the processor.
What can be customized in an ASIP depends on the processor template used. A commonly customized part of the processor is the instruction set. A customizable instruction set allows the designer to define new application-specific instructions for the processor, which implement the desired functionality faster than a sequence of basic software operations such as additions or shifts can. Examples of such special instructions include complex arithmetic, non-standard floating point arithmetic, application-specific precision fixed point arithmetic, adders with more than two inputs, etc.
TCE places very few restrictions on the types of custom instructions that can be added to the processor design. For example, there are no limits on the number of input operands or produced results, or on the number of clock cycles an operation can take. In addition to the custom operations, the number and size of the register files (RF), the number and type of the functional units (FU) and the connectivity between the RFs and FUs can be freely customized.
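To make the benefit of a custom operation concrete, here is a minimal Python sketch. It is not TCE code: it models a saturating Q15 fixed point multiplication once as a sequence of basic operations and once as a single custom operation. The operation names (`MUL`, `SHR`, `MIN`, `MAX`, `QMUL`) and the choice of the Q15 format are assumptions made for illustration only.

```python
# Toy model: a saturating Q15 multiply built from basic operations versus
# one hypothetical custom operation. All operation names are illustrative.

Q15_MAX = 0x7FFF
Q15_MIN = -0x8000


def q15_mul_basic(a, b):
    """Return (result, issued operations) using only basic operations."""
    issued = []
    product = a * b
    issued.append("MUL")
    product = product >> 15          # rescale back to Q15
    issued.append("SHR")
    product = min(product, Q15_MAX)  # clamp the upper bound
    issued.append("MIN")
    product = max(product, Q15_MIN)  # clamp the lower bound
    issued.append("MAX")
    return product, issued


def q15_mul_custom(a, b):
    """Same computation modelled as a single custom operation 'QMUL'."""
    result = max(Q15_MIN, min(Q15_MAX, (a * b) >> 15))
    return result, ["QMUL"]


basic_result, basic_ops = q15_mul_basic(0x4000, 0x4000)     # 0.5 * 0.5
custom_result, custom_ops = q15_mul_custom(0x4000, 0x4000)
```

Both functions compute the same value, but the custom variant issues one operation instead of four. In hardware, that single operation could take one or several cycles depending on how the functional unit is implemented.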
About Transport Triggered Architectures
TCE is based on a simple but scalable architecture template called Transport Triggered Architecture (TTA, not to be confused with "Time Triggered Architecture"). TTA can be described as an exposed datapath VLIW architecture. The "exposed datapath" part means that the data transports that take place between the functional units (e.g. arithmetic logic unit or a multiplier) and the register files are explicitly visible to the programmer. In other words, while processors are commonly programmed by defining which operations to execute (including information of the sources of the operands and the destinations of the results), TTA is programmed by defining the transports of the operands and results. The name TTA comes from the way operations are executed: when operand data is moved to the triggering port of the functional unit, the operation starts executing. After a fixed latency (from the architecture point of view) the results can be read from the output ports of the functional unit to the next destination.
The programming model can be illustrated more easily with an assembly code snippet example.
- Traditional "operation triggered" (first parameter is the destination)
- ADD R1, R2, R3
- MUL R4, R1, R5
- Transport triggered
- R2 -> ADD.OPERAND, R3 -> ADD.TRIGGER
- ADD.RESULT -> R1
- R1 -> MUL.OPERAND, R5 -> MUL.TRIGGER
- MUL.RESULT -> R4
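The transport triggered snippet above can be mimicked with a toy interpreter. The following Python sketch is not part of TCE. It assumes a latency of one cycle for every functional unit and models each instruction as a list of (source, destination) moves, with the port names taken from the example. Class and function names are hypothetical.

```python
# Toy interpreter for transport triggered code. Not TCE code: the port names
# mirror the example above, everything else is an assumption.
import operator


class FunctionalUnit:
    """A unit with an operand port, a trigger port and a result port."""

    def __init__(self, func):
        self.func = func
        self.operand = 0
        self.result = 0

    def write(self, port, value):
        if port == "OPERAND":
            self.operand = value  # only latch the value
        elif port == "TRIGGER":
            # Writing the trigger port starts the operation.
            self.result = self.func(self.operand, value)
        else:
            raise ValueError(f"unknown input port {port}")


def read(src, regs, units):
    if "." in src:
        unit, port = src.split(".")
        if port != "RESULT":
            raise ValueError(f"cannot read from port {port}")
        return units[unit].result
    return regs[src]


def write(dst, value, regs, units):
    if "." in dst:
        unit, port = dst.split(".")
        units[unit].write(port, value)
    else:
        regs[dst] = value


def run(program, regs, units):
    """Each instruction is a list of moves that happen in the same cycle."""
    for instruction in program:
        # Read all sources first, so results become visible one cycle later.
        pending = [(dst, read(src, regs, units)) for src, dst in instruction]
        # Apply trigger writes last so they see the operand of this cycle.
        pending.sort(key=lambda move: move[0].endswith(".TRIGGER"))
        for dst, value in pending:
            write(dst, value, regs, units)
    return regs


program = [
    [("R2", "ADD.OPERAND"), ("R3", "ADD.TRIGGER")],
    [("ADD.RESULT", "R1")],
    [("R1", "MUL.OPERAND"), ("R5", "MUL.TRIGGER")],
    [("MUL.RESULT", "R4")],
]
regs = {"R1": 0, "R2": 2, "R3": 3, "R4": 0, "R5": 5}
units = {"ADD": FunctionalUnit(operator.add), "MUL": FunctionalUnit(operator.mul)}
run(program, regs, units)
```

With the register values chosen here, R1 ends up holding 2 + 3 and R4 holds (2 + 3) * 5. Note that the program never names an operation to execute: the addition starts only because a value is moved to `ADD.TRIGGER`.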
Transport programming enables some specific software optimizations such as software (register) bypassing, which in turn can enable "dead result read elimination". The latter can be applied when all reads of a result can be transported directly to the destination functional units, and it can lead to reduced register (file) pressure. An example follows:
- Transport triggered with software bypassing and dead result read elimination
- R2 -> ADD.OPERAND, R3 -> ADD.TRIGGER
- ADD.RESULT -> MUL.OPERAND, R5 -> MUL.TRIGGER
- MUL.RESULT -> R4
Here the result from the adder is copied directly to the operand input port of the multiplier, thus removing the need for the general purpose register R1 used as a temporary storage. In addition to reducing register pressure, the freedom to schedule the movement of operand/result data in multiple cycles reduces the register file port pressure, which is one of the TTA's main motivations. In the traditional VLIW the number of RF ports needs to be scaled according to the number of connected functional units and their worst case RF port requirements (maximum simultaneous operand reads and result writes), leading to more complex RFs with higher delay and area.
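The optimization described above can be sketched as a small rewriting pass over a list of moves. This is a simplified illustration in Python, not the algorithm used by the TCE compiler. It assumes a flat, sequential list of (source, destination) moves, one result port per functional unit, and a caller-supplied set of registers that are still needed after the code (`live_out`, a hypothetical parameter).

```python
# Simplified software bypassing and dead result read elimination.
# Not the TCE compiler's algorithm; assumptions are listed in the text above.

def is_reg(name):
    return "." not in name


def bypass(moves, live_out):
    """Forward FU results directly to their consumers and drop dead copies."""
    moves = list(moves)
    i = 0
    while i < len(moves):
        src, dst = moves[i]
        if src.endswith(".RESULT") and is_reg(dst):
            unit = src.split(".")[0]
            result_valid = True   # the unit has not been re-triggered yet
            all_bypassed = True
            redefined = False
            for j in range(i + 1, len(moves)):
                use_src, use_dst = moves[j]
                if use_src == dst:
                    if result_valid:
                        moves[j] = (src, use_dst)  # read the FU port instead
                    else:
                        all_bypassed = False
                if use_dst == f"{unit}.TRIGGER":
                    result_valid = False
                if use_dst == dst:
                    redefined = True
                    break
            if all_bypassed and (redefined or dst not in live_out):
                del moves[i]      # the register copy is dead
                continue
        i += 1
    return moves


def rf_accesses(moves):
    """Return (register file reads, register file writes)."""
    reads = sum(is_reg(src) for src, _ in moves)
    writes = sum(is_reg(dst) for _, dst in moves)
    return reads, writes


original = [
    ("R2", "ADD.OPERAND"), ("R3", "ADD.TRIGGER"),
    ("ADD.RESULT", "R1"),
    ("R1", "MUL.OPERAND"), ("R5", "MUL.TRIGGER"),
    ("MUL.RESULT", "R4"),
]
optimized = bypass(original, live_out={"R4"})
```

For the example program, the pass removes the copy through R1, so the register file sees three reads and one write instead of four reads and two writes. A real compiler would also have to respect operation latencies and the connectivity of the target machine, which this sketch ignores.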
Toolset assisted processor design with TCE
Designing new processors from scratch is not a straightforward task. One needs to take care of the design, the verification and the porting of a high-level language programming toolchain for each processor to keep the programmers happy (writing peculiar assembler syntax for a changing target would get quite depressing quickly!). Thus, the design process should be automated as fully as possible to make experimenting with different processor architecture alternatives feasible.
The ultimate goal for an ASIP design toolset is to be as easy to use as taking a high-level language program as input and producing as a result an optimal processor implementation in VHDL or Verilog. It should parallelize the program for the processor's resources efficiently while exploiting custom instructions intelligently without any user intervention. In our experience, this type of fully automated "design space exploration" tends not to produce good enough results as the codesign process is often something that a human can do more efficiently. For example, sometimes the software needs to be refactored to a form that can exploit instruction level parallelism better. Sometimes it can be hard, or even impossible, for a software algorithm to realize that a complex-looking loop can be replaced with a simple single cycle custom instruction if implemented in hardware, and so on. Thus, we see that the realistic use case for an ASIP design toolset is to assist in the design task as much as possible while still leaving leeway for the engineer to exploit their knowledge in the field of algorithms or hardware design. This way, in case the engineer is skilled enough and the toolset assisting the ASIP design task is flexible enough, the processor design can eventually reach the performance of a fixed function hardware accelerator, while the design process can also be stopped at any point when the result is good enough.
FPGA support in TCE
The latest addition to the TCE toolset is a tool called Platform Integrator, which automates the integration of the processor into an FPGA. Normally TCE's Processor Generator only creates an RTL implementation of the TTA processor core. But to get the processor running on an FPGA, one needs to interface the core with the target FPGA board. For example, one would need to connect the core to memory chip pins or create and connect on-chip memory components, map I/O component signals to the appropriate FPGA pins and so on. Every designer knows that this can be a tedious process.
This is where the Platform Integrator steps in. It can save the designer from the dull integration work by performing the pin mapping and memory interfacing, and it even creates the synthesis tool project files for you. All you need to do is synthesize the design and launch it on the FPGA.
Adding support for automatic integration to a new platform is done by creating a Platform Integrator module for the target FPGA board. This module contains information about the FPGA device as well as pin mapping information. In addition, this module also needs a matching Hardware Database which contains target FPGA specific implementations of function units. For example, the Hardware Database can contain function units to interface with the I/O devices on the target board, such as LEDs, buttons, DAC chips, memory devices and so on.
The current version of TCE has Platform Integrator support for the Stratix II DSP Pro FPGA board and Altera's Quartus II synthesis tools. Adding support for other Altera devices should be a fairly easy task. Hopefully we will be able to support more FPGA boards and different FPGA vendor tools in the future. Contributions to this effort are more than welcome!
However, a standalone TTA processor on FPGA is not always the desired design approach and often a system-on-a-chip (SoC) with several IP blocks is needed. This case is handled with Platform Integrator's support for producing IP-blocks from the designed TTA processor cores. For example, if the TTA processor includes a function unit that implements the Avalon Memory Mapped Master interface, Platform Integrator can create a SOPC Builder component from the processor which then can be easily used in any future SoC design created using the Altera tools.
Current status of TCE
TCE is at a relatively mature state, providing graphical tools to design the TTA architectures, architecture description driven instruction set simulators, a retargetable compiler, and a processor implementation generator supporting VHDL output. Because TCE uses TTA, a static ILP architecture, as its processor template, the efficiency of the end result is highly dependent on an efficient compiler. The compiler, which is currently based on LLVM, has been our main focus in recent years and will most likely be in the future also.
Version 1.3 of the open source toolset (MIT licensed) was released in November, with support for LLVM 2.8, an initial SystemC integration API and other fancy stuff. Go ahead and try it out!
Currently we are looking into GPGPU-style workload compilation issues. We are experimenting with OpenCL to describe the applications for easier extraction of parallelism while still providing clean support for calling custom operations from the kernel code. One of the subgoals of this work is to produce a scalable and efficient GPU implementation based on the TTA approach.
On top of the never-ending journey of code generation improvements, there is also ongoing work to extend TCE to better support task level parallelism with manycore ASIP generation and compiler assisted multithreading.
In case you got interested in the project or have questions to ask, please join the mailing list firstname.lastname@example.org.
Pekka Jääskeläinen, a researcher who has been working on the TCE project from the start, and Otto Esko, the TCE FPGA guru
Update from OC-Team
This section gives you an update on what has been "cooking" in the OpenCores community during the last month.
This month's activities:
- Fixed an email bug that occurred during the webserver location switch.
- Optimized the project-page code in order to decrease webserver load.
- The new servers are running perfectly.
Our message to the community:
- Please make sure your personal information on the "My Account" page is up to date.
Marcus Erlandsson, ORSoC
Here you will see interesting new projects that have reached the first stage of development.
myBlaze is a synthesizable clone of the MicroBlaze Soft Processor written in myHDL ( http://www.myhdl.org ). It started as a translation of MB-Lite from VHDL to myHDL, along with a simple emulator. Its minimal configuration was tested on the Spartan-3E Starter Kit.
Development status: Mature
Nov 21, 2010: fixed hyper-link
Nov 18, 2010: initial version
Power Supply Sequencer
Large electronic systems often use multiple supply voltages that must come up and go down in a specified order. It must also be ensured that the system does not remain only partly powered up for a prolonged time. This power sequencer is composed of identical slices, one for each supply stage.
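The behaviour described above can be illustrated with a small Python model. It is only a sketch under stated assumptions, not a description of the actual core: each stage is assumed to report "power good" after some number of ticks, and a stage that is too slow is assumed to trigger a shutdown of everything already enabled, in reverse order. The function name and its parameters are hypothetical.

```python
# Toy model of a sliced power sequencer. The timing values and the abort
# policy are assumptions for illustration, not taken from the core.

def power_up(ticks_until_good, timeout):
    """Enable supplies in order; abort and unwind if one is too slow.

    ticks_until_good[i] is how long supply i takes to report "power good"
    (None means it never does). Returns (success, event_log).
    """
    log = []
    for stage, ticks in enumerate(ticks_until_good):
        log.append(f"enable {stage}")
        if ticks is None or ticks > timeout:
            # Do not stay partly powered: switch off in reverse order.
            for active in range(stage, -1, -1):
                log.append(f"disable {active}")
            return False, log
    return True, log


good_ok, good_log = power_up([1, 2, 1], timeout=5)
bad_ok, bad_log = power_up([1, None, 1], timeout=5)
```

In the second call the middle supply never comes up, so the model disables stage 1 and then stage 0 instead of leaving the system half powered.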
Development status: Beta
Oct 27, 2010: Uploaded source, tests and description
Programmable Interrupt Controller
“pic” is a soft core programmable interrupt controller which can be used as an interface between peripheral interrupt lines and the processor IRQ line. One of the popular PICs available on the market is the Intel 8259. This core is not compatible with the 8259. The core was designed based on my ideas of how a PIC operates and its requirements. The first version is a really basic core which can take 8 interrupts as input. The interrupt detection methods currently supported are the polling method and the fixed priority method.
Testbench code is provided with the core, testing the design in both modes. Simulate the core using the given testbench and you should be able to understand the signalling in the core.
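As an illustration of what fixed priority resolution means for 8 interrupt inputs, here is a minimal Python sketch. It does not describe the internals of this core: the convention that line 0 has the highest priority and the presence of a mask register are assumptions, and the function name is hypothetical.

```python
# Fixed priority resolution among interrupt lines. Assumptions: line 0 has
# the highest priority, and a set bit in 'mask' means the line is enabled.

def highest_priority(pending, mask=0xFF, lines=8):
    """Return the index of the lowest-numbered pending, enabled line."""
    active = pending & mask & ((1 << lines) - 1)
    if active == 0:
        return None  # nothing to service
    # Isolate the lowest set bit, then convert it to a bit index.
    return (active & -active).bit_length() - 1


first = highest_priority(0b00101000)                     # lines 3 and 5 pending
masked = highest_priority(0b00101000, mask=0b11110111)   # line 3 disabled
```

With lines 3 and 5 pending, line 3 wins. Once line 3 is masked out, line 5 is selected instead.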
Development status: Alpha
Oct 27, 2010: Description updated. Added documentation.
Oct 27, 2010: SVN uploaded with the alpha version of the PIC.
Johan Rilegård, ORSoC