Newsletter February 2013
"OpenRISC inside" - Commercial product based on the OpenRISC processor
DDR, Disk Drive Replacement
The DDR (Disk Drive Replacement) is a product that solves the problem of obsolete disk drives in industry. This specific device is aimed at the robot industry and is designed so that no changes need to be made to the host device (the robot). It was developed by ORSoC and is owned and sold by Swerob.
Function and benefit overview
• Program, backups, parameters and boot disks are stored on a single SD card and operate according to the same principle as a hard drive for the robot system. It is a major benefit having a boot pack that is always available.
• Backup and reset of the entire SD card on PC or network. This was never possible before in S2 or S3 systems with old drives
• File management and editing of programs in S4, S4C and S4C+ (with DSQC540 computer) can be executed as usual in a PC using "DDR PC Software"
• Possibility of using the old floppy disk unit in parallel with DDR in S3, thanks to a special switch cable. This makes it easy when transferring programs and parameters
• Environmental endurance and safe storage of data in a CE-marked product
• Installation in just a few minutes
• Very low purchase price
An entirely new, faster generation of DDR now also works in S4C+ systems. This means that DDR replaces all old floppy disk units in ABB robot systems (S2, S3, S4, S4C and S4C+ with DSQC540 computer).
A promotion video for this product is available (only in Swedish at the moment): DDR promotion video.
ORSoC has been responsible for the whole development project, providing FPGA design, PCB design and Software design. The product is based on the OpenRISC platform designed by ORSoC. The design is customized with minor changes and suitable software is added.
Information from ORSoC about this product and its usage:
This is another good example of how designs based on IPs from OpenCores help ensure well-designed and cost-efficient product development and production. Over the last couple of years ORSoC has been responsible for many projects where we handle most of the development of our customers' products. We are very pleased that our customers appreciate our design expertise and our extensive experience with designs based on IPs from OpenCores. We are also running some long-term support projects. We will try to present as many of these projects as possible in the coming OpenCores newsletters. If you have any questions about how we can support your development projects, do not hesitate to contact us: firstname.lastname@example.org, +46 8 24 84 04
Article written by ORSoC in cooperation with Swerob, ORSoC
Update from OC-Team
This topic gives you an update of what has been "cooking" at the OpenCores community during the last month.
This month's activities:
- No issue
- No issues
Our message to the community:
- If you didn't already know, Embedded World is next week! Check it out if you have the chance.
- Help us improve the community, please provide feedback
Marcus Erlandsson, ORSoC
Here you will see interesting new projects that have reached the first stage of development.
cr_div - Cached Reciprocal Divider
This core is a low-latency divider that works by caching reciprocal values and then using a multiply to perform the divide instead of the usual divide operation. On first encountering a divide operation, the reciprocal of the divisor is calculated; this takes the same amount of time as a normal divide. The next time the same divisor is encountered, the pre-calculated reciprocal is used. Reciprocals are stored in a small cache similar to a processor data cache.
a/b is the same as a * 1/b
In many cases the divisor 'b' remains the same within a loop. 1/b can be calculated to be essentially a constant; then all that's required is a multiply operation. As in the example, divides are performed using only three clock cycles when the reciprocal can be found in the cache.
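The caching idea can be illustrated with a small software model. This is only a sketch under stated assumptions, not the core's actual implementation: the 16-bit unsigned operand width, the 8-entry direct-mapped cache and the class name `CachedReciprocalDivider` are all hypothetical. The reciprocal is held as a fixed-point integer, rounded up, so that the multiply-and-shift gives the same result as an integer divide for operands of the assumed width.

```python
class CachedReciprocalDivider:
    """Software model of a divider that caches reciprocals (illustrative only)."""

    def __init__(self, width=16, entries=8):
        self.width = width            # assumed operand width in bits
        self.shift = 2 * width        # fixed-point fraction bits of the reciprocal
        self.entries = entries        # assumed number of cache entries
        self.cache = [None] * entries # each slot holds (divisor, reciprocal)
        self.hits = 0
        self.misses = 0

    def divide(self, a, b):
        limit = 1 << self.width
        if b == 0:
            raise ZeroDivisionError("division by zero")
        if not (0 <= a < limit and 0 < b < limit):
            raise ValueError("operands must be unsigned and fit the operand width")

        slot = b % self.entries       # direct-mapped: low bits of the divisor
        entry = self.cache[slot]
        if entry is not None and entry[0] == b:
            self.hits += 1            # fast path: reciprocal already known
            recip = entry[1]
        else:
            self.misses += 1          # slow path: as costly as a normal divide
            recip = ((1 << self.shift) + b - 1) // b   # ceil(2**shift / b)
            self.cache[slot] = (b, recip)

        # a / b computed as a * (1 / b), then scaled back to an integer
        return (a * recip) >> self.shift


if __name__ == "__main__":
    div = CachedReciprocalDivider()
    total = sum(div.divide(a, 7) for a in range(100))   # same divisor in a loop
    print(total, div.hits, div.misses)
```

In a loop with a constant divisor only the first call takes the slow path; every later call is a cache hit followed by a multiply and a shift.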
Development status: Alpha
Feb 12, 2013: Updating project description - initial project setup
A fault-tolerant processor
The mips - fault tolerant core is a 32-bit MIPS processor with error detection (fault tolerance). The processor implementation was designed by Lazaridis Dimitris.
Main aspects
The core has 5 pipeline stages:
- Instruction fetch
- Instruction decoding
- Execution
- Memory access
- Register update
It supports almost all instructions of the MIPS architecture: R-type, I-type, branch, jump and multiply instructions.
The multiply result is stored until it is needed, regardless of whether other instructions follow.
There is an error-detection circuit for fault tolerance. It is implemented 100% in hardware and provides error detection at reset/start-up.
There are separate memories for instructions and for data (read/write), and they can be changed.
Each stage takes one clock cycle. Both memories operate on the falling clock edge, and the rest of the cycle is used for the other necessary functions (e.g. the pipeline), which makes the core faster and more flexible.
All I-type instructions are partly decoded in the first stage and all R-type instructions are partly decoded in the ALU control, which reduces the complexity of the main control unit (FSM).
All instructions have been tested for correct execution. A test bench from a separate circuit implementation is also included (to verify the program stored in the instruction memory).
The mips - fault tolerant core was implemented with Xilinx tools, version 13.1, targeting a Spartan 3 xc3s400-5tq144 device, but it can fit in other similar target devices.
The processor is implemented entirely in VHDL.
With continuous scaling in CMOS technology, the number of transistors on a single chip keeps growing. Chip multiprocessors (CMPs) are an efficient way of using this very large number of transistors. Several studies show that high-density integration makes modern processors prone to transient or permanent faults. Moreover, the increase in temperature and the decrease in supply voltage on the chip lead to a higher susceptibility to faults. As the feature size shrinks, the probability of a single transistor becoming faulty increases due to the low threshold voltages. It is projected that the rate at which transient errors occur will grow exponentially and will soon represent one of the most significant issues in the design of future-generation high-performance microprocessors. This work proposes a fault-tolerant architecture that tolerates the high fault rates expected in future technologies. In this work the multiplication block circuit is tested.
Analysis
In this method a multiplication is executed and the result is stored, followed by a comparison. The test starts with an initial value of 00001111…, which is multiplied in the multiplication circuit, and the result is stored. After this initial multiplication the operands are shifted by one digit and multiplied again, and the result is stored together with the previous result. This continues for 64 machine cycles, after which the final result, which includes the previous results, is stored. In the final stage of error detection the calculated result is compared with a stored correct result, and errors in the multiplication array circuit can be found. With this method the fault coverage is 85% and 64 machine cycles are required. The error detection runs at start-up, before any execution. Because it is implemented in hardware it combines high fault coverage with fast execution, and such methods are likely to become more popular for error detection because of their time savings (there is no time penalty), reliability, low cost, high percentage of fault coverage and low power consumption.
Another alternative method: this method is also implemented 100% in hardware and is very simple. In the first machine cycle 0000… is run through the multiplication circuit and the result is compared with 0; a stuck-at-1 fault, if one exists, is found here. In the second cycle the number 1111111111111111111111111111111 is multiplied by 10 and the result is compared with a stored correct value at the end of the cycle. In the third and final stage a multiplication is done with reversed numbers, to cover as much of the multiplication array circuit as possible, and any error detected there is reported as well. This method has a smaller fault coverage, 61%, but it is very fast: it needs only 3 machine cycles to complete.
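The first, 64-cycle method described above can be modelled in software roughly as follows. This is a hedged sketch, not the VHDL implementation: the seed `0x0F0F0F0F` is only a hypothetical stand-in for the elided "00001111…" pattern, using the same pattern for both operands, rotating instead of shifting, and the rotate-and-XOR accumulation of results are all assumptions, and the "stored correct result" is computed here from a fault-free model rather than read from a constant. A stuck-at-1 fault on one output bit of the multiplier is injected to show how a mismatch is detected.

```python
MASK64 = (1 << 64) - 1


def rotl(value, bits):
    """Rotate `value` left by one position within a `bits`-wide word."""
    return ((value << 1) | (value >> (bits - 1))) & ((1 << bits) - 1)


def good_multiplier(a, b):
    """Fault-free 32x32 -> 64-bit multiplier model."""
    return (a * b) & MASK64


def stuck_at_one(bit):
    """Return a multiplier model whose output bit `bit` is stuck at 1."""
    def multiply(a, b):
        return good_multiplier(a, b) | (1 << bit)
    return multiply


def signature(multiply, seed=0x0F0F0F0F, cycles=64):
    """Run `cycles` multiplications and fold all results into one signature."""
    operand = seed                      # hypothetical start pattern
    sig = 0
    for _ in range(cycles):
        # fold the new product into the accumulated result
        sig = rotl(sig, 64) ^ multiply(operand, operand)
        operand = rotl(operand, 32)     # move the pattern by one digit
    return sig


# In hardware this would be a stored constant; here it is derived from the model.
GOLDEN = signature(good_multiplier)


def self_test(multiply):
    """Start-up check: True when the multiplier reproduces the golden signature."""
    return signature(multiply) == GOLDEN


if __name__ == "__main__":
    print("fault-free multiplier passes:", self_test(good_multiplier))
    print("stuck-at-1 on bit 0 passes:  ", self_test(stuck_at_one(0)))
```

Only faults that change at least one of the 64 products can be seen by such a comparison, which is one reason a scheme of this kind has a fault coverage below 100%.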
Further research
Most error-detection methods for fault tolerance check the MIPS core or a circuit at start-up, once, or periodically. But what if an error occurs between the tests? Faulty data would then be processed as correct. To work around this, a non-stop method could test the MIPS core continuously and find any error as soon as it appears; furthermore, if the FPGA has enough room, the damaged region could be relocated to an undamaged one. To implement this method, the test logic can be tied into the FSM. Knowing the next stage (instruction) from the FSM, it can decide to test the multiply block circuits and keep testing them as long as the upcoming instructions do not use these circuits. If a multiply instruction is coming up, the test is paused and then resumed when the circuit is free again. In this way an error in this part of the CPU can be found. The same process could be used to test all critical parts of the MIPS core or central unit. The advantage of this method is that the error-detection circuit works continuously. It does not require duplicated cores, only some additional (low-cost) parts that work in conjunction with the FSM, simultaneously with the microprocessor rather than consuming its working time.
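The scheduling idea in this proposal can be sketched as a tiny simulation. Everything here is illustrative and assumed rather than taken from the project: the function name `interleave_self_test`, the three-step test pass, and the use of the MIPS mnemonics `mult` and `multu` to stand for "instructions that need the multiplier". One test step is executed in every cycle in which the multiplier is idle, and the test pauses whenever a multiply instruction occupies the circuit.

```python
MULTIPLY_OPS = {"mult", "multu"}   # instructions assumed to occupy the multiplier


def interleave_self_test(program, steps_per_pass=3):
    """Schedule self-test steps into cycles where the multiplier is idle.

    Returns a per-instruction log of (instruction, test_step or None)
    and the number of complete test passes that fitted into the program.
    """
    step = 0
    passes = 0
    log = []
    for instr in program:
        if instr in MULTIPLY_OPS:
            log.append((instr, None))      # multiplier busy: test is paused
            continue
        log.append((instr, step))          # multiplier idle: run one test step
        step += 1
        if step == steps_per_pass:         # pass finished, start over
            step = 0
            passes += 1
    return log, passes


if __name__ == "__main__":
    program = ["add", "lw", "mult", "sub", "add", "mult", "or", "and", "sw"]
    log, passes = interleave_self_test(program)
    for instr, test_step in log:
        print(instr, "paused" if test_step is None else f"test step {test_step}")
    print("complete passes:", passes)
```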
Development status: Stable
Feb 10, 2013: description
Feb 6, 2013: done
Feb 4, 2013: description
Feb 1, 2013: description
Jan 28, 2013: descreption
Jan 28, 2013: description
rtfSpriteController / Hardware cursors
This core provides hardware cursor / sprite capabilities. It supports alpha blending in the 32k color mode. The cursor characteristics are completely programmable.
- parameterized number of sprites/cursors: 1, 2, 4, 6, 8 or 14
- 2kB sprite image cache buffers
- each image cache is capable of holding multiple sprite images
- cache may be accessed like a memory by the processor
- an embedded DMA controller may also be used for sprite reload
- programmable image offset within cache
- programmable sprite width, height, and pixel size
- sprite width and height may vary from 1 to 64 as long as the product doesn't exceed 1024.
- pixels may be programmed to be 1, 2, 3 or 4 video clocks in size; both height and width are programmable
- programmable sprite position
- 8 or 16 bits for color, e.g. 32k color + 1 bit alpha blending indicator (1,5,5,5)
- fixed display and DMA priority: sprite 0 highest, sprite 13 lowest
The core is currently being tested on an Atlys board and appears to be working.
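Two of the listed constraints can be made concrete with a short sketch. It is an illustration only: the helper names are hypothetical, and the bit order assumed for the (1,5,5,5) format (alpha flag in bit 15, then red, green and blue) is a guess, since the feature list does not spell it out. The size limit is consistent with the cache size: 1024 pixels at 16 bits each is 2 kB.

```python
MAX_DIMENSION = 64      # width and height each range from 1 to 64
MAX_PIXELS = 1024       # width * height must not exceed this


def valid_sprite_size(width, height):
    """Check a sprite size against the limits given in the feature list."""
    return (1 <= width <= MAX_DIMENSION
            and 1 <= height <= MAX_DIMENSION
            and width * height <= MAX_PIXELS)


def unpack_1555(pixel):
    """Split a 16-bit (1,5,5,5) pixel into (alpha, red, green, blue).

    Assumed layout: bit 15 = alpha flag, bits 14..10 = red,
    bits 9..5 = green, bits 4..0 = blue.
    """
    alpha = (pixel >> 15) & 0x1
    red = (pixel >> 10) & 0x1F
    green = (pixel >> 5) & 0x1F
    blue = pixel & 0x1F
    return alpha, red, green, blue


if __name__ == "__main__":
    print(valid_sprite_size(32, 32), valid_sprite_size(64, 64))
    print(unpack_1555(0xFC1F))
```

A 32 x 32 sprite uses the cache exactly, while a 64 x 64 sprite would need four times the available space.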
Development status: Alpha
Feb 4, 2013 - intial project update
Johan Rilegård, ORSoC
First FPGAs from Achronix
The American FPGA challenger Achronix has delivered the first chips in the new FPGA family Speedster 22i. The chips are fabricated in Intel's 22 nm process with FinFET transistors.
The news that Intel would help the startup company Achronix made the headlines in November 2010. Officially Intel provides no foundry services, and even if Achronix succeeds, the revenue for Intel would be tiny. One speculation was that Intel would eventually acquire Achronix, something that has not yet happened.
Now, at last, the first chips are available. They are engineering samples, i.e. circuits that work but are not fully tested.
First up is the Speedster 22i HD1000, a device with 700,000 programmable lookup tables designed for communication products including 100 Gbps Ethernet.
Besides the programmable logic the circuit has hard blocks for 10/40/100G Ethernet, 100Gbps Interlaken, PCI Express Gen1/2/3 and DMA access for DDR3 memory.
The hard blocks correspond to around 300,000 lookup tables, which is why Achronix argues that the HD1000 is equivalent to one million lookup tables.
Furthermore, there are 86 Mbits of RAM and 960 programmable inputs and outputs, as well as 64 SerDes blocks at 12.75 Gbit/s.
Achronix argues that Speedster 22i only draws half as much power as the closest competitors. For the HD1000, this means that the static power consumption is 3.6 W. For dynamic power consumption (how much power the circuit draws when operating) the 22 nm process gives an improvement of 20 percent compared with the 28 nm devices from Xilinx and Altera, says the company's marketing manager Steve Mensor.
In addition, the Speedster has hard blocks for many of the communication protocols which saves additional power.
– Two 100 Gbit Ethernet, two 100 Gbps Interlaken, PCIe and six DDR3 memory accesses consume only 2 watts. Equivalent FPGAs need between 15 and 20 watts because you have to implement the functions in logic, says Steve Mensor to Elektroniktidningen.
The company has begun delivering samples to selected customers and volume production is scheduled for the beginning of the third quarter.
The list price for one HD1000 is 2,900 dollars.
– Equivalent FPGAs have a list price that is two to three times higher. If you buy larger quantities, you get a lower price from both us and the competition.
Achronix also supplies a development board for PCI Express with associated development tools, programming pod and power supply. The card can run standalone or be inserted into a computer.
The price for the evaluation package is 13,000 dollars.
Published by Elektroniktidningen at http://etn.se/57201