MPEG-2 Decoder User Guide

Koenraad De Vleeschauwer

  Copyright Notice

Copyright 2007-2009, Koenraad De Vleeschauwer. 

Redistribution and use in source (LyX format) and `compiled' 
forms (PDF, PostScript, HTML, RTF, etc.), with or without 
modification, are permitted provided that the following 
conditions are met:

1. Redistributions of source code (LyX format) must retain the 
  above copyright notice, this list of conditions and the 
  following disclaimer.

2. Redistributions in compiled form (transformed to other DTDs, 
  converted to PDF, PostScript, HTML, RTF, and other formats) 
  must reproduce the above copyright notice, this list of 
  conditions and the following disclaimer in the documentation 
  and/or other materials provided with the distribution.

3. The name of the author may not be used to endorse or promote 
  products derived from this documentation without specific prior 
  written permission.

This documentation is provided by the author “as is” and any 
express or implied warranties, including, but not limited to, the 
implied warranties of merchantability and fitness for a 
particular purpose are disclaimed. In no event shall the author 
be liable for any direct, indirect, incidental, special, 
exemplary, or consequential damages (including, but not limited 
to, procurement of substitute goods or services; loss of use, 
data, or profits; or business interruption) however caused and on 
any theory of liability, whether in contract, strict liability, 
or tort (including negligence or otherwise) arising in any way 
out of the use of this documentation, even if advised of the 
possibility of such damage.

  MPEG-2 License Notice

Commercial implementations of MPEG-1 and MPEG-2 video, including 
shareware, are subject to royalty fees to patent holders. Many of 
these patents are general enough such that they are unavoidable 
regardless of implementation design.

MPEG-2 intermediate product. Use of this product in any manner 
that complies with the MPEG-2 standard is expressly prohibited 
without a license under applicable patents in the MPEG-2 patent 
portfolio, which license is available from MPEG LA, L.L.C., 250 
Stelle Street, suite 300, Denver, Colorado 80206.

Table of Contents

Copyright Notice
MPEG-2 License Notice
Chapter 1 Processor Interface
1.1 Decoder Block Diagram
1.2 Ports
1.2.1 Clocks
1.2.2 Reset
1.2.3 Stream Input
1.2.4 Register File Access
1.2.5 Memory Controller
1.2.6 Memory Request FIFO
1.2.7 Memory Response FIFO
1.2.8 Video Output
1.2.9 Test Point
1.2.10 Status
1.3 Processor Tasks
1.4 Registers
1.5 Read-only Registers
1.6 On-Screen Display
1.7 Frame Store
1.8 Video Modeline
1.9 Interrupts
1.10 Watchdog
1.11 Trick mode
1.12 Test point
Chapter 2 Decoder Sources
2.1 Source Directory Structure
2.2 MPEG2 Decoder
2.2.1 FIFO sizes
2.2.2 Dual-ported memory and FIFO models
2.2.3 Memory mapping
2.2.4 Modeline
2.2.5 Inverse Discrete Cosine Transform
2.2.6 Bilinear chroma upsampling
2.3 Simulation
2.3.1 Icarus Verilog Simulation
2.3.2 Conformance Tests
2.4 Tools
2.4.1 Logic Analyzer
2.4.2 Finite State Machine Graphs
2.4.3 IEEE-1180 IDCT Accuracy Test
2.4.4 Reference software decoder
2.4.5 MPEG2 Test Streams

Chapter 1 Processor Interface

An MPEG2 decoder, implemented in Verilog, is presented. Chapter 1 
describes the decoder for the software engineer who wishes to 
write a device driver.

1.1 Decoder Block Diagram

Figure 1.1: Decoder Block Diagram

Figure 1.1 shows the MPEG2 decoder block 
diagram. An external source such as a DVB tuner or DVD drive 
provides an MPEG2 stream. The video elementary stream is 
extracted and sent to the decoder. The video buffer acts as a 
FIFO between the incoming MPEG2 video stream and the variable 
length decoder. The video buffer evens out temporary differences 
between the bitrate of the incoming MPEG2 bitstream and the 
bitrate at which the decoder parses the bitstream.

The MPEG2 codec is a variable length codec; codewords which occur 
often occupy fewer bits than codewords which occur only rarely. 
Getbits provides a sliding window over the incoming stream. As 
the codewords have a variable length, the sliding window moves 
forward a variable number of bits at a time.
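
The sliding window can be illustrated with a short software sketch. This is not the decoder's Verilog; the class name GetBits, the helper decode and the toy code table are assumptions made for illustration only, and the real MPEG2 code tables are far larger.

```python
class GetBits:
    """Sliding window over a byte string, advanced a variable number of bits."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # window position, in bits from the start of the stream

    def peek(self, nbits):
        """Return the next nbits bits (MSB first) without moving the window."""
        value = 0
        for i in range(self.pos, self.pos + nbits):
            byte = self.data[i // 8] if i // 8 < len(self.data) else 0  # zero padding
            value = (value << 1) | ((byte >> (7 - i % 8)) & 1)
        return value

    def advance(self, nbits):
        """Move the window forward by the length of the codeword just parsed."""
        self.pos += nbits


# Hypothetical prefix-free code: short codewords for frequent symbols.
TOY_CODE = {"1": "A", "01": "B", "001": "C", "000": "D"}


def decode(data, count):
    """Decode count symbols; the window advances 1, 2 or 3 bits per symbol."""
    reader = GetBits(data)
    symbols = []
    for _ in range(count):
        for length in (1, 2, 3):
            bits = format(reader.peek(length), "0{}b".format(length))
            if bits in TOY_CODE:
                symbols.append(TOY_CODE[bits])
                reader.advance(length)
                break
    return symbols
```

For example, the bit string 1 01 001 000 packed into the bytes 0xA4 0x00 decodes to the symbols A, B, C, D.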

Variable length decoding does the actual parsing of the 
bitstream. Variable length decoding stores stream parameters such 
as horizontal and vertical resolution, and produces run/length 
values and motion vectors. Run/length values and motion vectors 
are different ways of describing an image. The run/length values 
describe an image as compressed data contained within the 
bitstream. The motion vectors describe an image as a mosaic of 
already decoded images.

Run-length decoding, inverse quantizing and inverse discrete 
cosine transform decompress the run/length values.

Motion compensation retrieves already decoded images from memory 
and applies the motion vector translations.

The reconstructed image is the sum of the decompressed run/length 
values and translated pieces of already decoded images. The 
reconstructed image is stored in the frame store for later 
display and reference.

The frame store receives requests to store and retrieve pixels 
from three different sources: 

• motion compensation, which writes reconstructed image frames to 
  memory

• chroma resampling, which reads reconstructed image frames from 
  memory for displaying

• writes to the on-screen display, under software control.

Some of these blocks have multiple accesses to the frame store. 
Within the MPEG2 decoder a total of six memory read or write 
requests may occur simultaneously. The frame store prioritizes 
these requests and serializes them into a single stream of memory 
read/write requests, which is sent to the memory controller. 

The memory controller is external to the MPEG2 decoder. The 
memory controller handles the low-level details of interfacing 
with the memory chips. If memory is static RAM, interfacing 
requires little more than a buffer; dynamic memory requires a 
more complex controller.

The MPEG2 decoder accepts 4:2:0 format video, in which color and 
brightness information have a different resolution: color 
information (chrominance) is sent at half the horizontal and half 
the vertical resolution of brightness information (luminance). 
This makes sense because the human eye uses different mechanisms 
to perceive color and brightness; and the different mechanisms 
used have different sensitivities. 

Sending color information at half the horizontal and half the 
vertical resolution of brightness information implies the 
reconstructed image in the frame store has only one color pixel 
for every four brightness pixels. Assigning the same color 
information to the four pixels of brightness information would 
result in a chunky image. Chroma resampling does horizontal and 
vertical interpolation of the color information, resulting in a 
smooth color image.
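
The difference between repeating and interpolating color samples can be sketched in one dimension. The functions below are an illustration under assumptions: the filter taps, edge handling and chroma sample positions actually used by the decoder are not specified in this section, so a plain average of neighbouring samples is used here.

```python
def upsample_nearest(row):
    """Repeat every chroma sample twice (the 'chunky' result)."""
    out = []
    for c in row:
        out.extend([c, c])
    return out


def upsample_linear(row):
    """Double the sample count; inserted samples average their two neighbours."""
    out = []
    for i, c in enumerate(row):
        nxt = row[i + 1] if i + 1 < len(row) else c  # repeat last sample at the edge
        out.append(c)
        out.append((c + nxt) // 2)
    return out
```

With the input [100, 200], repetition gives [100, 100, 200, 200] while interpolation gives [100, 150, 200, 200]. Applying the same idea vertically yields one color value per brightness pixel.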

A dot clock marks the frequency at which pixels are sent to the 
display. The dot clock is external to the MPEG2 decoder and can 
be either free running or synchronized to another clock.

The video synchronization generator counts pixels, lines and 
image frames at the dot clock frequency. At any given moment, the 
video synchronization generator knows the horizontal and vertical 
coordinate of the pixel to be displayed. 
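
A minimal software model of such a counter, for progressive video only, might look as follows. The parameter names follow the video timing parameters of Section 1.8; treating the sync end position as exclusive is an assumption of this sketch, not a statement about the hardware.

```python
def sync_generator(h_res, h_sync_start, h_sync_end, h_length,
                   v_res, v_sync_start, v_sync_end, v_length):
    """Yield (x, y, pixel_en, h_sync, v_sync) for one progressive frame."""
    for y in range(v_length):          # one iteration per scan line
        for x in range(h_length):      # one iteration per dot clock
            pixel_en = x < h_res and y < v_res
            h_sync = h_sync_start <= x < h_sync_end   # end exclusive (assumed)
            v_sync = v_sync_start <= y < v_sync_end   # end exclusive (assumed)
            yield x, y, pixel_en, h_sync, v_sync
```

Each yielded tuple corresponds to one dot clock period; the coordinates identify the pixel that the mixer has to supply at that moment.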

The pixels generated in chroma resampling and the coordinates 
generated by the video synchronization generator are joined in 
the mixer. The result is a stream of pixels, at the current 
horizontal/vertical coordinate, at the dot clock frequency. 

At this point the on-screen display is added. The on-screen 
display has the same resolution as the video and uses a 256-color 
palette. Software can choose to put the on-screen display on top, 
completely hiding the video; or to blend on-screen display and 
video, as if they were two translucent glass plates.

The MPEG2 decoder works with chrominance (color) and luminance 
(brightness) information throughout. The final step is converting 
chrominance and luminance to red, green and blue in yuv2rgb. The 
red, green and blue information is the output of the decoder.
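
As an illustration of the conversion step, the sketch below uses the widely published approximate ITU-R BT.601 studio-range coefficients. This is an assumption: the coefficients, rounding and fixed-point arithmetic of the decoder's yuv2rgb block are not given in this section, and other matrix_coefficients values call for different constants.

```python
def clamp(value):
    """Round and limit a value to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one studio-range pixel (Y 16..235, Cb/Cr centered on 128) to RGB."""
    luma = 1.164 * (y - 16)
    r = luma + 1.596 * (cr - 128)
    g = luma - 0.392 * (cb - 128) - 0.813 * (cr - 128)
    b = luma + 2.017 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)
```

Black (16, 128, 128) maps to (0, 0, 0) and white (235, 128, 128) maps to (255, 255, 255).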

1.2 Ports

Table 1.2 lists the MPEG2 decoder input/output ports.

|          Port           | Bits  |          Description           | I/O  |  Clock  |
|          clk            |  1    |         Decoder clock          |  I   |    -    |
|        dot_clk          |  1    |          Video clock           |  I   |    -    |
|        mem_clk          |  1    |    Memory Controller clock     |  I   |    -    |
|          rst            |  1    |             Reset              |  I   |    -    |
|      stream_data        |  8    |      Program stream data       |  I   |   clk   |
|      stream_valid       |  1    |       stream_data valid        |  I   |   clk   |
|          busy           |  1    |       Decoder busy flag        |  O   |   clk   |
|        reg_addr         |  4    |       Register address         |  I   |   clk   |
|       reg_dta_in        |  32   |      Register write data       |  I   |   clk   |
|       reg_wr_en         |  1    |     Register write enable      |  I   |   clk   |
|      reg_dta_out        |  32   |      Register read data        |  O   |   clk   |
|       reg_rd_en         |  1    |     Register read enable       |  I   |   clk   |
|         error           |  1    |      Decoding error flag       |  O   |   clk   |
|       interrupt         |  1    |           Interrupt            |  O   |   clk   |
|      watchdog_rst       |  1    |   Watchdog-generated Reset     |  O   |   clk   |
|           r             |  8    |              Red               |  O   | dot_clk |
|           g             |  8    |             Green              |  O   | dot_clk |
|           b             |  8    |             Blue               |  O   | dot_clk |
|           y             |  8    |          Y Luminance           |  O   | dot_clk |
|           u             |  8    |        Cr Chrominance          |  O   | dot_clk |
|           v             |  8    |        Cb Chrominance          |  O   | dot_clk |
|        pixel_en         |  1    |         Pixel enable           |  O   | dot_clk |
|         h_sync          |  1    |  Horizontal synchronization    |  O   | dot_clk |
|         v_sync          |  1    |   Vertical synchronization     |  O   | dot_clk |
|         c_sync          |  1    |   Composite synchronization    |  O   | dot_clk |
|     mem_req_rd_cmd      |  2    |    Memory request command      |  O   | mem_clk |
|    mem_req_rd_addr      |  22   |    Memory request address      |  O   | mem_clk |
|     mem_req_rd_dta      |  64   |      Memory request data       |  O   | mem_clk |
|     mem_req_rd_en       |  1    |  Memory request read enable    |  I   | mem_clk |
|    mem_req_rd_valid     |  1    |     Memory request valid       |  O   | mem_clk |
|     mem_res_wr_dta      |  64   |     Memory response data       |  I   | mem_clk |
|     mem_res_wr_en       |  1    |    Memory response enable      |  I   | mem_clk |
| mem_res_wr_almost_full  |  1    |  Memory response almost full   |  O   | mem_clk |
|    testpoint_dip_en     |  1    | Testpoint dip switches enable  |  I   |    -    |
|     testpoint_dip       |  4    |    Testpoint dip switches      |  I   |    -    |
|       testpoint         |  34   |   Logic analyzer test point    |  O   |    -    |

Table 1.2: Ports

1.2.1 Clocks

Up to three different clocks may be supplied to the MPEG2 decoder:

clk Main decoder clock, input. 

dot_clk Video clock, input.  Variable frequency, varying with 
current video modeline. 

mem_clk Memory Controller Clock, input. 

The decoder produces pixels at a maximum rate of one per clk 

1.2.2 Reset

rst Asynchronous reset, input, active low, internally 

1.2.3 Stream Input

stream_data 8-bit elementary stream data, input, synchronous with 
 clk, byte aligned. The elementary stream is an MPEG2 4:2:0 video 
elementary stream.

stream_valid elementary stream data valid, input, synchronous 
with clk. Assert when stream_data is valid.

busy busy, active high, output, synchronous with clk. When high, 
indicates maintaining stream_valid high will overflow decoder 
input buffers.

1.2.4 Register File Access

reg_addr 4-bit register address, input, synchronous with clk.

reg_dta_in 32-bit register data in, input, synchronous with clk.

reg_wr_en register write enable, input, active high, synchronous 
with clk. Assert to write reg_dta_in to reg_addr.

reg_dta_out 32-bit register data out, output, synchronous with 
clk.

reg_rd_en Active high register read enable, input, synchronous 
with clk. Assert to obtain the contents of register reg_addr at 
reg_dta_out.

1.2.5 Memory Controller

The interface between the MPEG2 decoder and the memory controller 
consists of two FIFOs. The memory request FIFO sends memory read, 
write or refresh requests from the decoder to the memory controller. 
The memory response FIFO sends data read from memory back from the 
memory controller to the MPEG2 decoder. The data from the memory 
read requests appears in the memory response FIFO in the same order 
as the memory reads were issued in the memory request FIFO.

1.2.6 Memory Request FIFO

mem_req_rd_cmd memory request command, output, synchronous with 
mem_clk. Valid values are defined in Table 1.3.

| mem_req_rd_cmd  |  Mnemonic    |    Description     |
|       0         |  CMD_NOOP    |    No operation    |
|       1         | CMD_REFRESH  |   Refresh memory   |
|       2         |  CMD_READ    |  Read 64-bit word  |
|       3         |  CMD_WRITE   | Write 64-bit word  |

Table 1.3: Memory controller commands

mem_req_rd_addr 22-bit memory request address, output, 
synchronous with mem_clk.

mem_req_rd_dta 64-bit memory request data, output, synchronous 
with mem_clk.

mem_req_rd_en memory request read enable, input, active high, 
synchronous with mem_clk.

mem_req_rd_valid memory request read valid, output, active high, 
synchronous with mem_clk. Indicates when mem_req_rd_cmd, 
mem_req_rd_addr and mem_req_rd_dta have meaningful values.

1.2.7 Memory Response FIFO

mem_res_wr_dta 64-bit memory response write data, input, 
synchronous with mem_clk.

mem_res_wr_en memory response write enable, input, active high, 
synchronous with mem_clk. Assert to write mem_res_wr_dta to the 
memory response FIFO. 

mem_res_wr_almost_full memory response write almost full, output, 
active high, synchronous with mem_clk. When high, indicates 
maintaining mem_res_wr_en high will overflow the memory response 
FIFO. The current clock cycle can be completed without 
overflowing the memory response FIFO.
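
The request/response protocol can be modelled in software as follows. This is a behavioural sketch only, with a dictionary standing in for the memory chips; the function name serve_requests is hypothetical, and clocking, mem_req_rd_en/mem_req_rd_valid handshaking and almost-full back-pressure are deliberately left out. The command codes are those of Table 1.3.

```python
from collections import deque

CMD_NOOP, CMD_REFRESH, CMD_READ, CMD_WRITE = 0, 1, 2, 3  # Table 1.3


def serve_requests(requests, memory):
    """Process (cmd, addr, data) requests in order; return the response data."""
    request_fifo = deque(requests)
    response_fifo = deque()
    while request_fifo:
        cmd, addr, data = request_fifo.popleft()
        if cmd == CMD_WRITE:
            memory[addr] = data & 0xFFFFFFFFFFFFFFFF  # store one 64-bit word
        elif cmd == CMD_READ:
            # Read data is returned in the order the reads were issued.
            response_fifo.append(memory.get(addr, 0))
        # CMD_NOOP and CMD_REFRESH produce no response in this model.
    return list(response_fifo)
```

A write to address 5 followed by a read of address 5 returns the written word as the first entry of the response list; later reads append their data behind it.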

1.2.8 Video Output

r red component, output, synchronous with dot_clk.

g green component, output, synchronous with dot_clk.

b blue component, output, synchronous with dot_clk.

y Y luminance, output, synchronous with dot_clk.

u Cr chrominance, output, synchronous with dot_clk.

v Cb chrominance, output, synchronous with dot_clk.

pixel_en pixel enable, output, active high, synchronous with 
dot_clk. When pixel_en is high, r, g, b, y, u and v are valid; 
when pixel_en is low video is blanked.

h_sync horizontal synchronization, output, active high, 
synchronous with dot_clk.

v_sync vertical synchronization, output, active high, synchronous 
with dot_clk.

c_sync composite synchronization, output, active low, synchronous 
with dot_clk.

1.2.9 Test Point

The decoder provides a test point for connecting a logic 
analyzer. The signals available at the test point can be selected 
either by software control, or using dip switches. The signals 
available at the test point are not defined as part of this 
specification, may vary even for implementations with the same 
status register version number and are subject to change without 
notice. See Verilog source probe.v for details.

testpoint_dip_en 1-bit input. If testpoint_dip_en is high, the 
registers visible at testpoint are selected using testpoint_dip. 
If testpoint_dip_en is low, the registers visible at testpoint 
output are selected using the testpoint_sel field of register 15.

testpoint_dip  4-bit input. testpoint_dip selects test point 
output if testpoint_dip_en is high.

testpoint  34-bit output. testpoint is a test point to connect a 
34-channel logic analyzer probe to the MPEG2 decoder. Up to 16 
different sets of signals are available, hardware selectable 
using the testpoint_dip dip switches or software selectable by 
writing to register 15. Any clocks present are on bits 32 and/or 
33; bits 0 to 31 are data only. Bits 0 to 31 can also be accessed 
by software, by reading register 15. 

1.2.10 Status

error error, output, active high, synchronous with clk. Indicates 
variable length decoding encountered an error in the bitstream.

interrupt interrupt, output, active high, synchronous with clk. 
Reading the status register allows software to determine the 
cause of the interrupt, and will clear the interrupt.

watchdog_rst watchdog-generated reset signal, output, active low, 
synchronous with clk. Normally high; low during one clock cycle 
if the watchdog timer expires.

1.3 Processor Tasks

To decode an MPEG-2 bitstream, the processor should execute the 
following tasks, in order:

1. Initialize the horizontal, horizontal sync, vertical, vertical 
  sync and video mode registers with reasonable defaults. Clear 
  osd_enable, picture_hdr_intr_en and frame_end_intr_en. Set the 
  video_ch_intr_en flag.

2. Start feeding the MPEG-2 bitstream to the stream_data port of 
  the decoder.

3. The decoder will issue an interrupt when video resolution or 
  frame rate changes. Whenever the decoder issues an interrupt, 
  clear the interrupt by reading the status register. Read the 
  size, display size and frame rate registers. Calculate a new 
  modeline, change dot clock frequency if necessary, and write 
  the new video timing parameters to the horizontal, horizontal 
  sync, vertical, vertical sync and video mode registers.

4. At bitstream end, pad the stream with 8 times hex 000001b7, 
  the sequence end code (ISO/IEC 13818-2, par. 6.2.1, Start 
  codes).
If the On-Screen Display (OSD) is used, the processor should 
execute the following tasks as well:

1. Initialize the On-Screen Display color look-up table.

2. Wait until horizontal_size and vertical_size have meaningful 
  values.

3. Write to the On-Screen Display.

4. Set osd_enable to one.

5. If a video change interrupt occurs, and horizontal_size or 
  vertical_size has changed, rewrite the On-Screen Display.

Writing to the OSD is described in detail in Section 1.6. 
Interrupt handling is treated in Section 1.9.

1.4 Registers

The processor interface to the decoder consists of two times 16 
32-bit registers. These registers can be divided into 16 read-mode 
registers (Table 1.4) and 16 write-mode 
registers (Table 1.5). The read-mode 
registers allow reading decoder status, while the write-mode 
registers allow setting video timing parameters and writing to 
the On-Screen Display (OSD).

|    |   register   | bits   |         content           | read/write |
| 0  |   version    | 15-0   |         version           |     r      |
| 1  |    status    | 15-8   |   matrix_coefficients     |     r      |
|    |              |   7    |     watchdog_status       |     r      |
|    |              |   6    |        osd_wr_en          |     r      |
|    |              |   5    |        osd_wr_ack         |     r      |
|    |              |   4    |       osd_wr_full         |     r      |
|    |              |   3    |       picture_hdr         |     r      |
|    |              |   2    |        frame_end          |     r      |
|    |              |   1    |         video_ch          |     r      |
|    |              |   0    |          error            |     r      |
| 2  |     size     | 29-16  |     horizontal_size       |     r      |
|    |              | 13-0   |      vertical_size        |     r      |
| 3  | display size | 29-16  | display_horizontal_size   |     r      |
|    |              | 13-0   |  display_vertical_size    |     r      |
| 4  |  frame rate  | 15-12  | aspect_ratio_information  |     r      |
|    |              |  11    |   progressive_sequence    |     r      |
|    |              | 10-6   |  frame_rate_extension_d   |     r      |
|    |              |  5-4   |  frame_rate_extension_n   |     r      |
|    |              |  3-0   |     frame_rate_code       |     r      |
| f  |  testpoint   | 31-0   |        testpoint          |     r      |

Table 1.4: Read-mode Registers

|    |    register      | bits   |        content         | read/write |
| 0  |     stream       | 15-8   |   watchdog_interval    |     w      |
|    |                  |   3    |      osd_enable        |     w      |
|    |                  |   2    |  picture_hdr_intr_en   |     w      |
|    |                  |   1    |   frame_end_intr_en    |     w      |
|    |                  |   0    |   video_ch_intr_en     |     w      |
| 1  |   horizontal     | 27-16  | horizontal_resolution  |     w      |
|    |                  | 11-0   |   horizontal_length    |     w      |
| 2  | horizontal sync  | 27-16  | horizontal_sync_start  |     w      |
|    |                  | 11-0   |  horizontal_sync_end   |     w      |
| 3  |    vertical      | 27-16  |  vertical_resolution   |     w      |
|    |                  | 11-0   |    vertical_length     |     w      |
| 4  |  vertical sync   | 27-16  |  vertical_sync_start   |     w      |
|    |                  | 11-0   |   vertical_sync_end    |     w      |
| 5  |   video mode     | 27-16  |  horizontal_halfline   |     w      |
|    |                  |   2    |   clip_display_size    |     w      |
|    |                  |   1    |   pixel_repetition     |     w      |
|    |                  |   0    |      interlaced        |     w      |
| 6  |  osd clt yuvm    | 31-24  |           y            |     w      |
|    |                  | 23-16  |           u            |     w      |
|    |                  | 15-8   |           v            |     w      |
|    |                  |  7-0   |     osd_clt_mode       |     w      |
| 7  |  osd clt addr    |  7-0   |     osd_clt_addr       |     w      |
| 8  |  osd dta high    | 31-0   |     osd_dta_high       |     w      |
| 9  |   osd dta low    | 31-0   |      osd_dta_low       |     w      |
| a  |    osd_addr      | 31-29  |       osd_frame        |     w      |
|    |                  | 28-27  |       osd_comp         |     w      |
|    |                  | 26-16  |      osd_addr_x        |     w      |
|    |                  | 10-0   |      osd_addr_y        |     w      |
| b  |   trick mode     |  10    |      deinterlace       |     w      |
|    |                  |  9-5   |     repeat_frame       |     w      |
|    |                  |   4    |      persistence       |     w      |
|    |                  |  3-1   |     source_select      |     w      |
|    |                  |   0    |      flush_vbuf        |     w      |
| f  |    testpoint     |  3-0   |     testpoint_sel      |     w      |

Table 1.5: Write-mode Registers
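
A driver typically builds each write-mode register word from its fields. The sketch below packs two of the registers using the bit positions of Table 1.5. The helper names are hypothetical, and whether the hardware expects raw counts or counts minus one in the timing fields is not stated here, so the values are passed through unchanged.

```python
def pack_fields(*fields):
    """Pack (value, msb, lsb) tuples into one 32-bit register word."""
    word = 0
    for value, msb, lsb in fields:
        width = msb - lsb + 1
        if not 0 <= value < (1 << width):
            raise ValueError("value %d does not fit in bits %d-%d" % (value, msb, lsb))
        word |= value << lsb
    return word


def stream_register(watchdog_interval, osd_enable, picture_hdr_intr_en,
                    frame_end_intr_en, video_ch_intr_en):
    """Register 0 (stream)."""
    return pack_fields((watchdog_interval, 15, 8), (osd_enable, 3, 3),
                       (picture_hdr_intr_en, 2, 2), (frame_end_intr_en, 1, 1),
                       (video_ch_intr_en, 0, 0))


def horizontal_register(horizontal_resolution, horizontal_length):
    """Register 1 (horizontal)."""
    return pack_fields((horizontal_resolution, 27, 16), (horizontal_length, 11, 0))
```

For instance, stream_register(0, 0, 0, 0, 1) yields the word that enables only the video change interrupt, as suggested in step 1 of Section 1.3.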

1.5 Read-only Registers

version contains a non-zero FPGA bitstream (hardware) version 
number.  Software should at least print a warning “Warning: 
hardware version (%i.%i) more recent than software driver” if the 
hardware version is higher than expected.

picture_hdr is set whenever a picture header is encountered in 
the bitstream.  picture_hdr is cleared whenever the status 
register is read. In a well-behaved MPEG-2 stream, 
horizontal_size, vertical_size, display_horizontal_size, 
display_vertical_size, aspect_ratio_information and frame_rate 
will have meaningful values when a picture header is encountered.

frame_end is set when video vertical synchronization begins.  
frame_end is cleared whenever the status register is read.

video_ch is set whenever video resolution or frame rate changes.  
video_ch is cleared whenever the status register is read.

error is set when variable length decoding cannot parse the 
bitstream. error is cleared whenever the status register is 
read.

watchdog_status is high if the watchdog timer expired. 
watchdog_status is cleared whenever the status register is read.

horizontal_size is defined in ISO/IEC 13818-2, par. 6.3.3.

vertical_size is defined in ISO/IEC 13818-2, par., par. 

display_horizontal_size is defined in ISO/IEC 13818-2, par. 6.3.6.

display_vertical_size is defined in ISO/IEC 13818-2, par. 6.3.6.

aspect_ratio_information is defined in ISO/IEC 13818-2, par.  

matrix_coefficients is defined in ISO/IEC 13818-2, par.  6.3.6.

frame_rate_extension_n is defined in ISO/IEC 13818-2, par.  
6.3.3, par. 6.3.5.

frame_rate_code is defined in ISO/IEC 13818-2, par.  6.3.3, Table 

progressive_sequence is defined in ISO/IEC 13818-2, par.  6.3.5.

frame_rate_extension_d is defined in ISO/IEC 13818-2, par.  
6.3.3, par. 6.3.5.
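
The read-mode registers can be unpacked in software along the following lines. Bit positions are taken from Table 1.4. The frame rate calculation uses the frame_rate_value table and the rule frame_rate = frame_rate_value * (frame_rate_extension_n + 1) / (frame_rate_extension_d + 1) from ISO/IEC 13818-2; the function names are assumptions of this sketch.

```python
from fractions import Fraction

# frame_rate_code -> frame_rate_value; codes outside this table are forbidden or reserved.
FRAME_RATE_VALUE = {
    1: Fraction(24000, 1001), 2: Fraction(24), 3: Fraction(25),
    4: Fraction(30000, 1001), 5: Fraction(30), 6: Fraction(50),
    7: Fraction(60000, 1001), 8: Fraction(60),
}

STATUS_FLAGS = {"watchdog_status": 7, "osd_wr_en": 6, "osd_wr_ack": 5,
                "osd_wr_full": 4, "picture_hdr": 3, "frame_end": 2,
                "video_ch": 1, "error": 0}


def bits(word, msb, lsb):
    """Extract bits msb..lsb of a register word."""
    return (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def decode_status(word):
    """Register 1 (status) as a dictionary of flags plus matrix_coefficients."""
    flags = {name: bool((word >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    flags["matrix_coefficients"] = bits(word, 15, 8)
    return flags


def decode_size(word):
    """Register 2 (size): (horizontal_size, vertical_size)."""
    return bits(word, 29, 16), bits(word, 13, 0)


def decode_frame_rate(word):
    """Register 4 (frame rate): frames per second, or None for an invalid code."""
    base = FRAME_RATE_VALUE.get(bits(word, 3, 0))
    if base is None:
        return None
    return base * (bits(word, 5, 4) + 1) / (bits(word, 10, 6) + 1)
```

Because reading the status register clears its flags, a driver would read it once per interrupt and pass the value to a function such as decode_status rather than read it repeatedly.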

1.6 On-Screen Display

The OSD has the same resolution and aspect 
ratio as the MPEG-2 video being displayed. If no MPEG-2 video is 
being displayed, the OSD is undefined. Note that feeding the decoder 
a simple MPEG-2 sequence header with horizontal_size and 
vertical_size already satisfies the requirements for using the 
OSD.

The OSD is only shown if there is video output. If one wishes to 
display an OSD when no MPEG2 video is being reproduced, video 
output can be forced by setting source_select to 4, 5, 6 or 7.

The OSD may use up to 256 different colors. The OSD color lookup 
table (CLT) stores y, u, v and osd_clt_mode data for each color. 
The y, u and v values are interpreted as defined by 
matrix_coefficients. The osd_clt_mode value determines the color 
displayed according to Table 1.6.

| osd_clt_mode  |                Comment                 |
|   xxx00000    |              alpha = 0/16              |
|   xxx00001    |              alpha = 1/16              |
|   xxx00010    |              alpha = 2/16              |
|   xxx00011    |              alpha = 3/16              |
|   xxx00100    |              alpha = 4/16              |
|   xxx00101    |              alpha = 5/16              |
|   xxx00110    |              alpha = 6/16              |
|   xxx00111    |              alpha = 7/16              |
|   xxx01000    |              alpha = 8/16              |
|   xxx01001    |              alpha = 9/16              |
|   xxx01010    |             alpha = 10/16              |
|   xxx01011    |             alpha = 11/16              |
|   xxx01100    |             alpha = 12/16              |
|   xxx01101    |             alpha = 13/16              |
|   xxx01110    |             alpha = 14/16              |
|   xxx01111    |             alpha = 15/16              |
|   xxx11111    |             alpha = 16/16              |
|   xx0xxxxx    |     attenuate video pixel by alpha     |
|   xx1xxxxx    |    alpha blend osd and video pixel     |
|   00xxxxxx    |          display video pixel           |
|   01xxxxxx    | display attenuated/alpha blended pixel |
|   10xxxxxx    |           display osd pixel            |
|   11xxxxxx    |       display blinking osd pixel       |

Table 1.6: On-Screen Display Modes

The different modes combine osd and video in various ways:

• video. This is the normal mode of operation.

• attenuated video. 16 discrete levels of attenuation can be used 
  to fade video in or out.

• on-screen display.

• blend of on-screen display and video. 16 discrete levels of 

• blinking on-screen display. Alternates between osd pixel and 
  attenuated/alpha blended video pixel with a period of about 
  one second.
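
The bit fields of osd_clt_mode in Table 1.6 can be separated as in the sketch below. The function name and the returned labels are assumptions; alpha values with bit 4 set other than 11111 are not listed in the table and are reported as None here.

```python
DISPLAY_MODES = {0: "video", 1: "attenuated_or_blended", 2: "osd", 3: "blinking_osd"}


def decode_osd_clt_mode(mode):
    """Split an 8-bit osd_clt_mode value into (display, blend, alpha_sixteenths)."""
    display = DISPLAY_MODES[(mode >> 6) & 0x3]   # bits 7-6: what is displayed
    blend = bool((mode >> 5) & 1)                # bit 5: 1 = alpha blend, 0 = attenuate
    alpha_bits = mode & 0x1F                     # bits 4-0: alpha
    if alpha_bits <= 15:
        alpha = alpha_bits                       # alpha = n/16
    elif alpha_bits == 0x1F:
        alpha = 16                               # alpha = 16/16
    else:
        alpha = None                             # not listed in Table 1.6
    return display, blend, alpha
```

For example, the value 01101000 (binary) selects the attenuated/alpha blended pixel, with blending enabled and alpha = 8/16.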

osd_enable determines whether the On-Screen Display is shown or 
not.  If osd_enable is low, the On-Screen Display is not shown. 
If osd_enable is high, the On-Screen Display is shown. The osd 
color lookup table has to be initialized and the osd has to be 
written before osd_enable is raised. osd_enable is 0 on power-up 
or reset. 

osd_wr_en is set whenever an osd write has been accepted, 
whether the osd write was successful or not. osd_wr_en is 
cleared whenever the status register is read.

osd_wr_ack is set whenever an osd write has been successful.  
osd_wr_ack is cleared whenever the status register is read.

osd_wr_full is set when the osd write fifo is full.  When the osd 
write fifo is full, osd writes are not accepted.

When writing to the osd color lookup table:

1. Write osd_clt_yuvm.

2. Write osd_clt_addr.

Writes to the osd color lookup table take effect immediately.
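
The two-step color lookup table write can be expressed as a list of (register address, value) pairs, using the layout of registers 6 and 7 in Table 1.5. The function name is hypothetical, and how the pairs reach reg_addr and reg_dta_in depends on the host system.

```python
OSD_CLT_YUVM = 0x6  # write-mode register 6
OSD_CLT_ADDR = 0x7  # write-mode register 7


def clt_entry_writes(index, y, u, v, mode):
    """Return the register writes, in order, that set one color table entry."""
    for name, value in (("index", index), ("y", y), ("u", u), ("v", v), ("mode", mode)):
        if not 0 <= value <= 255:
            raise ValueError("%s must be in 0..255" % name)
    yuvm = (y << 24) | (u << 16) | (v << 8) | mode
    # Write osd_clt_yuvm first, then osd_clt_addr.
    return [(OSD_CLT_YUVM, yuvm), (OSD_CLT_ADDR, index)]
```

Calling clt_entry_writes(1, 235, 128, 128, 0x80) produces the writes for color 1 with y = 235, u = v = 128 and a mode that displays the osd pixel.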

When writing to the osd:

1. Only write to the osd when horizontal_size and vertical_size 
  have meaningful values. This is the case when a picture header 
  has been encountered.

2. Verify osd_wr_full is low. Writing when osd_wr_full is high 
  has no effect.

3. Write the leftmost four pixels to osd_dta_high.

4. Write the rightmost four pixels to osd_dta_low.

5. Write x and y position of the leftmost pixel to osd_addr. Note 
  x has to be a multiple of 8. osd_frame always has value 4 for 
  OSD writes. osd_comp always has value 0 for OSD writes.

6. Read the status register until osd_wr_en is asserted. When 
  osd_wr_en is high, the value of osd_wr_ack indicates whether 
  the write was successful.

Writes to the osd pass through a 32-position fifo. This 
introduces some latency. Repeating the last osd write 32 times 
flushes fifo contents, ensuring osd memory has been updated.
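
Steps 3 to 5 of the osd write procedure can be sketched as follows, again as (register address, value) pairs with field positions from Table 1.5. Two points are assumptions of this sketch: the leftmost pixel of each group of four is placed in the most significant byte, and the function names are hypothetical. Polling osd_wr_full beforehand and osd_wr_en afterwards is left to the caller.

```python
OSD_DTA_HIGH, OSD_DTA_LOW, OSD_ADDR = 0x8, 0x9, 0xA  # write-mode registers
OSD_FRAME = 4   # osd_frame value for OSD writes
OSD_COMP = 0    # osd_comp value for OSD writes


def pack_pixels(pixels):
    """Pack four 8-bit color indices, first pixel in the top byte (assumed order)."""
    word = 0
    for p in pixels:
        word = (word << 8) | (p & 0xFF)
    return word


def osd_write_sequence(x, y, pixels):
    """Return the register writes for one group of 8 horizontally adjacent pixels."""
    if x % 8 != 0 or not 0 <= x < 2048 or not 0 <= y < 2048:
        raise ValueError("x must be a multiple of 8; x and y must fit in 11 bits")
    if len(pixels) != 8:
        raise ValueError("exactly 8 pixels are written at a time")
    addr = (OSD_FRAME << 29) | (OSD_COMP << 27) | (x << 16) | y
    return [(OSD_DTA_HIGH, pack_pixels(pixels[:4])),
            (OSD_DTA_LOW, pack_pixels(pixels[4:])),
            (OSD_ADDR, addr)]
```

Writing osd_addr last matches the order of the steps above, where the address write follows the two data writes.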

1.7 Frame Store

Pixels can be written directly to the frame store, using the same 
mechanism as OSD writes. By writing pixels to the frame store and 
afterwards setting the source_select field of the trick mode register 
(described in Section 1.11), arbitrary bitmaps can be shown.

The only difference between an OSD write and a frame store write 
is the value of osd_frame and/or osd_comp. Tables 1.7 
and 1.8 list the frame and component codes. 
Frames 0 and 1 are used for storing I and P frames. Frames 2 and 
3 are used for storing B frames. All frames are stored in 4:2:0 
format, with u and v frames having half the width and height of 
the y frame. Note y, u and v values are stored in memory with an 
offset of 128.

| osd_frame  | Frame |
|     0      |   0   |
|     1      |   1   |
|     2      |   2   |
|     3      |   3   |
|     4      |  OSD  |

Table 1.7: OSD Frame

| osd_comp  | Component |
|    0      |     y     |
|    1      |     u     |
|    2      |     v     |

Table 1.8: OSD Component

Writes to the frame store are only defined when horizontal_size 
and vertical_size have meaningful values. Writes with osd_frame 4 
are only defined when osd_comp is 0.

1.8 Video Modeline

The video timing parameters are:

• horizontal_resolution

• horizontal_sync_start

• horizontal_sync_end

• horizontal_length

• vertical_resolution

• vertical_sync_start

• vertical_sync_end

• vertical_length

• horizontal_halfline

• interlaced

• pixel_repetition

These parameters can be deduced from the X11 modeline for the 
display, which is described in the “XFree86 Video Timings HOWTO”. 
Writing to the internal registers which contain the video timing 
parameters will restart the video synchronization generator.
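
As a starting point, the numeric fields of an X11 modeline can be mapped onto the timing parameters as sketched below. The one-to-one mapping is an assumption of this sketch and should be checked against the timing diagrams; in particular, for interlaced modes the vertical parameters are counted per field rather than per frame, which this function does not adjust, and horizontal_halfline is not derived.

```python
def modeline_to_timing(modeline):
    """Parse 'Modeline "name" clock hdisp hss hse htotal vdisp vss vse vtotal [flags]'."""
    parts = modeline.split()
    if parts[0].lower() == "modeline":
        parts = parts[1:]
    h = [int(p) for p in parts[2:6]]      # hdisp, hsyncstart, hsyncend, htotal
    v = [int(p) for p in parts[6:10]]     # vdisp, vsyncstart, vsyncend, vtotal
    flags = [p.lower() for p in parts[10:]]
    return {
        "name": parts[0].strip('"'),
        "dot_clock_mhz": float(parts[1]),
        "horizontal_resolution": h[0],
        "horizontal_sync_start": h[1],
        "horizontal_sync_end": h[2],
        "horizontal_length": h[3],
        "vertical_resolution": v[0],
        "vertical_sync_start": v[1],
        "vertical_sync_end": v[2],
        "vertical_length": v[3],
        "interlaced": "interlace" in flags,
    }
```

For the common 640x480 modeline `Modeline "640x480" 25.175 640 656 752 800 480 490 492 525 -hsync -vsync` this gives a horizontal_length of 800, a vertical_length of 525 and interlaced false.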

Two video timing diagrams are shown, one for progressive video 
(Figure 1.2) and one for interlaced video 
(Figure 1.3). The diagrams show the picture 
area (a light grey rectangle), flanked by horizontal sync (a dark 
grey vertical bar) and vertical sync (a dark grey horizontal 
bar).

Figure 1.2: Progressive Video

Figure 1.3: Interlaced Video

horizontal_resolution number of dots per scan line. 

horizontal_sync_start horizontal position at which the horizontal 
sync pulse begins. The leftmost pixel of a line has position zero.

horizontal_sync_end horizontal position at which the horizontal 
sync pulse ends. 

horizontal_length total length, in pixels, of one scan line. 

vertical_resolution number of visible lines per frame 
(progressive) or field (interlaced).

vertical_sync_start line number within the frame (progressive) or 
field (interlaced) at which the vertical sync pulse begins. The 
topmost line of a frame or field is line number zero.

vertical_sync_end line number within the frame (progressive) or 
field (interlaced) at which the vertical sync pulse ends. 

horizontal_halfline horizontal position at which the vertical 
sync begins on odd fields of interlaced video. Not used in 
progressive mode. 

vertical_length total number of lines of a frame (progressive) or 
field (interlaced).

clip_display_size If asserted, the image is clipped to 
(display_horizontal_size, display_vertical_size). If not 
asserted, the image is clipped to (horizontal_size, 
vertical_size).

interlaced used to specify interlaced output is required.  If 
interlaced is asserted, vertical sync is delayed one-half scan 
line at the end of odd fields.

pixel_repetition If pixel_repetition is asserted, each pixel is 
output twice. This can be used if the original dot clock is too 
low for the transmitter. As an example, suppose valid dot clock 
rates are 25…165 MHz, but the SDTV video being decoded has a dot 
clock of only 13.5 MHz. Asserting pixel_repetition and doubling 
dot clock frequency results in a dot clock of 27 MHz, sufficient 
for SDTV video to be transmitted across the link.

1.9 Interrupts

<sec:Interrupts>Three independent conditions may trigger an 
interrupt: when a picture header is encountered in the bitstream, 
when frame display ends, and when video resolution or frame rate 
changes. All three interrupt sources are optional and can be 
disabled individually.

When picture_hdr_intr_en is high and a picture header is 
encountered in the bitstream, picture_hdr is set and the 
interrupt signal is asserted until the status register is read. 
If picture_hdr_intr_en is low, the interrupt signal is never 
raised. picture_hdr and picture_hdr_intr_en are 0 on power-up or 
reset. The picture header interrupt marks the “heartbeat” of the 
video decoding engine.

When video vertical synchronization begins and frame_end_intr_en 
is high, frame_end is set and the interrupt signal is asserted 
until the status register is read. If frame_end_intr_en is low, 
the interrupt signal is never raised. frame_end and 
frame_end_intr_en are 0 on power-up or reset. The frame end 
interrupt marks the “heartbeat” of the video display engine. 

When one of horizontal_size, vertical_size, 
display_horizontal_size, display_vertical_size, 
progressive_sequence, aspect_ratio_information, frame_rate_code, 
frame_rate_extension_n, or frame_rate_extension_d changes, and 
video_ch_intr_en is high, video_ch is set and the interrupt 
signal is asserted until the status register is read. If 
video_ch_intr_en is low, the interrupt signal is never raised. 
video_ch and video_ch_intr_en are 0 on power-up or reset. The 
video change interrupt marks an abrupt change in the MPEG2 
stream.

It is suggested that software, when receiving a video change 
interrupt:

1. Reads the size, display size and frame rate registers.

2. If frame_rate_code, frame_rate_extension_d or 
  frame_rate_extension_n have changed, changes dot clock 
  frequency.

3. Calculates a video modeline, either using a look-up table or 
  algebraically, e.g. using the VESA General Timing Formula.

4. Writes the new video modeline parameters to the horizontal, 
  horizontal sync, vertical, vertical sync and video mode 
  registers. This restarts the video synchronization.

5. If horizontal_size or vertical_size have changed and 
  osd_enable is high, rewrites the On-Screen Display.
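The decision logic of the steps above can be sketched as a pure function that compares the register values before and after the interrupt. The function name video_change_actions, the action strings and the dictionary-based register snapshots are all hypothetical; how the registers are actually read and written depends on the host interface and is not shown.

```python
FRAME_RATE_FIELDS = (
    "frame_rate_code",
    "frame_rate_extension_n",
    "frame_rate_extension_d",
)
SIZE_FIELDS = ("horizontal_size", "vertical_size")


def video_change_actions(old, new, osd_enable):
    """Return the ordered steps to perform after a video change interrupt.

    old and new are snapshots of the size and frame rate registers;
    new is the result of step 1 (reading the registers).
    """
    actions = []
    # Step 2: the dot clock only needs changing if the frame rate changed.
    if any(old[f] != new[f] for f in FRAME_RATE_FIELDS):
        actions.append("change_dot_clock")
    # Steps 3 and 4: always recompute and rewrite the modeline.
    actions.append("calculate_modeline")
    actions.append("write_modeline_registers")
    # Step 5: the OSD is rewritten only if it is enabled and the size changed.
    if osd_enable and any(old[f] != new[f] for f in SIZE_FIELDS):
        actions.append("rewrite_osd")
    return actions


before = {"horizontal_size": 720, "vertical_size": 576, "frame_rate_code": 3,
          "frame_rate_extension_n": 0, "frame_rate_extension_d": 0}
after = dict(before, horizontal_size=352, vertical_size=288)
steps = video_change_actions(before, after, osd_enable=True)
```

In this example only the picture size changes, so the dot clock step is skipped and the On-Screen Display is rewritten.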

1.10 Watchdog

The MPEG2 decoder contains a watchdog circuit. The watchdog 
circuit resets the decoder if the decoder is unresponsive. The 
decoder is considered unresponsive if the decoder does not accept 
MPEG2 data for a period of time longer than the watchdog timeout 
interval. We outline how to configure the watchdog timeout 
interval, define under which conditions the watchdog circuit 
activates, and describe what happens when the watchdog timer 
expires.

The watchdog timeout interval can be configured by writing 
watchdog_interval, register 0, bits 15-8. 

• writing 0 to watchdog_interval causes the watchdog timer to 
  expire immediately. 

• writing a value from 1 to 254, inclusive, to watchdog_interval 
  enables the watchdog circuit.

• writing 255 decimal to watchdog_interval disables the watchdog 
  circuit.

The default value of watchdog_interval is 127. If 
watchdog_interval has a value from 1 to 254, inclusive, the 
watchdog timeout is

watchdog\_timeout = (watchdog\_interval + 1) \cdot (repeat\_frame + 1) \cdot 2^{18}

clk clock cycles. repeat_frame (Section [sec:Trick-mode]) 
determines the number of times a decoded video frame is displayed. 
Each decoded video image is shown repeat_frame + 1 times. If a 
video frame is shown n times, the watchdog timeout is multiplied 
by n as well. This implies there is no need to adjust the 
watchdog timer if video is reproduced in slow motion. 

The default value of repeat_frame is 0. If decoder clk frequency 
is 75 MHz the default watchdog timeout interval is 0.45 seconds.
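The timeout formula above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name watchdog_timeout_cycles is hypothetical, and repeat_frame 31 (freeze-frame) is rejected because the watchdog timer is held in reset in that case.

```python
def watchdog_timeout_cycles(watchdog_interval=127, repeat_frame=0):
    """Watchdog timeout in clk cycles for an enabled watchdog."""
    if not 1 <= watchdog_interval <= 254:
        raise ValueError("watchdog is only enabled for intervals 1..254")
    if not 0 <= repeat_frame <= 30:
        raise ValueError("repeat_frame must be 0..30 (31 is freeze-frame)")
    return (watchdog_interval + 1) * (repeat_frame + 1) * 2**18


# Power-up defaults at a 75 MHz decoder clock.
default_cycles = watchdog_timeout_cycles()
default_seconds = default_cycles / 75e6  # roughly 0.45 s

# Showing every frame twice doubles the timeout.
slow_motion_cycles = watchdog_timeout_cycles(repeat_frame=1)
```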

The watchdog timer starts running when the decoder raises the 
busy signal. If the busy signal remains high for longer than the 
watchdog timeout interval, a reset is generated. 

The watchdog timer is reset 

• when the global rst input signal is driven low

• when the decoder busy signal is low

• when the decoder has been halted to show the current frame 
  (repeat_frame is 31, freeze-frame)

• when the decoder has been halted to show a particular 
  framestore frame (source_select is non-zero)

• when the watchdog circuit has been disabled (watchdog_interval 
  has been set to 0 or to 255)

• during the first 2^{26} clk clock cycles after the watchdog 
  timer expired, or the decoder was reset. This watchdog timer 
  holdoff disables the watchdog during system initialisation. If 
  clock frequency is 75 MHz, 2^{26} clock cycles corresponds to 
  0.89 seconds.

When the watchdog timer expires

• the watchdog_rst output pin goes low for one clk clock 
  cycle. The watchdog_rst output can be used to reset external 
  hardware, or to generate a processor interrupt.

• the watchdog_status bit in the status register is set to 1. 
  Software can detect whether the watchdog timer expired by 
  checking watchdog_status in the status register. Reading the 
  status register resets the watchdog_status bit back to 0.

• The framestore, On-Screen Display and circular video buffer are 
  filled with zeroes.

• any data in the memory response fifo is discarded. 

• osd_enable is set to 0. This disables the On-Screen Display, as 
  the On-Screen Display now contains all zeroes.

• configuration data written to the register file is not modified 
  when the watchdog expires. In particular, the video timing 
  parameters (Sec. [sec:Video-Modeline]) remain unchanged.

The watchdog_rst output pin can optionally be used to reset 
external hardware when the watchdog expires. Examples of external 
hardware are the memory controller and the DVI dot clock 
generator. Note, however, resetting memory controller and DVI dot 
clock generator when the watchdog timer expires is optional. 

The MPEG2 decoder does not require the external memory controller 
to be reset when the watchdog timer expires. When the watchdog 
timer expires, the MPEG2 decoder will write zeroes to all 
addresses from FRAME_0_Y to VBUF_END (framestore_request.v, 
STATE_CLEAR). When the watchdog timer expires, the MPEG2 decoder 
will also read and discard any data from the memory response fifo 
(framestore_response.v, STATE_FLUSH). These two actions 
re-synchronize MPEG2 decoder and external memory controller and 
bring memory to a known state.

The MPEG2 decoder also does not require the DVI clock generator 
to be reset when the watchdog expires. When the watchdog timer 
expires, the video timing parameters (Sec. [sec:Video-Modeline]) 
remain unchanged. If the DVI clock frequency remains unchanged 
when the watchdog timer expires, the decoder will continue with 
exactly the same video timing.

1.11 Trick mode

<sec:Trick-mode>The trick mode register provides a toolbox for 
implementing non-standard playback modes. An example of a 
non-standard playback mode is slow motion. It is perhaps easiest 
to visualize trick mode settings as a pipeline (Figure 
[fig:Trick-mode-pipeline]).[float Figure:
<Graphics file: /home/user/src/xilinx/mpeg2fpga/doc/trick_mode.eps>

[Figure 1.4: 
<fig:Trick-mode-pipeline>Trick mode pipeline

flush_vbuf Writing one to flush_vbuf clears the incoming video 
buffer. Flushing the video buffer may be useful when changing 

persistence If persistence is set and no new decoded image is 
available at frame start, the last decoded image is shown again. 
If persistence is not set and no new decoded image is available 
at frame start, a blank screen is shown. persistence is 1 on 
power-up or reset.

source_select If zero, normal video is shown.  Non-zero values 
allow continuous output of a blank screen, or a specific frame 
from the frame store, as in table [tab:Source-Select]. 
source_select is 0 on power-up or reset. [float Table:

| source_select  |    Frame shown     |
|       0        | last decoded frame |
|       1        |    blank screen    |
|       4        |      frame 0       |
|       5        |      frame 1       |
|       6        |      frame 2       |
|       7        |      frame 3       |

[Table 1.9: 
<tab:Source-Select>Source Select

repeat_frame If zero, each decoded image is shown once.  If 
non-zero, contains the number of times the decoded image will be 
additionally shown, as in table [tab:Repeat-Frame]. A value of 31 
shows the image indefinitely. repeat_frame is 0 on power-up or 
reset. [float Table:

| repeat_frame  | times shown |
|      0        |      1      |
|      1        |      2      |
|      2        |      3      |
|      …        |             |
|      30       |     31      |
|      31       |   forever   |

[Table 1.10: 
<tab:Repeat-Frame>Repeat Frame

deinterlace Setting deinterlace high forces the decoder to output 
video as frames, even if the MPEG2 stream is interlaced. This can 
be used to reproduce interlaced MPEG2 streams on progressive 
displays. Setting deinterlace is not recommended when reproducing 
a progressive MPEG2 stream on a progressive display. Setting 
deinterlace has no effect if the video modeline specifies 
interlaced output (interlaced set). Note no spatial or temporal 
interpolation is done (“weaving”).

1.12 Test point

The MPEG2 decoder provides a test point for connecting a logic 
analyzer. Internally, the decoder contains various test points, 
only one of which is actually output to the logic analyzer. Which 
internal test point is output to the logic analyzer is determined 
by the contents of testpoint_sel. The value of bits 0..31 of the 
test point can also be read by software. While this is no 
substitute for a logic analyzer, it is recognized that in many 
cases this may be the only option available.

testpoint_sel Used in hardware debugging. Determines which 
internal test point is multiplexed to the 34-channel logic 
analyzer test point.

testpoint Used in hardware debugging. Provides the current value 
of bits 0 to 31 of the 34-channel logic analyzer test point.

<cha:Decoder-Sources>Decoder Sources

Chapter [cha:Decoder-Sources] provides an overview of the decoder 
sources for the hardware engineer who wishes to synthesize or 
modify the decoder.

2.1 Source Directory Structure

The source files are organized in directories as follows:

  bench/    iverilog     Icarus behavioral simulation, page [subsec:Icarus-Verilog-Simulation]  
  doc/                   Documentation                                                          
  rtl/      mpeg2        MPEG2 decoder, page [sec:MPEG2-Decoder]                                
  tools/    fsmgraph     Finite state machine graphs, page [subsec:FSM-Graphs]                  
            ieee1180     IEEE1180 IDCT accuracy test, page [subsec:IEEE-1180-IDCT]              
            logicport    Logicport logic analyzer, page [subsec:Logicport-Logic-Analyzer]       
            mpeg2dec     Reference MPEG2 decoder, page [subsec:mpeg2decode]                     
            streams      MPEG2 test streams, page [subsec:MPEG2-Test-Streams]                   

A Linux system with Icarus Verilog is suggested, but not 
required, as a development environment.

2.2 <sec:MPEG2-Decoder>MPEG2 Decoder

The rtl/mpeg2 directory contains the sources of the MPEG2 decoder 
itself. This section describes the changes most likely to be 
needed when instantiating the decoder: changing default modeline, 
changing FIFO sizes, choosing dual-ported ram and fifo models, 
changing memory mapping. In addition, references are provided for 
the IDCT and bilinear chroma upsampling algorithms.

2.2.1 FIFO sizes

Fifo depth and almost full/almost empty thresholds are defined in 
fifo_size.v. Note setting fifo depths and thresholds to arbitrary 
values can result in decoder deadlock.

Figure [fig:MPEG2-decoder-dataflow] shows MPEG2 decoder data 
flow. Together, framestore_request, memory controller and 
framestore_response implement the framestore. Communication with 
the framestore is through fifos. The incoming MPEG2 stream is 
written to vbuf_write_fifo. framestore_request reads the stream 
from vbuf_write_fifo and writes it to the circular video buffer 
in memory. If vbuf_read_fifo is almost empty, framestore_request 
issues memory read requests for the circular video buffer. 
framestore_response receives data from the circular video buffer 
and writes the data to vbuf_read_fifo. The net result is 
vbuf_write_fifo, circular video buffer and vbuf_read_fifo acting 
as a single, huge fifo.

Variable-length decoding reads the MPEG2 stream from 
vbuf_read_fifo, and produces motion vectors and run/length codes. 
Run/length decoding, inverse quantizing, inverse zig-zag and 
inverse discrete cosine transform (IDCT) read the run/length 
codes and produce the prediction error. The prediction error is 
written to predict_err_fifo, one row of eight pixels at a time.

Motion compensation address generation motcomp_addrgen translates 
the motion vectors into three sets of memory addresses: the 
addresses where the forward motion compensation pixels can be 
read, the addresses where the backward motion compensation pixels 
can be read, and the addresses where the reconstructed pixels can 
be written. The addresses of the pixels needed for forward and 
backward motion compensation are written to the fwd_reader and 
bwd_reader address fifos. The address of the reconstructed pixels 
is written to the motion compensation destination fifo, dst_fifo. 
The memory subsystem reads the fwd_reader and bwd_reader address 
fifos, and writes the pixel values to the fwd_reader and 
bwd_reader data fifos. 

Motion compensation reconstruction motcomp_recon adds pixel 
values read from forward motion compensation data fifo, backward 
motion compensation data fifo and prediction error, and writes 
the result to the address read from the motion compensation 
destination fifo.

Displaying the video image requires chroma resampling and yuv to 
rgb conversion. Resampling address generation resample_addrgen 
scans the reconstructed video image, line by line. The addresses 
of the pixels are written to the disp_reader address fifo. The 
memory subsystem reads the addresses from disp_reader address 
fifo and writes the pixel values to the disp_reader data fifo. 
resample_dta reads the pixel values from the disp_reader data 
fifo, while resample_bilinear does the actual bilinear chroma 
upsampling calculations. After conversion from yuv to rgb, the 
pixels are written to the pixel queue pixel_queue which adapts 
between decoder and DVI clocks. [float Figure:
<Graphics file: /home/user/src/xilinx/mpeg2fpga/doc/>

[Figure 2.1: 
<fig:MPEG2-decoder-dataflow>MPEG2 decoder dataflow

Note the memory tag fifo mem_tag_fifo between framestore_request 
and framestore_response. For every memory read request, 
framestore_request writes a tag to the memory tag fifo. The tag 
identifies the source of the memory read request: circular video 
buffer, forward and backward motion compensation, or resampling. 
For every data word received from memory, framestore_response 
reads a tag from the memory tag fifo, and writes the data word 
received from memory to the data fifo corresponding to the tag. 
If the memory tag fifo is almost full, framestore_request stops 
issuing memory read or write requests. As a result, the number of 
outstanding memory read requests is always less than or equal to 
the size of the memory tag fifo.

When modifying fifo_size.v, care should be taken that the fifos 
can never overflow. Note that when framestore_request stops 
issuing memory read requests, there may still be outstanding 
memory read requests in the memory request queue, up to the size 
of the memory tag fifo. Fifos which receive data from memory may 
therefore still receive data after framestore_request has stopped 
sending memory read requests, and need room for it.
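The tag mechanism can be illustrated with a small behavioural model. This is a sketch, not a transcription of framestore_request.v or framestore_response.v: the class name TagFifoModel is hypothetical, memory is assumed to answer read requests in order, and requests stop when the tag fifo is completely full rather than almost full.

```python
from collections import deque


class TagFifoModel:
    """Toy model of tag-based routing of memory read responses."""

    def __init__(self, tag_fifo_depth):
        self.tag_fifo_depth = tag_fifo_depth
        self.tag_fifo = deque()   # one tag per outstanding read request
        self.data_fifos = {}      # one data fifo per request source

    def request_read(self, source):
        """Issue a read request; refused when the tag fifo has no room."""
        if len(self.tag_fifo) >= self.tag_fifo_depth:
            return False
        self.tag_fifo.append(source)
        return True

    def memory_response(self, word):
        """Route a data word to the fifo named by the oldest tag."""
        source = self.tag_fifo.popleft()
        self.data_fifos.setdefault(source, deque()).append(word)
        return source


model = TagFifoModel(tag_fifo_depth=4)
accepted = [model.request_read(s) for s in ("vbuf", "fwd", "bwd", "disp", "vbuf")]
# Data for requests already issued still arrives after requests have stopped.
routed = [model.memory_response(word) for word in (10, 11, 12, 13)]
```

The fifth request is refused, yet four data words still arrive afterwards; each receiving data fifo must have room for them.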

2.2.2 Dual-ported memory and FIFO models

FPGAs typically provide dedicated on-chip fifos and dual-port 
RAMs. The designer then has to choose between using 
vendor-provided FIFOs and dual-port RAMs or writing their own.

The file wrappers.v defines the implementation of all dual-port 
RAMs and fifos in the design. For each component, two versions 
are provided: one where read and write port share a common clock; 
and one where read and write port have independent clocks. 

dpram_sc dual-ported ram, same clock for read and write ports

dpram_dc dual-ported ram, different clock for read and write 
ports

fifo_sc fifo, same clock for read and write ports

fifo_dc fifo, different clock for read and write ports

The dual-ported rams are inferred from code in wrappers.v. The 
fifos can be either implemented in Verilog, or instantiated as 
FPGA primitives, depending upon wrappers.v. The following fifo 
models are available:

xfifo_sc.v fifo, same clock for read and write port.

generic_fifo_sc_b.v OpenCores generic fifo, different clock for 
read and write ports.

xilinx_fifo_sc.v Xilinx Virtex-5 fifo, same clock for read and 
write ports. Uses xilinx_fifo.v, xilinx_fifo144.v and 

xilinx_fifo_dc.v Xilinx Virtex-5 fifo, different clock for read 
and write ports. Uses xilinx_fifo.v, xilinx_fifo144.v and 

xilinx_fifo_sc.v and xilinx_fifo_dc.v implement fifos using 
FIFO18, FIFO18_36, FIFO36 or FIFO36_72 Virtex-5 primitives. Table 
[tab:Xilinx-FIFO-address] lists available data and address 
widths. If a xilinx_fifo_sc.v or a xilinx_fifo_dc.v is 
instantiated with data and/or address widths different from those 
in Table [tab:Xilinx-FIFO-address], the actual fifo will be 
larger and/or wider. [float Table:

| Data bits  | Address bits  | FIFO Depth  | Implementation |
|     4      |      13       |    8192     |     FIFO36     |
|     4      |      12       |    4096     |     FIFO18     |
|     9      |      12       |    4096     |     FIFO36     |
|     9      |      11       |    2048     |     FIFO18     |
|    18      |      11       |    2048     |     FIFO36     |
|    18      |      10       |    1024     |     FIFO18     |
|    36      |      10       |    1024     |     FIFO36     |
|    36      |      9        |    512      |     FIFO18     |
|    72      |      9        |    512      |   FIFO36_72    |
|    144     |      9        |    512      | 2 * FIFO36_72  |
|    216     |      9        |    512      | 3 * FIFO36_72  |

[Table 2.1: 
<tab:Xilinx-FIFO-address>Xilinx FIFO address widths

2.2.3 Memory mapping

The MPEG2 decoder memory mapping is defined in 
rtl/mpeg2/mem_codes.v. The default memory mapping needs 4 Mbyte 
of RAM and is sufficient for SDTV. By defining MP_AT_HL an 
alternative memory mapping can be chosen which requires 16 Mbyte 
of RAM and is sufficient for HDTV.

Translation of macroblock addresses to memory addresses is 
implemented in rtl/mpeg2/mem_addr.v. A macroblock address, a 
signed motion vector (mv_x, mv_y) with half-pixel precision, and 
a signed offset (delta_x, delta_y) with pixel precision are 
translated to an address in memory.

The macroblock address is assumed to iterate over all allowable 
values: beginning at zero, incrementing by one, until after the 
final macroblock the macroblock address is reset to zero. 
Macroblock address has to be initialized to zero, or an error 
condition results. Macroblock address changes other than 
incrementing by one, remaining unchanged or resetting to zero 
also result in an error condition.

Note the motion vector (mv_x, mv_y) is scaled by a factor two 
when accessing chrominance as defined in [1, par.]. The 
offset (delta_x, delta_y) remains unchanged when accessing 
chrominance blocks.

The translation of macroblock addresses and motion vectors to 
memory addresses in rtl/mpeg2/mem_addr.v has to be kept 
synchronized with the framestore dump task write_framestore in 
rtl/sim/mem_ctl.v, else the framestore dumps made during 
simulation will not accurately represent framestore contents.

Note out-of-range memory accesses are translated to the ADDR_ERR 
address. If a memory request with address mem_req_rd_addr equal 
to ADDR_ERR occurs during simulation, simulation stops with an 
error message.

The MPEG2 decoder zeroes out the framestore after system reset or 
when the watchdog timer expires. The MPEG2 decoder writes zeroes 
to all addresses from FRAME_0_Y to VBUF_END when the rst input 
pin goes low or when the watchdog_rst pin goes low. 

2.2.4 Modeline

The default modeline is 800x600 progressive @ 60 Hz (SVGA). The 
modeline.v source contains the modeline parameters, and can be 
edited to change horizontal and vertical resolution, sync pulse 
width and position. The default pixel frequency on the ML505 is 
38.21 MHz, and is defined in dotclock_synthesizer.v. Note 
dotclock_synthesizer.v synthesizes two frequencies, dotclock and 
dotclock90, equal in frequency but 90 degrees phase shifted. The 
frequency synthesized is

f_{out} = f_{osc} \cdot r \cdot \frac{DCM\_ADV\_INST.CLKFX\_MULTIPLY}{DCM\_ADV\_INST.CLKFX\_DIVIDE}

where f_{osc} = 100 MHz is the user clock frequency and

r = \frac{PLL\_ADV\_INST.CLKFBOUT\_MULT}{PLL\_ADV\_INST.CLKOUT1\_DIVIDE} = 0.25

To change pixel frequency, first calculate the multiplier and 
divider for the new frequency. Suppose one wishes to synthesize a 
frequency of 35 MHz:

macpro mpeg2ether # ./mpeg2ether --dot_clock 35

dotclock ftarget =  35.00 fout =  35.00 MHz
 multiplier:  7 divider:  5
 high frequency mode: 0 ch7301 lowfreq: 1 ch7301 colorbars: 0

A pixel frequency of 35 MHz requires a multiplier of 7 and a 
divider of 5, with lowfreq asserted. Hence, in dvi/dotclock.v:

parameter [7:0]
  DEFAULT_DIVIDER       = 8'd4, // Divider minus one, actually
  DEFAULT_MULTIPLIER    = 8'd6; // Multiplier minus one, actually

parameter
  DEFAULT_LOWFREQ       = 1'b1;
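The multiplier and divider reported by mpeg2ether for 35 MHz can be reproduced with a small search, sketched below. This is not the mpeg2ether algorithm. The function name find_dotclock_ratio is hypothetical, the 25 MHz base follows from f_{osc} = 100 MHz and r = 0.25, and the search ranges (multiplier 2 to 32, divider 1 to 32) are assumptions that should be checked against the clock synthesizer documentation. The lowfreq setting is not derived here.

```python
from fractions import Fraction


def find_dotclock_ratio(target_mhz, base_mhz=25, max_multiplier=32, max_divider=32):
    """Find the multiplier/divider pair whose output is closest to target_mhz.

    Exact rational arithmetic is used so that ties are resolved in favour
    of the smallest divider found first.
    """
    target = Fraction(str(target_mhz))
    best = None
    for divider in range(1, max_divider + 1):
        for multiplier in range(2, max_multiplier + 1):
            f_out = Fraction(base_mhz) * multiplier / divider
            error = abs(f_out - target)
            if best is None or error < best[0]:
                best = (error, multiplier, divider, f_out)
    _, multiplier, divider, f_out = best
    return multiplier, divider, float(f_out)


multiplier, divider, f_out = find_dotclock_ratio(35)
# The parameters store multiplier and divider minus one.
default_multiplier = multiplier - 1
default_divider = divider - 1
```

For 35 MHz the search yields the same multiplier 7 and divider 5 as the mpeg2ether output, and the same minus-one values as the parameters shown.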

Note the modeline can be configured at any time using the 
mpeg2ether utility; it is only when changing the default modeline 
that modifying the sources is necessary. The mpeg2ether utility 
is explained on page [subsec:mpeg2ether].

2.2.5 Inverse Discrete Cosine Transform

The IDCT algorithm used is described in [4]. A copy of document 
[4] can be found in the doc directory. The IDCT implementation uses 
12 18x18 multipliers and two dual-port rams, and can do 
streaming. Run-length decoding (rld.v), inverse quantizing 
(iquant.v, zigzag_table.v) and IDCT transform (idct.v) all 
operate at the same speed of one pixel per clock. The IDCT meets 
the requirements of the former IEEE-1180.

2.2.6 Bilinear chroma upsampling

The chrominance components have half the vertical and half the 
horizontal resolution of the luminance. To obtain equal 
chrominance and luminance resolution, bilinear chroma upsampling 
is used. Bilinear chroma upsampling computes chroma pixel values 
by vertical and horizontal interpolation. Vertical interpolation 
implies adding two rows of chroma values with different weights. 
The chroma row closest to the luma row gets weight 3/4, while the 
chroma row farthest from the luma row gets weight 1/4. The 
document doc/bilinear.pdf shows the weights used.
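The vertical interpolation step can be sketched as a weighted sum of two chroma rows. The function name interpolate_chroma_rows is hypothetical, and the rounding (adding 2 before dividing by 4) is an assumption; resample_bilinear.v and doc/bilinear.pdf define the actual arithmetic.

```python
def interpolate_chroma_rows(near_row, far_row):
    """Blend two chroma rows with weights 3/4 (near) and 1/4 (far)."""
    if len(near_row) != len(far_row):
        raise ValueError("chroma rows must have equal length")
    # (3 * near + far + 2) >> 2 is a rounded division by four;
    # the rounding offset is an assumption, not taken from the sources.
    return [(3 * n + f + 2) >> 2 for n, f in zip(near_row, far_row)]


near = [100, 128, 200]  # chroma row closest to the luma row
far = [120, 128, 40]    # chroma row farthest from the luma row
blended = interpolate_chroma_rows(near, far)
```

Horizontal interpolation works the same way along a row, with weights as shown in doc/bilinear.pdf.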

Bilinear chroma upsampling is implemented in various source 
files, as described in Table [tab:Upsampling-source-files]. [float Table:

| Source               | Description                                    |
| resample.v           | Upsampling top-level file                      |
| resample_addrgen.v   | Generates memory addresses of chroma/lumi rows |
| resample_dta.v       | Reads chroma/lumi rows from memory             |
| resample_bilinear.v  | Performs bilinear upsampling calculations      |

[Table 2.2: 
<tab:Upsampling-source-files>Upsampling source files


2.3 Simulation

Behavioral simulation using Icarus Verilog is described. For 
timing simulation consult your synthesis software.

2.3.1 <subsec:Icarus-Verilog-Simulation>Icarus Verilog Simulation

Behavioral simulation of the decoder can be performed using 
Icarus Verilog. The Icarus Verilog testbench in the 
bench/iverilog directory contains the following files:

testbench.v Top-level Verilog source; instantiates MPEG2 decoder.

mem_ctl.v Simple memory controller, for simulation only.

Makefile Makefile to create and run the simulation.

wrappers.v Wrapper for dual-port ram and fifos. Implements 
synchronous fifos using xfifo_sc.v, and implements asynchronous 
fifos as OpenCores generic_fifo_sc_b.v. 

generic_dpram.v, generic_fifo_dc.v, generic_fifo_sc_b.v Opencores 
generic fifos.

Creating the decoder is easy using the accompanying Makefile. 
First, remove any files left over from a previous simulation:

koen@macpro ~/xilinx/mpeg2/bench/iverilog $ make clean

rm -f mpeg2 stream.dat testbench.lxt trace framestore_*.ppm 

Now create the decoder:

koen@macpro ~/xilinx/mpeg2/bench/iverilog $ make

iverilog -D__IVERILOG__ -DMODELINE_SIF -I ../../rtl/mpeg2 -o 
 testbench.v mem_ctl.v wrappers.v generic_fifo_dc.v
 generic_fifo_sc_b.v generic_dpram.v ../../rtl/mpeg2/mpeg2video.v 

../../rtl/mpeg2/vbuf.v ../../rtl/mpeg2/getbits.v 
xxd -c 1 ../../tools/streams/stream-susi.mpg |
 cut -d\  -f 2 > stream.dat

This executes two commands:

• iverilog to compile the Verilog sources to an executable, 

• xxd to convert the binary MPEG2 program stream file stream.mpg 
  to an ASCII file stream.dat, which the simulator can load.
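Where xxd is not available, the same conversion can be sketched in Python. The function names are hypothetical; the output format, one two-digit hexadecimal byte per line, matches what the xxd and cut pipeline above produces.

```python
def bytes_to_readmemh_lines(data):
    """Format bytes as one two-digit hex value per line."""
    return "".join(f"{byte:02x}\n" for byte in data)


def convert_stream(src_path, dst_path):
    """Convert a binary stream file to an ASCII file of hex bytes."""
    with open(src_path, "rb") as src, open(dst_path, "w") as dst:
        dst.write(bytes_to_readmemh_lines(src.read()))


# Four sample bytes, formatted without touching the file system.
sample = bytes_to_readmemh_lines(b"\x00\x00\x01\xba")
```

Calling convert_stream with the stream file and stream.dat as arguments writes the file the simulator loads.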

When compiling the Verilog sources, two Verilog macros are 
defined on the command line: __IVERILOG__ and MODELINE_SIF. The 
first Verilog define, __IVERILOG__ , is defined only during 
simulation, and never during synthesis. It is used to enable 
several run-time checks which only make sense in a simulation 
environment. The second Verilog define, MODELINE_SIF, chooses one 
of several pre-defined video output formats from modeline.v.

Finally, run the newly created executable mpeg2:

koen@macpro ~/xilinx/mpeg2/bench/iverilog $ make test


LXT info: dumpfile testbench.lxt opened for output.

$readmemh(stream.dat): Not enough words in the read file for
 requested range.

testbench.mem_ctl.write_framestore      dumping framestore to
               framestore_000.ppm @  0.02 ms

testbench.mem_ctl.write_framestore      dumping framestore to
               framestore_001.ppm @  0.02 ms

testbench.mpeg2.motcomp macroblock_address:    0

testbench.mpeg2.motcomp macroblock_address:    1

testbench.mpeg2.motcomp macroblock_address:    2

testbench.mpeg2.motcomp macroblock_address:    3

During simulation, the environment variable IVERILOG_DUMPER=lxt 
is set. This instructs the simulator to produce a dumpfile in the 
more compact lxt format, instead of the default vcd format.

By default, simulator output includes the macroblock address. 
This allows easy monitoring of decoder progress.

Each Verilog source file contains a define DEBUG statement, which 
can be uncommented or commented to switch trace output for that 
particular source file on or off. 

During simulation, two kinds of graphics files are written: 
framestore dumps framestore_*.ppm and video captures 
tv_out_*.ppm. Both are Portable Pixmap graphics files in ASCII 
format. The framestore is where the decoder stores already 
decoded images. Figure [fig:susi-framestore-dump] shows a sample 
framestore dump.

The framestore consists of four frames and the on-screen display 
(OSD). The first two frames contain I and P pictures, while the 
last two frames contain B-pictures. Each frame consists of y 
(luminance), u and v (chrominance) information, with u and v 
having half the horizontal and half the vertical resolution of y. 
In the framestore dump, uninitialized memory is displayed in 
green. Looking at figure [fig:susi-framestore-dump], one can see 
that the first three frames of the framestore have already been 
written; the decoder is halfway through the fourth frame. The 
On-Screen Display, at the bottom of the framestore dump, has not 
been initialized yet.

During simulation, by default, the framestore is dumped whenever 
a new frame begins; and every 200 macroblocks. As a framestore 
dump is a graphics file in ASCII format, one can also look at the 
file using standard text file utilities. These are the first 12 
lines of a sample framestore dump:

koen@macpro ~/xilinx/mpeg2/bench/iverilog $ head -12 


# mpeg2 framestore dump @ 11.81 ms

# frame number 2

# horizontal_size 352

# vertical_size 288

# display_horizontal_size 0

# display_vertical_size 0

# mb_width 22

# mb_height 18

# picture_structure frame picture

# chroma_format 420

352 2618 255

255 255 255 255 255 255 255 255 255 255 255 255
 255 255 255 255 255 255 255 255 255 255 255 255
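Because the dump is plain ASCII, the comment lines shown above can also be read by a script. The sketch below is illustrative; the function name parse_dump_header is hypothetical, and it simply takes the first word after the # as key and the rest of the line as value.

```python
def parse_dump_header(lines):
    """Collect '# key value' comment lines of a dump into a dictionary."""
    fields = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("#"):
            continue  # skip blank lines and pixel data
        key, _, value = line[1:].strip().partition(" ")
        fields[key] = value.strip()
    return fields


header = parse_dump_header([
    "# mpeg2 framestore dump @ 11.81 ms",
    "# horizontal_size 352",
    "# vertical_size 288",
    "# mb_width 22",
    "# mb_height 18",
    "# picture_structure frame picture",
    "# chroma_format 420",
])

# Width and height in 16x16 macroblocks agree with the reported sizes.
mb_width_check = int(header["horizontal_size"]) // 16
mb_height_check = int(header["vertical_size"]) // 16
```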

The header of the framestore dump contains information about 
decoder status at the moment of the dump.[float Figure:
<Graphics file: /home/user/src/xilinx/mpeg2fpga/doc/>

[Figure 2.2: 
<fig:susi-framestore-dump>Framestore dump

Figure [fig:susi-video-output] shows video capture file 
tv_out_0000.ppm. Horizontal sync is displayed as a vertical black 
stripe, to the right of the image. Vertical sync is displayed as 
a horizontal black stripe, below the image area. Blanking is 
displayed in a dark grey. The position of picture, horizontal 
sync and vertical sync in figure [fig:susi-video-output] is as 
defined in figure [fig:Progressive-Video]. As with the framestore 
dumps, one can look at tv_out_0000.ppm using standard text file 
utilities:

koen@macpro ~/xilinx/mpeg2/bench/iverilog $ head -10 


# picture 1 @ 10.73 ms

# horizontal resolution 352 sync_start 381 sync_end 388 length 

# vertical resolution 288 sync_start 295 sync_end 298 length 315

# interlaced 0 halfline 175

459 316 255

0 0 0

0 77 0

3 0 3

2 0 2

The header of the video capture file contains information about 
the video modeline at the moment of video capture.

[Figure 2.3: 
<fig:susi-video-output>Video output capture
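
The layout of the capture (picture, sync stripes, blanking) follows from the modeline values in the header. The sketch below, in Python, classifies a pixel position using the resolution, sync_start and sync_end values of the sample header. The function name classify is hypothetical, and treating sync_start to sync_end as an inclusive range is an assumption; the testbench may draw the boundaries differently.

```python
def classify(x, y, h, v):
    """Name the region of the capture that pixel (x, y) falls in.

    h and v are dicts holding 'resolution', 'sync_start' and 'sync_end'
    for the horizontal and vertical direction, as read from the header.
    Assumption: sync_start..sync_end is an inclusive range.
    """
    if x < h["resolution"] and y < v["resolution"]:
        return "picture"
    if h["sync_start"] <= x <= h["sync_end"]:
        return "horizontal sync"   # vertical black stripe, right of image
    if v["sync_start"] <= y <= v["sync_end"]:
        return "vertical sync"     # horizontal black stripe, below image
    return "blanking"              # shown in dark grey


# Values taken from the sample video capture header.
h = {"resolution": 352, "sync_start": 381, "sync_end": 388}
v = {"resolution": 288, "sync_start": 295, "sync_end": 298}

for point in [(0, 0), (385, 100), (100, 296), (360, 100)]:
    print(point, classify(point[0], point[1], h, v))
```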

To end the simulation, go to the window where iverilog is running, 
press ctrl-c and type finish. The simulator will finish writing trace 
and testbench.lxt files, and return control to the command line.

The binary file testbench.lxt is a log of all wire and register 
changes which occurred during simulation. testbench.lxt can be 
displayed using waveform viewers such as gtkwave. 

koen@macpro ~/xilinx/mpeg2/bench/iverilog $ gtkwave testbench.lxt 

Once the testbench.lxt file has been loaded in gtkwave, internal 
decoder wires and registers can be displayed as waveforms.

2.3.2 <subsec:Conformance-Tests>Conformance Tests

The bench/conformance directory contains a testbench for the 
ISO/IEC 13818-4 MPEG2 conformance tests. The testbench assumes 
the ISO/IEC 13818-4 conformance test bitstreams are available on 
your system. The ISO/IEC 13818-4 MPEG2 Conformance test 
bitstreams for Main Profile @ Main Level can be downloaded from 
the ISO web site using the tools/streams/retrieve script.

Typing make clean test in the bench/conformance directory 
simulates all MP@ML conformance test bitstreams. Table 
[tab:Conformance-Test-Suite] summarizes test results. 

When running the conformance tests, note that the decoder is not 
MPEG1-compatible and does not decode MPEG1 streams. The MPEG2 
decoder decodes MPEG2 4:2:0 program streams only.

| Test bitstream             | Profile and level  |       Remarks       |
| tcela/tcela-16-matrices    |      11172-2       | Fail (MPEG1 stream) |
| tcela/tcela-18-d-pict      |      11172-2       | Fail (MPEG1 stream) |
| compcore/ccm1              |      11172-2       | Fail (MPEG1 stream) |
| tcela/tcela-19-wide        |      11172-2       | Fail (MPEG1 stream) |
| toshiba/toshiba_DPall-0    |       SP@ML        |                     |
| nokia/nokia6_dual          |       SP@ML        |                     |
| nokia/nokia6_dual60        |       SP@ML        |                     |
| nokia/nokia_7              |       SP@ML        |                     |
| tcela/tcela-14-bff-dp      |       SP@ML        |                     |
| ibm/ibm-bw-v3              |       SP@ML        |                     |
| tcela/tcela-8-fp-dp        |       SP@ML        |                     |
| tcela/tcela-9-fp-dp        |       SP@ML        |      1 bit off      |
| mei/MEI.stream16v2         |       SP@ML        | Fail (MPEG1 stream) |
| mei/MEI.stream16.long      |       SP@ML        | Fail (MPEG1 stream) |
| ntr/ntr_skipped_v3         |       SP@ML        |                     |
| teracom/teracom_vlc4       |       SP@ML        |                     |
| tcela/tcela-15-stuffing    |       SP@ML        |                     |
| tcela/tcela-17-dots        |       SP@ML        |                     |
| gi/gi4                     |       MP@ML        |                     |
| gi/gi6                     |       MP@ML        |                     |
| gi/gi_from_tape            |       MP@ML        |                     |
| gi/gi7                     |       MP@ML        |                     |
| gi/gi_9                    |       MP@ML        |                     |
| ti/TI_cl_2                 |       MP@ML        |                     |
| tceh/tceh_conf2            |       MP@ML        |                     |
| mei/mei.2conftest.4f       |       MP@ML        |                     |
| mei/  |       MP@ML        |                     |
| tek/Tek-5.2                |       MP@ML        |                     |
| tek/Tek-5-long             |       MP@ML        |                     |
| tcela/tcela-6-slices       |       MP@ML        |                     |
| tcela/tcela-7-slices       |       MP@ML        |                     |
| sony/sony-ct1              |       MP@ML        |                     |
| sony/sony-ct2              |       MP@ML        |                     |
| sony/sony-ct3              |       MP@ML        |                     |
| sony/sony-ct4              |       MP@ML        |                     |
| att/att_mismatch           |       MP@ML        |                     |
| teracom/teracom_vlc4       |       MP@ML        |                     |
| ccett/mcp10ccett           |       MP@ML        |                     |
| lep/bits_conf_lep_11       |       MP@ML        |                     |
| hhi/hhi_burst_short        |       MP@ML        |                     |
| hhi/hhi_burst_long         |       MP@ML        |                     |
| tcela/tcela-10-killer      |       MP@ML        |                     |

[Table 2.3: 
<tab:Conformance-Test-Suite>Conformance Test Suite

2.4 <sec:Tools>Tools

The tools directory contains various utilities and tools used 
during decoder development and test.

2.4.1 <subsec:Logicport-Logic-Analyzer>Logic Analyzer

On the Xilinx ML505, the MPEG2 decoder test point has been broken 
out to the Xilinx Generic Interface (XGI). The test point 
selection can be done using the GPIO DIP switches. If the ML505 
is held so the LCD can be read, the GPIO DIP switches are at the 
bottom right of the board. GPIO DIP switches are numbered 1 to 8, 
from left to right.

If GPIO DIP switch 3 is off, test point selection is made by 
writing to register 15 decimal, REG_WR_TESTPOINT. If GPIO DIP 
switch 3 is on, test point selection is made by GPIO DIP switches 
5 to 8. GPIO DIP switch 5 is MSB, GPIO DIP switch 8 is LSB.
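
The selection rule can be written out as a small function. This is a sketch in Python for illustration only. The function name selected_testpoint and the dict representation of the switches are hypothetical, and the actual selection is done in hardware.

```python
def selected_testpoint(dip, reg_value):
    """Return the test point number selected by the rule above.

    dip maps a GPIO DIP switch number (1..8) to True when the switch is
    on. reg_value is the value last written to register 15,
    REG_WR_TESTPOINT.
    """
    if not dip[3]:
        # Switch 3 off: selection comes from the register.
        return reg_value
    value = 0
    for switch in (5, 6, 7, 8):      # switch 5 is MSB, switch 8 is LSB
        value = (value << 1) | int(dip[switch])
    return value


# Example: switch 3 on, switches 5 and 8 on gives binary 1001.
dip = {n: False for n in range(1, 9)}
dip[3] = dip[5] = dip[8] = True
print(selected_testpoint(dip, reg_value=2))
```

With switch 3 on and switches 5 and 8 on, the selected test point is binary 1001, or 9; with switch 3 off the register value is used unchanged.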

Verify that probing has been enabled in probe.v. Note that, as one 
adds test points, routing and timing closure become more and 
more difficult. Only define those test points you need.

The Intronix Logicport is a small USB-based logic analyzer. It 
has 34 channels, two of which can be used as clock inputs, and 
does state analysis at up to 200 MHz. The MPEG2 decoder on the 
ML505 runs at 75 MHz, with a typical dot clock of 27 MHz, well 
within the capabilities of the Logicport logic analyzer. Probing 
the memory controller at 200 MHz, however, is borderline. To be 
on the safe side, when probing the memory controller with the 
Logicport, lower the memory clock to 125 MHz.

A small two-layer adapter board has been designed to connect the 
Intronix Logicport to the Xilinx ML505. Board layout can be 
downloaded from

The tools/logicport directory contains Logicport configuration 
files for the test points defined in probe.v. Note that configuration 
files can be read and waveforms displayed by Logicport software 
even if no analyzer is present.

2.4.2 <subsec:FSM-Graphs>Finite State Machine Graphs

The MPEG2 decoder uses Finite State Machines throughout; no 
embedded processors or microcontrollers are used. Verifying the 
correctness of the Finite State Machines is important. Finite 
state machine transition graphs are created from Verilog source 
files as a means of visually inspecting and verifying source 
correctness. The mkfsmgraph Perl script in tools/fsmgraph assumes 
the comment /* next state logic */ marks the beginning of a case 
statement in an always block, used to select the next state, and 
that all state names begin with STATE_:

/* next state logic */

always @*

  case (state)

    STATE_INIT: if (first_pixel_read) next = STATE_WAIT;

                else next = STATE_INIT;


    default next = STATE_INIT;

  endcase

/* state */

always @(posedge clk)

  if(~rst) state <= STATE_INIT;

  else state <= next;

The mkfsmgraph tool parses the Verilog source files using the 
following algorithm:

• read the Verilog file until the comment /* next state logic */ 
  is found

• take the first always block after the /* next state logic */ 
  comment

• any word beginning with STATE_ is assumed to represent a FSM 
  state

• if the character following the FSM state is a colon (:) the 
  state is a graph node.

• if the character following the FSM state is a semicolon (;) the 
  state is the end point of a state transition.

• if the character following the FSM state is neither a colon (:) 
  nor a semicolon (;) the state is not added to the graph.
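
The steps above can be sketched in a few lines. The following Python code is an illustrative re-implementation, not the mkfsmgraph Perl script itself. The function names extract_fsm and to_gml are hypothetical, the end of the always block is assumed to be the next always keyword, a transition is assumed to start at the most recently seen graph node, and the GML layout shown is a minimal one that may differ from what mkfsmgraph writes. The STATE_WAIT case item in the sample is added for illustration.

```python
import re

MARKER = "/* next state logic */"


def extract_fsm(verilog_text):
    """Return (nodes, edges) found after the next-state-logic marker."""
    start = verilog_text.find(MARKER)
    if start < 0:
        return [], []
    rest = verilog_text[start + len(MARKER):]
    first = re.search(r"\balways\b", rest)
    if first is None:
        return [], []
    block = rest[first.end():]
    # Assumption: the block ends where the next "always" keyword starts.
    following_block = re.search(r"\balways\b", block)
    if following_block:
        block = block[:following_block.start()]

    nodes, edges, current = [], [], None
    for match in re.finditer(r"\bSTATE_\w+", block):
        name = match.group(0)
        next_char = block[match.end():match.end() + 1]
        if next_char == ":":            # case label: a graph node
            current = name
            if name not in nodes:
                nodes.append(name)
        elif next_char == ";" and current is not None:
            # assignment: end point of a transition from the current node
            if (current, name) not in edges:
                edges.append((current, name))
        # any other following character: the state is ignored
    return nodes, edges


def to_gml(nodes, edges):
    """Render the graph as minimal GML text."""
    names = list(nodes)
    for _, target in edges:
        if target not in names:
            names.append(target)
    ids = {name: index for index, name in enumerate(names)}
    lines = ["graph [", "  directed 1"]
    for name in names:
        lines.append('  node [ id %d label "%s" ]' % (ids[name], name))
    for source, target in edges:
        lines.append("  edge [ source %d target %d ]"
                     % (ids[source], ids[target]))
    lines.append("]")
    return "\n".join(lines)


SAMPLE = """
/* next state logic */
always @*
  case (state)
    STATE_INIT: if (first_pixel_read) next = STATE_WAIT;
                else next = STATE_INIT;
    STATE_WAIT: next = STATE_WAIT;
  endcase

/* state */
always @(posedge clk)
  if (~rst) state <= STATE_INIT;
  else state <= next;
"""

nodes, edges = extract_fsm(SAMPLE)
print(to_gml(nodes, edges))
```

Note that the second always block, which holds the state register, is not scanned, so the reset assignment to STATE_INIT does not show up as a transition.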

The resulting graph is written to standard output in gml format. 
Graph layout software uDrawGraph from the University of Bremen, 
Germany, is then used to produce a visually appealing graph. 

No attempt has been made to write a script capable of parsing 
arbitrary Verilog sources. The Verilog sources have been written 
so the script can parse them.

The graph of the variable-length decoding FSM vld.v has been 
simplified further by removing all transitions to 
STATE_NEXT_START_CODE and STATE_ERROR. Nodes which transition to 
STATE_NEXT_START_CODE are drawn with a double border. Removing 
transitions to STATE_NEXT_START_CODE and STATE_ERROR produces a 
graph with much less visual clutter. A large format version of 
the FSM graph of vld.v can be found in doc/vld-poster.pdf. Readers 
are advised to become familiar with the graph before significantly 
modifying vld.v. 

2.4.3 <subsec:IEEE-1180-IDCT>IEEE-1180 IDCT Accuracy Test

idct.v has been tested to comply with the former IEEE-1180 
standard, now ISO/IEC 23002-1 [2]. The testbench can be found in the 
tools/ieee1180 directory. Test results can be found in the file 
ieee-1180-results. Test results indicate the IDCT implementation 
is IEEE-1180 compliant.

2.4.4 <subsec:mpeg2decode>Reference software decoder

The directory tools/mpeg2dec contains the MPEG2 reference 
decoder, modified to provide extensive logging and to regularly 
write the framebuffers to file. A sample run could be:

koen@macpro ~/xilinx/mpeg2/tools $ mkdir run

koen@macpro ~/xilinx/mpeg2/tools $ cd run

koen@macpro ~/xilinx/mpeg2/tools/run $ ../mpeg2dec/mpeg2decode
 -r -v9 -t -o0 'dump_%d_out_%c' -b ../streams/tcela-17.mpg > log

saving dump_0_out_f.y.ppm

saving dump_0_out_f.u.ppm

saving dump_0_out_f.v.ppm

saving dump_0_forward_ref_frm.y.ppm

saving dump_0_forward_ref_frm.u.ppm

saving dump_0_forward_ref_frm.v.ppm

saving dump_0_backward_ref_frm.y.ppm

saving dump_0_backward_ref_frm.u.ppm

saving dump_0_backward_ref_frm.v.ppm

saving dump_0_auxframe.y.ppm

saving dump_0_auxframe.u.ppm

saving dump_0_auxframe.v.ppm

saving dump_1_out_f.y.ppm

saving dump_1_out_f.u.ppm


The log file contains detailed information about the execution of 
the MPEG2 decoding algorithm, while the .ppm files contain 
framestore dumps, using separate graphics files for each y, u and 
v component. 
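
When comparing many dumps it can help to group the .ppm files by buffer. The sketch below, in Python, groups file names of the form shown in the sample run. The function name group_dumps is hypothetical, and the meaning of the leading number (presumably a frame or dump counter) is an assumption.

```python
import re
from collections import defaultdict

# Matches names such as dump_0_forward_ref_frm.y.ppm
DUMP_RE = re.compile(r"dump_(\d+)_(\w+)\.([yuv])\.ppm$")


def group_dumps(filenames):
    """Map (number, buffer name) to a dict of component -> file name."""
    groups = defaultdict(dict)
    for name in filenames:
        match = DUMP_RE.match(name)
        if match is None:
            continue                      # not a dump file, skip it
        number = int(match.group(1))
        buffer_name = match.group(2)      # e.g. out_f, auxframe
        component = match.group(3)        # y, u or v
        groups[(number, buffer_name)][component] = name
    return dict(groups)


# File names copied from the sample run.
names = [
    "dump_0_out_f.y.ppm",
    "dump_0_out_f.u.ppm",
    "dump_0_out_f.v.ppm",
    "dump_0_forward_ref_frm.y.ppm",
    "dump_1_out_f.y.ppm",
    "log",
]
for key, files in sorted(group_dumps(names).items()):
    print(key, sorted(files))
```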

2.4.5 <subsec:MPEG2-Test-Streams>MPEG2 Test Streams

The tools/streams directory contains some sample MPEG2 program 
streams, useful during testing. The retrieve script in the 
tools/streams directory can be used to download the ISO/IEC 
13818-4 conformance test bitstreams from the ISO web site[footnote:
ISO/IEC 13818-4 test bitstreams,


[1] ITU-T Recommendation H.262 “Information technology - Generic 
coding of moving pictures and associated audio information: 
Video”, 2000. Also published as ISO/IEC International Standard 
13818-2.

[2] ISO/IEC International Standard 23002-1 “Information 
technology - MPEG video technologies - Part 1: Accuracy 
requirements for implementation of integer-output 8x8 inverse 
discrete cosine transform”, 2006.

[3] “Architecture and Bus-Arbitration Schemes for MPEG-2 Video 
Decoder”, Jui-Hua Li and Nam Ling, IEEE Transactions on Circuits 
and Systems for Video Technology, Vol. 9, No. 5, August 1999, 

[4] “Systematic approach of Fixed Point 8x8 IDCT and DCT Design 
and Implementation”, Ci-Xun Zhang, Jing Wang, Lu Yu, Institute 
of Information and Communication Engineering, Zhejiang 
University, Hangzhou, China, 310027.

[5] “Virtex-5 FPGA User Guide”, Xilinx UG190 (v3.2), December 11, 

[6] “ML505/506 MIG Design Creation Using ISE 9.2i SP3, MIG 2.0 
and ChipScope Pro 9.2i”, Xilinx, December 2007.
