\documentclass[a4paper,11pt]{article}
\usepackage{fullpage}
\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[normalem]{ulem}
\usepackage[english]{babel}
\usepackage{listings,babel}
\lstset{breaklines=true,basicstyle=\ttfamily}
\usepackage{graphicx}
\usepackage{moreverb}
\usepackage{url}
\usepackage{amsmath}
\usepackage{float}

\title{Texture Mapping Unit}
\author{S\'ebastien Bourdeauducq}
\date{\today}
\begin{document}
\maketitle{}

\section{Presentation}
Milkymist has hardware acceleration for texture mapping on triangle strips. This process is used to implement the image warping effect in MilkDrop.

The texture mapping unit also supports blending, which can, for instance, be used to implement the fade-to-black feature (the \verb!decay! variable in presets) of MilkDrop.

The core deals with 16-bit RGB565 progressive-scan framebuffers, accessed via FML links with a width of 64 bits and a burst length of 4.

The vertex data is fetched using a 32-bit WISHBONE master. Connecting this bus to the WISHBONE-to-FML caching bridge allows the mesh data to be stored in cost-effective DRAM.

A CSR bus slave is also implemented to control the core.

\section{Configuration and Status Registers}
Registers can be read at any time, but may only be written when the core is not busy. Writes performed while the busy bit is set in the control register (offset 0x00), including writes to the control register itself, are illegal and can cause unpredictable behaviour.

Addresses are in bytes to match the addresses seen by the CPU when the CSR bus is bridged to WISHBONE.
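
The access rules above could be followed from software roughly as sketched here. This is only an illustration under stated assumptions: the names (\verb!tmu_submit!, \verb!tmu_read!, \verb!tmu_write! and the \verb!TMU_! constants) are hypothetical, the registers are assumed to be 32 bits wide, and writing 1 to bit 0 of the control register is assumed to start processing. The offsets are taken from the register table in the ``Parameters and control'' subsection.

\begin{lstlisting}[language=C]
#include <stdint.h>

/* Byte offsets of a few registers, copied from the register table. */
enum {
    TMU_CTL        = 0x00,
    TMU_MESH_ADDR  = 0x04,
    TMU_SRC_FB     = 0x18,
    TMU_DST_FB     = 0x24,
    TMU_BRIGHTNESS = 0x38
};

#define TMU_CTL_BUSY 0x01u /* bit 0: busy/start */
#define TMU_CTL_IRQ  0x02u /* bit 1: IRQ status */

/* csr points at offset 0x00 of the register block. Offsets are in
 * bytes; registers are assumed to be 32 bits wide. */
static uint32_t tmu_read(const volatile uint32_t *csr, unsigned off)
{
    return csr[off / 4];
}

static void tmu_write(volatile uint32_t *csr, unsigned off, uint32_t v)
{
    csr[off / 4] = v;
}

/* Returns 0 if the frame was submitted, -1 if the core is busy
 * (writing is illegal then), -2 if an argument breaks the alignment
 * or range rules given in the register table. */
static int tmu_submit(volatile uint32_t *csr, uint32_t mesh,
                      uint32_t src, uint32_t dst, uint32_t brightness)
{
    if (tmu_read(csr, TMU_CTL) & TMU_CTL_BUSY)
        return -1;
    if ((mesh & 3u) || (src & 1u) || (dst & 1u) || brightness > 63u)
        return -2;
    tmu_write(csr, TMU_MESH_ADDR, mesh);
    tmu_write(csr, TMU_SRC_FB, src);
    tmu_write(csr, TMU_DST_FB, dst);
    tmu_write(csr, TMU_BRIGHTNESS, brightness);
    /* Assumption: setting bit 0 starts the core. */
    tmu_write(csr, TMU_CTL, TMU_CTL_BUSY);
    return 0;
}
\end{lstlisting}

Passing the register block as a pointer keeps the sketch independent of any particular memory map; on real hardware \verb!csr! would be the CPU-visible base address of the core.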
\subsection{Parameters and control}
\begin{tabular}{|l|l|l|p{10.5cm}|}
\hline
\textbf{Offset} & \textbf{Access} & \textbf{Default} & \textbf{Description} \\
\hline
0x00 & RW & 0 & Control register. Bit 0 = busy/start. Bit 1 = IRQ status (cleared whenever the register is written). \\
\hline
0x04 & RW & 0 & Address of the mesh data. Must be aligned on a 32-bit boundary. \\
\hline
0x08 & RW & 32 & Number of mesh areas in the X direction (which is the number of mesh points minus one). \\
\hline
0x0C & RW & 20 & Size of a mesh area in the X direction. This is typically the horizontal resolution divided by the number of mesh areas. \\
\hline
0x10 & RW & 24 & Number of mesh areas in the Y direction. \\
\hline
0x14 & RW & 20 & Size of a mesh area in the Y direction. This is typically the vertical resolution divided by the number of mesh areas. \\
\hline
0x18 & RW & 0 & Source framebuffer address. Must be aligned on a 16-bit boundary. \\
\hline
0x1C & RW & 640 & Source horizontal resolution. \\
\hline
0x20 & RW & 480 & Source vertical resolution. \\
\hline
0x24 & RW & 0 & Destination framebuffer address. Must be aligned on a 16-bit boundary. \\
\hline
0x28 & RW & 640 & Destination horizontal resolution. \\
\hline
0x2C & RW & 480 & Destination vertical resolution. \\
\hline
0x30 & RW & 0 & Horizontal offset (a number subtracted from each destination X coordinate). \\
\hline
0x34 & RW & 0 & Vertical offset (a number subtracted from each destination Y coordinate). \\
\hline
0x38 & RW & 63 & Brightness, between 0 and 63. The components of each pixel are multiplied by $\frac{n+1}{64}$, where $n$ is the register value, and rounded down to the nearest integer. A value of 0 in this register therefore makes the destination picture completely black (because of the limited resolution of RGB565). \\
\hline
\end{tabular}
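
\medskip

As a worked example of the brightness scaling described in the table, write $c$ for a colour component and $n$ for the brightness register value (both symbols are notation introduced here, not names used by the core). Each output component is then
\begin{equation*}
c' = \left\lfloor \frac{c \cdot (n+1)}{64} \right\rfloor
\end{equation*}
For the largest 6-bit green value $c = 63$, a setting of $n = 63$ leaves it unchanged, $n = 31$ gives $c' = \lfloor 63 \cdot 32 / 64 \rfloor = 31$, and $n = 0$ gives $c' = \lfloor 63 / 64 \rfloor = 0$. Since no RGB565 component exceeds 63, $n = 0$ always yields black.
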
\subsection{Performance counters}
To help track down the ``low FPS'' symptom, the core is equipped with integrated performance counters.

These counters are automatically reset when a new frame is submitted for processing, and must be read after the frame processing is finished.

These registers are read-only. Attempting to write them results in undefined behaviour.

\begin{tabular}{|l|l|l|p{10.5cm}|}
\hline
\textbf{Offset} & \textbf{Access} & \textbf{Default} & \textbf{Description} \\
\hline
0x40 & R & 0 & Total number of drawn pixels. Off-screen pixels are not counted. \\
\hline
0x44 & R & 0 & Total number of used clock cycles. \\
\hline
0x48 & R & 0 & Total number of stalled transactions detected at pipeline monitor 1. \\
\hline
0x4C & R & 0 & Total number of completed transactions detected at pipeline monitor 1. \\
\hline
0x50 & R & 0 & Total number of stalled transactions detected at pipeline monitor 2. \\
\hline
0x54 & R & 0 & Total number of completed transactions detected at pipeline monitor 2. \\
\hline
0x58 & R & 0 & Total number of misses in the input image cache. \\
\hline
\end{tabular}
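
\medskip

Once a frame has finished, the counters in the table above could be collected and summarised as sketched here. The structure and function names are hypothetical, the registers are assumed to be 32 bits wide, and the ratios suggested afterwards are one possible way of reading the counters rather than a method prescribed by the core.

\begin{lstlisting}[language=C]
#include <stdint.h>

/* Snapshot of the counters, in register order (0x40 to 0x58). */
struct tmu_perf {
    uint32_t pixels, cycles;
    uint32_t stalled1, completed1;
    uint32_t stalled2, completed2;
    uint32_t misses;
};

/* csr points at offset 0x00 of the register block. Only reads are
 * performed, since writing these registers is undefined. */
static struct tmu_perf tmu_read_perf(const volatile uint32_t *csr)
{
    struct tmu_perf p;
    p.pixels     = csr[0x40 / 4];
    p.cycles     = csr[0x44 / 4];
    p.stalled1   = csr[0x48 / 4];
    p.completed1 = csr[0x4C / 4];
    p.stalled2   = csr[0x50 / 4];
    p.completed2 = csr[0x54 / 4];
    p.misses     = csr[0x58 / 4];
    return p;
}

/* Integer percentage part/total, rounded down; 0 when total is 0. */
static unsigned tmu_percent(uint32_t part, uint32_t total)
{
    return total ? (unsigned)(((uint64_t)part * 100u) / total) : 0u;
}
\end{lstlisting}

For instance, \verb!tmu_percent(p.stalled1, p.stalled1 + p.completed1)! gives the share of stalled transactions seen at pipeline monitor 1, and comparing \verb!p.cycles! with \verb!p.pixels! gives a rough idea of the cost per drawn pixel.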
\section{Encoding the vertex data}
The core supports a maximum mesh of $128 \times 128$ points. The address of the point at indices $(x, y)$ in the mesh is, regardless of the actual number of mesh points:

\begin{equation*}
base + 4 \cdot (128 \cdot y + x)
\end{equation*}

This means that the mesh always has the same size in memory.

Each point is made up of 32 bits, with the 16 upper bits being the destination Y coordinate and the 16 lower bits the X coordinate.

Exactly 64KB are used by the mesh.
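
The layout described in this section can be sketched in C as follows. The function names are hypothetical, and the coordinates are treated as raw 16-bit values because their signedness is not stated here.

\begin{lstlisting}[language=C]
#include <stdint.h>

#define TMU_MESH_MAX 128u /* row stride in points, fixed by the core */

/* Byte offset of the point at indices (x, y) from the mesh base. */
static uint32_t tmu_mesh_offset(uint32_t x, uint32_t y)
{
    return 4u * (TMU_MESH_MAX * y + x);
}

/* Pack one point: Y in the 16 upper bits, X in the 16 lower bits. */
static uint32_t tmu_pack_point(uint16_t x, uint16_t y)
{
    return ((uint32_t)y << 16) | x;
}

/* Store a point into a mesh buffer of 128 * 128 32-bit words. */
static void tmu_set_point(uint32_t *mesh, uint32_t ix, uint32_t iy,
                          uint16_t x, uint16_t y)
{
    mesh[tmu_mesh_offset(ix, iy) / 4u] = tmu_pack_point(x, y);
}
\end{lstlisting}

The last possible point, at indices $(127, 127)$, sits at byte offset $4 \cdot (128 \cdot 127 + 127) = 65532$, so the buffer spans $65536$ bytes. With the default mesh of $33 \times 25$ points, the last point actually used is at offset $4 \cdot (128 \cdot 24 + 32) = 12416$.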
\section{Architecture}
\begin{figure}[H]
\centering
\includegraphics[height=180mm]{architecture.eps}
\caption{Texture mapping unit architecture.}\label{fig:architecture}
\end{figure}

\subsection{Handshake protocol between pipeline stages}
Because pipeline stages are not always ready to accept and/or produce data (for example, because of memory latencies), a flow control protocol must be implemented.

The situation is the same between all stages: an upstream stage registers data into a downstream stage. During some cycles, the upstream stage cannot produce valid data and/or the downstream stage is still processing the previous data and has no storage left for the incoming data.

\begin{figure}[H]
\centering
\includegraphics[height=30mm]{comm.eps}
\caption{Communication between two pipeline stages.}\label{fig:comm}
\end{figure}

These cases are handled using standardized \verb!stb! and \verb!ack! signals, whose meaning is summarized in this table:

\begin{tabular}{|l|l|p{12cm}|}
\hline
\bfseries \verb!stb! & \bfseries \verb!ack! & \bfseries Situation \\
\hline
0 & 0 & The upstream stage has no data to send, and the downstream stage is not ready to accept data. \\
\hline
0 & 1 & The downstream stage is ready to accept data, but the upstream stage has no data to send. \\
\hline
1 & 0 & The upstream stage is trying to send data to the downstream stage, which is currently not ready to accept it. The transaction is \textit{stalled}. The upstream stage must keep \verb!stb! asserted and continue to present valid data until the transaction is completed. \\
\hline
1 & 1 & The upstream stage is sending data to the downstream stage, which is ready to accept it. The transaction is \textit{completed}. The downstream stage must register the incoming data, as the upstream stage is not required to hold it valid at the next cycle. \\
\hline
\end{tabular}
 
Generating \verb!ack! combinatorially from \verb!stb! is not allowed. The \verb!ack! signal must always represent the current state of the downstream stage, i.e.\ whether or not it will accept whatever data is presented to it.
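
The protocol can be illustrated with a small cycle-by-cycle software model. This is only a sketch with hypothetical names (\verb!hs_classify!, \verb!hs_simulate!), not a description of the core's logic: the upstream stage is assumed to assert \verb!stb! whenever it still has items to send, and the downstream \verb!ack! value for each cycle is supplied as an input so that it never depends on \verb!stb!.

\begin{lstlisting}[language=C]
#include <stddef.h>

enum hs_state { HS_IDLE, HS_STALLED, HS_COMPLETED };

/* Classify one clock cycle from the sampled stb and ack values. */
static enum hs_state hs_classify(int stb, int ack)
{
    if (!stb)
        return HS_IDLE; /* nothing is offered, whatever ack is */
    return ack ? HS_COMPLETED : HS_STALLED;
}

struct hs_counts { unsigned stalled, completed; };

/* Run n_cycles cycles. ready[i] is the downstream ack on cycle i.
 * The upstream stage keeps stb asserted until n_items items have
 * been transferred. */
static struct hs_counts hs_simulate(const int *ready, size_t n_cycles,
                                    size_t n_items)
{
    struct hs_counts c = {0, 0};
    size_t sent = 0, i;

    for (i = 0; i < n_cycles; i++) {
        int stb = sent < n_items;
        switch (hs_classify(stb, ready[i])) {
        case HS_STALLED:   /* upstream holds stb and its data */
            c.stalled++;
            break;
        case HS_COMPLETED: /* downstream registers the data */
            c.completed++;
            sent++;
            break;
        case HS_IDLE:
            break;
        }
    }
    return c;
}
\end{lstlisting}

With the \verb!ack! pattern 0, 1, 1, 0, 0, 1, 1 and three items to send, the model counts three stalled cycles and three completed transactions, and the last cycle is idle.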
\subsection{Triangle filling}
The triangle filling algorithm is inspired by \url{http://www.geocities.com/wronski12/3d_tutor/tri_fillers.html}. To make the linear interpolations, a variant of the Bresenham algorithm is used.
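
The exact variant used by the core is not detailed here. As a general illustration of Bresenham-style interpolation, the following sketch (with hypothetical names) steps an integer value from \verb!from! to \verb!to! in \verb!n! steps, with \verb!n! assumed to be positive. After a one-time setup, each step uses only additions and a comparison, with an error accumulator standing in for the fractional part.

\begin{lstlisting}[language=C]
struct interp {
    int value; /* current interpolated value */
    int step;  /* signed whole part added at every step */
    int sign;  /* +1 or -1, direction of the correction */
    int rem;   /* remainder added to the error at every step */
    int n;     /* total number of steps (must be > 0) */
    int err;   /* accumulated error, kept in [0, n) */
};

/* One-time setup: split (to - from) / n into whole part and rest. */
static void interp_init(struct interp *s, int from, int to, int n)
{
    int d = to - from;
    int mag = d < 0 ? -d : d;

    s->value = from;
    s->sign = d < 0 ? -1 : 1;
    s->step = s->sign * (mag / n);
    s->rem = mag % n;
    s->n = n;
    s->err = 0;
}

/* Advance by one step using additions and one comparison only. */
static void interp_next(struct interp *s)
{
    s->value += s->step;
    s->err += s->rem;
    if (s->err >= s->n) {
        s->err -= s->n;
        s->value += s->sign; /* carry one unit from the error */
    }
}
\end{lstlisting}

Interpolating from 0 to 10 in 4 steps visits 0, 2, 5, 7, 10, and the end value is always reached exactly after \verb!n! steps.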
\end{document}
