\documentclass[a4paper,11pt]{article}
\usepackage{fullpage}
\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[normalem]{ulem}
\usepackage[english]{babel}
\usepackage{listings,babel}
\lstset{breaklines=true,basicstyle=\ttfamily}
\usepackage{graphicx}
\usepackage{moreverb}
\usepackage{url}
\usepackage{amsmath}
\usepackage{float}
\title{Texture Mapping Unit}
\author{S\'ebastien Bourdeauducq}
\date{\today}
\begin{document}
\maketitle{}

\section{Presentation}
Milkymist has hardware acceleration for texture mapping on triangle strips. This process is used to implement the image warping effect in MilkDrop.

The texture mapping unit also supports blending, which can for instance be used to implement the fade-to-black feature (the \verb!decay! variable in presets) of MilkDrop.

The core deals with 16-bit RGB565 progressive-scan framebuffers, accessed via FML links with a width of 64 bits and a burst length of 4.

The vertex data is fetched using a 32-bit WISHBONE master. Connecting this bus to the WISHBONE-to-FML caching bridge allows the mesh data to be stored in cost-effective DRAM.

For controlling the core, a CSR bus slave is also implemented.

\section{Configuration and Status Registers}
Registers can be read at any time, and written when the core is not busy. Write operations when the busy bit is set in register 0, including those to the control register, are illegal and can cause unpredictable behaviour.

Addresses are in bytes to match the addresses seen by the CPU when the CSR bus is bridged to Wishbone.

\subsection{Parameters and control}
\begin{tabular}{|l|l|l|p{10.5cm}|}
\hline
\bf{Offset} & \bf{Access} & \bf{Default} & \bf{Description} \\
\hline
0x00 & RW & 0 & Control register. Bit 0 = busy/start. Bit 1 = IRQ status (cleared whenever the register is written). \\
\hline
0x04 & RW & 0 & Address of the mesh data. Must be aligned on a 32-bit boundary.
\\
\hline
0x08 & RW & 32 & Number of mesh areas in the X direction (which is the number of mesh points minus one). \\
\hline
0x0C & RW & 20 & Size of a mesh area in the X direction. This is typically the horizontal resolution divided by the number of mesh areas (640/32 = 20 with the default values). \\
\hline
0x10 & RW & 24 & Number of mesh areas in the Y direction (which is the number of mesh points minus one). \\
\hline
0x14 & RW & 20 & Size of a mesh area in the Y direction. This is typically the vertical resolution divided by the number of mesh areas (480/24 = 20 with the default values). \\
\hline
0x18 & RW & 0 & Source framebuffer address. Must be aligned on a 16-bit boundary. \\
\hline
0x1C & RW & 640 & Source horizontal resolution. \\
\hline
0x20 & RW & 480 & Source vertical resolution. \\
\hline
0x24 & RW & 0 & Destination framebuffer address. Must be aligned on a 16-bit boundary. \\
\hline
0x28 & RW & 640 & Destination horizontal resolution. \\
\hline
0x2C & RW & 480 & Destination vertical resolution. \\
\hline
0x30 & RW & 0 & Horizontal offset (a number subtracted from each destination X coordinate). \\
\hline
0x34 & RW & 0 & Vertical offset (a number subtracted from each destination Y coordinate). \\
\hline
0x38 & RW & 63 & Brightness $n$, between 0 and 63. The components of each pixel are multiplied by $\frac{n+1}{64}$ and rounded down to an integer. Consequently, a value of 0 in this register makes the destination picture completely black: every RGB565 component is at most 63, so it rounds down to 0 once multiplied by $\frac{1}{64}$. \\
\hline
\end{tabular}

\subsection{Performance counters}
To help track down the ``low FPS'' symptom, the core is equipped with integrated performance counters. These counters are automatically reset when a new frame is submitted for processing, and must be read after the frame processing is finished.

These registers are read-only. Attempting to write them results in undefined behaviour.

\begin{tabular}{|l|l|l|p{10.5cm}|}
\hline
\bf{Offset} & \bf{Access} & \bf{Default} & \bf{Description} \\
\hline
0x40 & R & 0 & Total number of drawn pixels. Off-screen pixels are not counted.
\\
\hline
0x44 & R & 0 & Total number of used clock cycles. \\
\hline
0x48 & R & 0 & Total number of stalled transactions detected at pipeline monitor 1. \\
\hline
0x4C & R & 0 & Total number of completed transactions detected at pipeline monitor 1. \\
\hline
0x50 & R & 0 & Total number of stalled transactions detected at pipeline monitor 2. \\
\hline
0x54 & R & 0 & Total number of completed transactions detected at pipeline monitor 2. \\
\hline
0x58 & R & 0 & Total number of misses in the input image cache. \\
\hline
\end{tabular}

\section{Encoding the vertex data}
The core supports a maximum mesh of $128 \times 128$ points. Regardless of the actual number of mesh points, the address of the point at indices $ (x, y) $ in the mesh is:
\begin{equation*}
\mathrm{base} + 4 \cdot (128 \cdot y + x)
\end{equation*}

This means that the mesh always has the same size in memory. Each point is made up of 32 bits, with the 16 upper bits being the destination Y coordinate and the 16 lower bits the destination X coordinate. The mesh therefore always occupies exactly $128 \cdot 128 \cdot 4 = 65536$ bytes (64~KB).

\section{Architecture}
\begin{figure}[H]
\centering
\includegraphics[height=180mm]{architecture.eps}
\caption{Texture mapping unit architecture.}\label{fig:architecture}
\end{figure}

\subsection{Handshake protocol between pipeline stages}
Because pipeline stages are not always ready to accept and/or to produce data (for example, because of memory latencies), a flow control protocol must be implemented.

The situation is the same between all stages: an upstream stage sends data that a downstream stage registers. During some cycles, the upstream stage cannot produce valid data and/or the downstream stage is still processing the previous data and has no memory left to store the incoming data.

\begin{figure}[H]
\centering
\includegraphics[height=30mm]{comm.eps}
\caption{Communication between two pipeline stages.}\label{fig:comm}
\end{figure}

Appropriate handling of these cases is done using standardized \verb!stb! and \verb!ack! signals.
The meaning of these signals is summarized in the following table:\\
\begin{tabular}{|l|l|p{12cm}|}
\hline
\bf \verb!stb! & \bf \verb!ack! & \bf Situation \\
\hline
0 & 0 & The upstream stage does not have data to send, and the downstream stage is not ready to accept data. \\
\hline
0 & 1 & The downstream stage is ready to accept data, but the upstream stage currently has no data to send. The downstream stage is not required to keep its \verb!ack! signal asserted. \\
\hline
1 & 0 & The upstream stage is trying to send data to the downstream stage, which is currently not ready to accept it. The transaction is \textit{stalled}. The upstream stage must keep \verb!stb! asserted and continue to present valid data until the transaction is completed. \\
\hline
1 & 1 & The upstream stage is sending data to the downstream stage, which is ready to accept it. The transaction is \textit{completed}. The downstream stage must register the incoming data, as the upstream stage is not required to hold it valid at the next cycle. \\
\hline
\end{tabular}\\
Generating \verb!ack! combinatorially from \verb!stb! is not allowed. The \verb!ack! signal must always represent the current state of the downstream stage, i.e.\ whether or not it will accept whatever data is presented to it.

\subsection{Triangle filling}
The triangle filling algorithm is inspired by \url{http://www.geocities.com/wronski12/3d_tutor/tri_fillers.html}. To perform the linear interpolations, a variant of the Bresenham algorithm is used.
\end{document}