<!--# set var="title" value="Title" -->
<!--# include virtual="/ssi/ssi_start.shtml" -->
<link REL="stylesheet" TYPE="text/css" HREF="/people/tantos/styles.css">

<h1>Wishbone Monitor Controller Accelerator</h1>
<h2>Description</h2>
<strong>Wishbone Monitor Controller Accelerator</strong> is a small address generator
which can (when used properly) speed up various common video memory operations.
The core is 100% Wishbone compatible with the
<a href="/cores/wb_tk/wb_extensions.shtml">WishboneTK extensions</a>. It incorporates two
Wishbone interfaces: a slave interface for accessing its internal registers and for
generating access cycles to the pixel memory, and a master interface to the pixel memory.
The data width of the two interfaces must be the same, but their address widths may differ.

<h3>Accelerator functions</h3>
<p>
Many 8-bit systems have a very limited address space. Commonly they cannot address more
than 64k of memory, and the address range available for the video buffer is often much
smaller still. There are two ways to overcome this:
<ul>
<li>Paging</li>
<li>Indirect addressing</li>
</ul>
Both are in common use. For this core I'm planning to use
the second approach with a little twist. You can imagine the indirect addressing of a
frame buffer as having a cursor on it. Writes overwrite the value under the cursor and reads
return the value under the cursor. Writes to the index register move the cursor,
and reads of it return the cursor location. That fits seamlessly with the way many algorithms
think about the display. However, there is one drawback: on average you need two accesses to reach
a pixel. First you have to set the cursor position by writing the index register, and only then can you
access the requested memory position. Paging excels here: a single access supplies both the
address and the data.
<p>
The important point is that paging conveys two kinds of information in one cycle (what
and where), while indirect addressing conveys only one.
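<p>
The cost of plain indirect addressing can be illustrated with a small behavioral model in C.
This is only a sketch under stated assumptions: the type <code>indirect_t</code>, the helper names
and the 256-byte pixel memory are hypothetical and do not come from the core.

```c
#include <assert.h>
#include <stdint.h>

enum { FB_SIZE = 256 };           /* assumed pixel memory size */

typedef struct {
    uint16_t cursor;              /* index (cursor) register */
    uint8_t  fb[FB_SIZE];         /* stand-in for the pixel memory */
    unsigned accesses;            /* bus accesses issued by the CPU */
} indirect_t;

/* One bus access: move the cursor. */
void write_cursor(indirect_t *d, uint16_t pos)
{
    assert(pos < FB_SIZE);
    d->cursor = pos;
    d->accesses++;
}

/* One bus access: read the value under the cursor. */
uint8_t read_pixel(indirect_t *d)
{
    d->accesses++;
    return d->fb[d->cursor];
}

/* One bus access: overwrite the value under the cursor. */
void write_pixel(indirect_t *d, uint8_t v)
{
    d->accesses++;
    d->fb[d->cursor] = v;
}

/* Reaching an arbitrary location takes two accesses: where, then what. */
uint8_t peek(indirect_t *d, uint16_t pos)
{
    write_cursor(d, pos);
    return read_pixel(d);
}

void poke(indirect_t *d, uint16_t pos, uint8_t v)
{
    write_cursor(d, pos);
    write_pixel(d, v);
}
```

In this model every <code>peek</code> or <code>poke</code> of an arbitrary location advances the access counter by two.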
<p>
Many algorithms use relative addressing, which is quite time-consuming to implement in software. They
say things like "write to the pixel to the left, or above" or "go two rows down". The
accelerator part of the design performs such post-modification of the cursor automatically in hardware.
It will have a set of post-modify registers (a small block of RAM). It will also have a separate address
space of (let's say) 256 locations, used for indirect frame-buffer access. A write operation in this
address space identifies a post-modify register and provides the data to be written to the
frame buffer. After the write completes, the cursor location is incremented or decremented by the
value in the addressed post-modify register. A read operation works the very same way, except that it
returns data instead of writing it to the frame buffer.

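<p>
The post-modify mechanism described above can be sketched as a behavioral model in C. This is not the
core's implementation: the names <code>accel_t</code>, <code>accel_read</code> and <code>accel_write</code>,
the 1024-byte pixel memory and the signed 32-bit increments are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define ACCEL_LOCATIONS 256   /* assumed number of post-modify registers */
#define FB_SIZE         1024  /* assumed pixel memory size in bytes */

typedef struct {
    int32_t cursor;                        /* current position in the pixel memory */
    int32_t post_modify[ACCEL_LOCATIONS];  /* signed cursor increments */
    uint8_t fb[FB_SIZE];                   /* stand-in for the pixel memory */
} accel_t;

/* Write `data` under the cursor, then move the cursor by post_modify[idx]. */
void accel_write(accel_t *a, unsigned idx, uint8_t data)
{
    assert(idx < ACCEL_LOCATIONS);
    assert(a->cursor >= 0 && a->cursor < FB_SIZE);
    a->fb[a->cursor] = data;
    a->cursor += a->post_modify[idx];
}

/* Return the value under the cursor, then move the cursor the same way. */
uint8_t accel_read(accel_t *a, unsigned idx)
{
    assert(idx < ACCEL_LOCATIONS);
    assert(a->cursor >= 0 && a->cursor < FB_SIZE);
    uint8_t data = a->fb[a->cursor];
    a->cursor += a->post_modify[idx];
    return data;
}
```

With <code>post_modify[0] = 1</code> and <code>post_modify[1] = -1</code>, for example, an access through
location 0 steps one byte to the right afterwards and an access through location 1 steps one byte to the left.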
<p>
Let's look at an example. Say we want to scroll the screen
up by one line. With direct access the code would be something like this:
<pre>
    mov R1, frame_buffer_addr           ; destination: first row
    mov R2, frame_buffer_addr+xres/8    ; source: second row
    mov R3, frame_buffer_size-xres/8    ; number of bytes to move
loop:
    mov R4,[R2]                         ; read the source byte
    mov [R1],R4                         ; write it one row up
    inc R1
    inc R2
    dec R3
    jnz loop
</pre>
<p>
The same thing with indirect addressing:
<pre>
    mov R1, 0                           ; destination offset: first row
    mov R2, xres/8                      ; source offset: second row
    mov R3, frame_buffer_size-xres/8    ; number of bytes to move
loop:
    mov [cursor_reg],R2                 ; point the cursor at the source
    mov R4,[pixel_reg]                  ; read the source byte
    mov [cursor_reg],R1                 ; point the cursor at the destination
    mov [pixel_reg],R4                  ; write the byte
    inc R1
    inc R2
    dec R3
    jnz loop
</pre>
<p>
And finally the same thing with the accelerator functions:
<pre>
    mov R1, -xres/8
    mov [cursor_post_inc_reg1],R1   ; program post-modify value 1: up one row
    mov R1, 1+xres/8
    mov [cursor_post_inc_reg2],R1   ; program post-modify value 2: down one row, one byte right
    mov R1, xres/8
    mov [cursor_reg],R1             ; the cursor starts at the first byte of the second row
    mov R2, frame_buffer_size-xres/8
loop:
    mov R1,[cursor_post_inc_reg1]   ; pixel read; the cursor then moves up to the destination byte
    mov [cursor_post_inc_reg2],R1   ; pixel write; the cursor then moves to the next source byte
    dec R2
    jnz loop
</pre>
<p>
As you can see, for this particular algorithm this approach is even faster than direct access
to the frame buffer. Hence the pompous word, accelerator.
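<p>
The accelerated scroll can be checked with a small behavioral model in C. The sketch below assumes a tiny
display of 3 rows of 4 bytes and a cursor that holds an offset into the pixel memory; the type
<code>accel_t</code> and the functions <code>accel_read</code>, <code>accel_write</code> and
<code>scroll_up</code> are hypothetical names, not part of the core.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tiny display: 3 rows of 4 bytes each. */
enum { XRES_BYTES = 4, ROWS = 3, FB_SIZE = XRES_BYTES * ROWS };

typedef struct {
    int32_t  cursor;          /* cursor position in the pixel memory */
    int32_t  post_modify[2];  /* two post-modify values are enough here */
    uint8_t  fb[FB_SIZE];     /* stand-in for the pixel memory */
    unsigned bus_accesses;    /* accesses made through the accelerator */
} accel_t;

/* Read under the cursor, then add post_modify[idx] to the cursor. */
uint8_t accel_read(accel_t *a, int idx)
{
    assert(a->cursor >= 0 && a->cursor < FB_SIZE);
    uint8_t v = a->fb[a->cursor];
    a->cursor += a->post_modify[idx];
    a->bus_accesses++;
    return v;
}

/* Write under the cursor, then add post_modify[idx] to the cursor. */
void accel_write(accel_t *a, int idx, uint8_t v)
{
    assert(a->cursor >= 0 && a->cursor < FB_SIZE);
    a->fb[a->cursor] = v;
    a->cursor += a->post_modify[idx];
    a->bus_accesses++;
}

/* Scroll up by one row: every byte is copied to the position one row above. */
void scroll_up(accel_t *a)
{
    a->post_modify[0] = -XRES_BYTES;     /* after the read: up one row */
    a->post_modify[1] = 1 + XRES_BYTES;  /* after the write: down one row, one byte right */
    a->cursor = XRES_BYTES;              /* first byte of the second row */
    for (int n = FB_SIZE - XRES_BYTES; n != 0; n--) {
        uint8_t v = accel_read(a, 0);
        accel_write(a, 1, v);
    }
}
```

In this model the loop body issues exactly two accesses per byte moved, and no cursor arithmetic is left to the CPU.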
<p>
If you think about the most common functions, you will find that this approach speeds up at least the
following operations:
<ul>
<li>Copying one part of the screen to another location, especially
<ul>
<li>scroll</li>
<li>mouse cursor</li>
<li>character rendering</li>
</ul>
</li>
<li>Copying from local memory to the frame buffer or back</li>
<li>Line drawing. Vertical, horizontal, and Bresenham lines can all benefit from it</li>
<li>Drawing continuous curves of nearly any kind</li>
</ul>
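<p>
As an illustration of the line-drawing case, here is a hedged sketch of a Bresenham line in the first octant
(0 &lt;= dy &lt;= dx) on top of a behavioral model of the accelerator. It assumes one byte per pixel and an
8x8 display, which differs from the bit-packed layout of the scroll example; the names <code>accel_t</code>,
<code>accel_write</code> and <code>draw_line_octant0</code> are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed display: 8x8 pixels, one byte per pixel. */
enum { WIDTH = 8, HEIGHT = 8, FB_SIZE = WIDTH * HEIGHT };

typedef struct {
    int32_t cursor;          /* cursor position in the pixel memory */
    int32_t post_modify[2];  /* post-modify values used by the line loop */
    uint8_t fb[FB_SIZE];     /* stand-in for the pixel memory */
} accel_t;

/* Write under the cursor, then add post_modify[idx] to the cursor. */
void accel_write(accel_t *a, int idx, uint8_t v)
{
    assert(a->cursor >= 0 && a->cursor < FB_SIZE);
    a->fb[a->cursor] = v;
    a->cursor += a->post_modify[idx];
}

/* Draw dx+1 pixels starting at (x0, y0), going right and downwards. */
void draw_line_octant0(accel_t *a, int x0, int y0, int dx, int dy, uint8_t color)
{
    assert(dx >= 0 && dy >= 0 && dy <= dx);
    a->post_modify[0] = 1;          /* step right */
    a->post_modify[1] = 1 + WIDTH;  /* step right and one row down */
    a->cursor = y0 * WIDTH + x0;

    int err = 2 * dy - dx;          /* classic Bresenham decision variable */
    for (int i = 0; i <= dx; i++) {
        if (err > 0) {
            accel_write(a, 1, color);  /* plot, then take the diagonal step */
            err += 2 * (dy - dx);
        } else {
            accel_write(a, 0, color);  /* plot, then take the horizontal step */
            err += 2 * dy;
        }
    }
}
```

Each pixel costs a single write in this sketch: the choice of accelerator location encodes the step direction,
so the CPU only maintains the error term.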
Of course there are downsides:
<ul>
<li>It takes time to set up the engine and to program the various post-increment values</li>
<li>Generally, when arbitrary pixels need to be accessed, it is no faster than indirect addressing</li>
<li>The required address range is comparable to that of paged access</li>
</ul>

<h3>Programming information</h3>
<p>
As discussed above, the accelerator maintains and updates a cursor register. That register is large enough to address all
possible locations in the pixel memory; that is, it is <code>video_addr_width</code> bits wide. This register
can be read or written by asserting the <code>CUR_STB_I</code> signal in a valid Wishbone cycle.
<p>
Each location in the accelerator memory holds an increment value for the cursor. Thus each location has the same
size as the cursor register itself. If <code>data_width</code> is less than <code>video_addr_width</code>, such a
location cannot be updated in one access. In that case the lower <code>data_width</code> bits of the location are written
or read directly through the data bus of the Wishbone interface that faces the master when the <code>ACC_STB_I</code> signal is asserted.
The upper part of the location is taken from, or returned in, the extension register. That register can be accessed by
asserting the <code>EXT_STB_I</code> signal. Pixel memory accesses are performed upon the assertion of the
<code>MEM_STB_I</code> signal. The accelerator location is selected by the address bits, the pixel memory location
is the current value of the cursor register, and the cursor register is then incremented by the value read from the
accelerator location.
<p>
The accelerator has <code>2**accel_size</code> locations.

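<p>
The split access through the extension register can be sketched in C as follows. The sketch assumes
<code>data_width</code> = 8, <code>video_addr_width</code> = 16 and <code>accel_size</code> = 4, and it assumes
that an <code>ACC_STB_I</code> write combines the bus data with the current extension register contents while
an <code>ACC_STB_I</code> read leaves the upper bits in the extension register. The exact ordering in the
core may differ, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed parameters: accel_size = 4, data_width = 8, video_addr_width = 16. */
enum { ACCEL_SIZE = 4, LOCATIONS = 1 << ACCEL_SIZE };  /* 2**accel_size locations */

typedef struct {
    uint8_t  ext;             /* extension register: upper 8 bits of a location */
    uint16_t acc[LOCATIONS];  /* accelerator memory: 16-bit cursor increments */
} accel_regs_t;

/* EXT_STB_I cycles: plain access to the extension register. */
void    ext_write(accel_regs_t *r, uint8_t v) { r->ext = v; }
uint8_t ext_read(const accel_regs_t *r)       { return r->ext; }

/* ACC_STB_I write: low byte from the bus, high byte from the extension register. */
void acc_write(accel_regs_t *r, unsigned adr, uint8_t dat)
{
    assert(adr < LOCATIONS);
    r->acc[adr] = (uint16_t)(((unsigned)r->ext << 8) | dat);
}

/* ACC_STB_I read: low byte to the bus, high byte into the extension register. */
uint8_t acc_read(accel_regs_t *r, unsigned adr)
{
    assert(adr < LOCATIONS);
    r->ext = (uint8_t)(r->acc[adr] >> 8);
    return (uint8_t)(r->acc[adr] & 0xFF);
}

/* Program a signed 16-bit increment with two 8-bit accesses. */
void acc_store16(accel_regs_t *r, unsigned adr, int16_t inc)
{
    uint16_t raw = (uint16_t)inc;            /* two's complement bit pattern */
    ext_write(r, (uint8_t)(raw >> 8));       /* upper part first */
    acc_write(r, adr, (uint8_t)(raw & 0xFF));
}

/* The cursor wraps at video_addr_width bits, so negative increments work. */
uint16_t cursor_after(uint16_t cursor, uint16_t inc)
{
    return (uint16_t)(cursor + inc);
}
```

Storing the increment as a two's complement pattern of <code>video_addr_width</code> bits is what lets a plain
adder move the cursor backwards in this model.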
<h3>Wishbone datasheet</h3>
<table border>
<tr><th>Description</th><th>Specification</th></tr>
<tr><td>General Description </td><td>Monitor controller accelerator.</td></tr>
<tr><td>Supported cycles </td><td>Slave read/write<br>Slave block read/write<br>Slave rmw<br>
Master read/write<br>Master block read/write<br>Master rmw<br></td></tr>
<tr><td>Data port size </td><td>Configurable on slave side, same on the master side</td></tr>
<tr><td>Data port granularity </td><td>Bus size</td></tr>
<tr><td>Data port maximum operand size </td><td>Bus size</td></tr>
<tr><td>Data transfer ordering </td><td>Little endian</td></tr>
<tr><td>Data transfer sequencing </td><td>n/a</td></tr>
<tr><td>Supported signal list and cross reference to equivalent Wishbone signals</td><td>
<table>
<tr><th>Signal name</th><th>Wishbone equiv.</th></tr>
<tr><th colspan="2">Common signals for all ports</th></tr>
<tr><td>CLK_I </td><td>CLK_I</td></tr>
<tr><td>RST_I </td><td>RST_I</td></tr>
<tr><th colspan="2">Signals to connect to master</th></tr>
<tr><td>CYC_I </td><td>CYC_I</td></tr>
<tr><td>CUR_STB_I </td><td>STB_I</td></tr>
<tr><td>EXT_STB_I </td><td>STB_I</td></tr>
<tr><td>ACC_STB_I </td><td>STB_I</td></tr>
<tr><td>MEM_STB_I </td><td>STB_I</td></tr>
<tr><td>WE_I </td><td>WE_I </td></tr>
<tr><td>ACK_O </td><td>ACK_O</td></tr>
<tr><td>SEL_I(..) </td><td>SEL_I()</td></tr>
<tr><td>ADR_I(..) </td><td>ADR_I()</td></tr>
<tr><td>DAT_I(..) </td><td>DAT_I()</td></tr>
<tr><td>DAT_O(..) </td><td>DAT_O()</td></tr>
<tr><th colspan="2">Signals to connect to the pixel memory</th></tr>
<tr><td>V_CYC_O </td><td>CYC_O</td></tr>
<tr><td>V_STB_O </td><td>STB_O</td></tr>
<tr><td>V_WE_O </td><td>WE_O </td></tr>
<tr><td>V_ACK_I </td><td>ACK_I</td></tr>
<tr><td>V_ADR_O(..) </td><td>ADR_O()</td></tr>
<tr><td>V_SEL_O(..) </td><td>SEL_O()</td></tr>
<tr><td>V_DAT_I(..) </td><td>DAT_I()</td></tr>
<tr><td>V_DAT_O(..) </td><td>DAT_O()</td></tr>
</table>
</td></tr>
</table>
<h3>Parameter description</h3>
<table border>
<tr><th>Parameter name</th><th>Description</th></tr>
<tr><td>accel_size </td><td>Address size of the accelerator memory</td></tr>
<tr><td>video_addr_width </td><td>Address size of the pixel memory</td></tr>
<tr><td>data_width </td><td>Data size of the interfaces</td></tr>
</table>
<h3>Signal description</h3>
<table border>
<tr><th>Signal name</th><th>Description</th></tr>
<tr><th colspan="2">Signals to connect to master</th></tr>
<tr><td>CYC_I </td><td>Wishbone cycle signal. High value frames blocks of access</td></tr>
<tr><td>CUR_STB_I </td><td>Wishbone strobe signal. High value indicates cycle to the cursor register</td></tr>
<tr><td>EXT_STB_I </td><td>Wishbone strobe signal. High value indicates cycle to the extension register</td></tr>
<tr><td>ACC_STB_I </td><td>Wishbone strobe signal. High value indicates cycle to the accelerator memory</td></tr>
<tr><td>MEM_STB_I </td><td>Wishbone strobe signal. High value indicates cycle to the pixel memory</td></tr>
<tr><td>WE_I </td><td>Wishbone write enable signal. High indicates data flowing from master to slave</td></tr>
<tr><td>ACK_O </td><td>Wishbone acknowledge signal. High indicates that slave finished operation successfully</td></tr>
<tr><td>ACK_OI </td><td>WishboneTK acknowledge chain input signal</td></tr>
<tr><td>ADR_I(accel_size-1..0) </td><td>Wishbone address bus signals</td></tr>
<tr><td>SEL_I(data_width/8-1..0) </td><td>Wishbone byte-selection signals</td></tr>
<tr><td>DAT_I(data_width-1..0) </td><td>Wishbone data bus input (to slave direction) signals</td></tr>
<tr><td>DAT_O(data_width-1..0) </td><td>Wishbone data bus output (to master direction) signals</td></tr>
<tr><td>DAT_OI(cpu_dat_width-1..0) </td><td>WishboneTK data bus chain input signal</td></tr>
<tr><th colspan="2">Signals to connect to the pixel memory</th></tr>
<tr><td>V_CYC_O </td><td>Wishbone cycle signal. High value frames blocks of access</td></tr>
<tr><td>V_STB_O </td><td>Wishbone strobe signal. High value indicates cycle to this particular device</td></tr>
<tr><td>V_WE_O </td><td>Wishbone write enable signal. High indicates data flowing from master to slave</td></tr>
<tr><td>V_ACK_I </td><td>Wishbone acknowledge signal. High indicates that slave finished operation successfully</td></tr>
<tr><td>V_ADR_O(video_addr_width-2..0) </td><td>Wishbone address bus signals</td></tr>
<tr><td>V_SEL_O(data_width/8-1..0) </td><td>Wishbone byte-selection signals</td></tr>
<tr><td>V_DAT_I(data_width-1..0) </td><td>Wishbone data bus input (to slave direction) signals</td></tr>
<tr><td>V_DAT_O(data_width-1..0) </td><td>Wishbone data bus output (to master direction) signals</td></tr>
</table>

<h2>Author &amp; Maintainer</h2>
<p>
<a href="/people/tantos">Andras Tantos</a>
<!--# include virtual="/ssi/ssi_end.shtml" -->