TWOFISH MANUAL

(c) 2006 Spyros Ninos

This document is under the GPL. See file COPYING for licence details.

1. Introduction
2. Crypto primitives usage
3. Testbenches
4. Misc + Tips

1. INTRODUCTION
===============

Twofish is a symmetric block cipher with a 128 bit block, and was a finalist candidate in
the AES contest. It supports keys of 128, 192 and 256 bits, plus every size below 256 bits
by padding the key up to the next of those sizes. This implementation accepts keys of 128,
192 and 256 bits only. If you want a different size, you'll have to create the padder
yourself. The implementation is written in a mix of VHDL 87 and VHDL 93. To be safe,
compile it as VHDL 93. The naming convention for the components is kept as simple and
self-explanatory as I could manage. I had in mind that two or three ciphers might be used
in the same design, so the names are chosen to avoid name conflicts (I hope...). Every
key-dependent component carries the respective key size in its name to indicate its
target (e.g. twofish_S128 is for 128 bit keys). The cipher components are purely
combinational circuits. This decision was made for portability: since no memory is used,
the cipher can be implemented in any programmable device. Maximum flexibility was also
intended, by dividing the cipher into key parts. That way you have the choice of
implementing the cipher as a rolled-out, iterative, pipelined or any other architecture
you may like to build.
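
As an illustration of the padder mentioned above, here is a minimal sketch. It is not part
of twofish.vhd: the entity name twofish_key_padder, its generics and its port names are
made up for this example. It assumes that the first key byte sits in the most significant
bits of the key vector, so the zeroes are appended on the least significant side.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper, not part of twofish.vhd: zero-pads a short key
-- up to one of the key sizes accepted by this implementation.
entity twofish_key_padder is
generic (
    KEY_BITS    : positive := 80;   -- length of the key you supply
    PADDED_BITS : positive := 128   -- 128, 192 or 256
);
port (
    in_key_tkp  : in  std_logic_vector(KEY_BITS-1 downto 0);
    out_key_tkp : out std_logic_vector(PADDED_BITS-1 downto 0)
);
end twofish_key_padder;

architecture twofish_key_padder_arch of twofish_key_padder is
begin
    -- Refuse to elaborate if the key does not fit in the padded size.
    assert KEY_BITS <= PADDED_BITS
        report "key is longer than the padded size"
        severity failure;

    -- Assumption: the first key byte is in the most significant bits,
    -- so the zero padding goes to the least significant side.
    pad: process (in_key_tkp)
    begin
        out_key_tkp <= (others => '0');
        out_key_tkp(PADDED_BITS-1 downto PADDED_BITS-KEY_BITS) <= in_key_tkp;
    end process pad;
end twofish_key_padder_arch;
```

With the default generics, an 80 bit key comes out as the key followed by 48 zero bits,
which can then be connected to the key inputs of the 128 bit components.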

2. CRYPTO PRIMITIVES USAGE
==========================

The twofish.vhd file is divided into four parts. The first part holds all the
key-independent components. The next three parts hold the components that depend on the
key: 128, 192 and 256 bits respectively.

In the file you'll find the following main components:

1) twofish_data_input
2) twofish_data_output
3) twofish_S128
4) twofish_keysched128
5) twofish_whit_keysched128
6) twofish_encryption_round128
7) twofish_decryption_round128
8) twofish_S192
9) twofish_keysched192
10) twofish_whit_keysched192
11) twofish_encryption_round192
12) twofish_decryption_round192
13) twofish_S256
14) twofish_keysched256
15) twofish_whit_keysched256
16) twofish_encryption_round256
17) twofish_decryption_round256

You'll also find all the other components that the above depend on, but they are not
important in building the cipher, except perhaps if you want to study the structure
of this implementation and/or modify it. A short description of the main components
follows:

1) The first component is TWOFISH_DATA_INPUT, which is a simple transformation of the
input data from the way we provide it (big endian) to the little endian convention
required by the Twofish specification. It must be used as an interface between the
input data provided to the circuit and the rest of the cipher. An alternative would be
to extract the code from it and integrate it into another component. Note that since the
data block size of the cipher is always 128 bits, this component is meant to be used
with the components of all key sizes. The interface of the component is as follows:

entity twofish_data_input is
port (
    in_tdi  : in std_logic_vector(127 downto 0);
    out_tdi : out std_logic_vector(127 downto 0)
);
end twofish_data_input;

It is quite simple: in_tdi is the data input as we provide it and out_tdi is the
transformed input data (tdi comes from Twofish Data Input).
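
To give an idea of what such a transformation looks like, here is a rough sketch of the
little endian convention of the Twofish specification, where each 32 bit word is built
from four bytes with the first byte as the least significant one. This is not the code of
twofish_data_input. The package and function names are made up, and the sketch assumes
that the first data byte sits in bits 127 downto 120 and that the first word is kept in
the most significant 32 bits of the result, which this manual does not state.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package, for illustration only.
package twofish_endian_sketch is
    -- Reverses the byte order of one 32 bit word.
    function swap_bytes32 (w : std_logic_vector(31 downto 0))
        return std_logic_vector;
    -- Applies swap_bytes32 to each of the four words of a 128 bit block.
    function swap_block128 (b : std_logic_vector(127 downto 0))
        return std_logic_vector;
end twofish_endian_sketch;

package body twofish_endian_sketch is
    function swap_bytes32 (w : std_logic_vector(31 downto 0))
        return std_logic_vector is
    begin
        -- The byte that arrived first (bits 31 downto 24) becomes the
        -- least significant byte of the word.
        return w(7 downto 0) & w(15 downto 8) & w(23 downto 16) & w(31 downto 24);
    end swap_bytes32;

    function swap_block128 (b : std_logic_vector(127 downto 0))
        return std_logic_vector is
    begin
        -- Assumption: the word order inside the block is left unchanged.
        return swap_bytes32(b(127 downto 96)) & swap_bytes32(b(95 downto 64)) &
               swap_bytes32(b(63 downto 32))  & swap_bytes32(b(31 downto 0));
    end swap_block128;
end twofish_endian_sketch;
```

Under these assumptions the same function also converts back, since reversing the bytes
of a word twice gives the original word.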

2) The component TWOFISH_DATA_OUTPUT performs the reverse of twofish_data_input.
It takes the cipher result in the little endian convention and transforms it to the big
endian one, as the specification requires. This component too is meant to be used with
the components of all key sizes. The interface is as follows:

entity twofish_data_output is
port (
    in_tdo  : in std_logic_vector(127 downto 0);
    out_tdo : out std_logic_vector(127 downto 0)
);
end twofish_data_output;

in_tdo accepts the ciphertext as we take it from the last round and out_tdo is the
transformed ciphertext (tdo comes from Twofish Data Output).

3) TWOFISH_S128 is a component that takes the 128 bit key and produces S0 and S1 for
the f function. The interface is as follows:

entity twofish_S128 is
port (
    in_key_ts128      : in std_logic_vector(127 downto 0);
    out_Sfirst_ts128,
    out_Ssecond_ts128 : out std_logic_vector(31 downto 0)
);
end twofish_S128;

Here, in_key_ts128 is the key that we provide. Note that there is no component that
transforms the key to the form that the Twofish specification requires; rather, the
transformation takes place within the twofish_S128 component. The association is that
Sfirst refers to S0 and Ssecond refers to S1. There is no need to remember the
association, since the same rule is followed throughout the design, so the only thing
you have to do is connect the pins with the same name. This component can be used only
when you implement a 128 bit key size design (ts128 comes from Twofish_S128).


4) The TWOFISH_KEYSCHED128 component is the key scheduler of the Twofish cipher
for 128 bit keys. Its interface is as follows:

entity twofish_keysched128 is
port (
    odd_in_tk128,
    even_in_tk128      : in std_logic_vector(7 downto 0);
    in_key_tk128       : in std_logic_vector(127 downto 0);
    out_key_up_tk128,
    out_key_down_tk128 : out std_logic_vector(31 downto 0)
);
end twofish_keysched128;

even_in_tk128 and odd_in_tk128 take the numbers 2i and 2i+1 described in the
specification: 2i goes to even_in_tk128 and 2i+1 goes to odd_in_tk128. in_key_tk128 is
where the key goes. The key must be supplied to the component without any endian
transformation; the transformation takes place in the component, as in twofish_S128.
out_key_up_tk128 and out_key_down_tk128 are the two keys produced by the scheduler. The
association is that, looking at the Twofish diagram in the specification (page 11,
figure 3), the upper key is what we get from out_key_up_tk128 and the lower key is what
we get from out_key_down_tk128. As before, you don't have to remember the association,
since the same names are used throughout the whole design. This component too can be
used only when you implement a 128 bit key size design (tk128 comes from
Twofish_Keysched128).

IMPORTANT NOTICE: This component can be used in two ways: in combination with
twofish_whit_keysched128 (see below) or as a standalone component. In the first
case, the whitening keys are produced by twofish_whit_keysched128, so even_in_tk128 and
odd_in_tk128 must start from 8 and 9 respectively. In the second case they can start
from 0 and 1, so that the scheduler produces the whitening keys K0..K7 as well.
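
As an illustration of the first case, the sketch below derives the two numbers for a given
round. In the Twofish specification round r (r = 0..15) uses the keys K(2r+8) and K(2r+9),
so the numbers run from 8 and 9 for the first round up to 38 and 39 for the last. The
wrapper entity, its generic and its port names are made up for this example, and it
assumes that twofish.vhd has been compiled into the library work.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper: produces the two round keys of round ROUND when
-- the whitening keys K0..K7 come from twofish_whit_keysched128.
entity round_keys128_sketch is
generic (
    ROUND : natural range 0 to 15 := 0
);
port (
    in_key       : in  std_logic_vector(127 downto 0);
    out_key_up   : out std_logic_vector(31 downto 0);
    out_key_down : out std_logic_vector(31 downto 0)
);
end round_keys128_sketch;

architecture round_keys128_sketch_arch of round_keys128_sketch is
    -- Round r uses K(2r+8) and K(2r+9).
    constant EVEN_INDEX : std_logic_vector(7 downto 0) :=
        std_logic_vector(to_unsigned(2*ROUND + 8, 8));
    constant ODD_INDEX  : std_logic_vector(7 downto 0) :=
        std_logic_vector(to_unsigned(2*ROUND + 9, 8));
begin
    -- Assumption: twofish.vhd is compiled into library work.
    keysched: entity work.twofish_keysched128
    port map (
        odd_in_tk128       => ODD_INDEX,
        even_in_tk128      => EVEN_INDEX,
        in_key_tk128       => in_key,
        out_key_up_tk128   => out_key_up,
        out_key_down_tk128 => out_key_down
    );
end round_keys128_sketch_arch;
```

For the standalone case the same idea applies with 2*ROUND and 2*ROUND + 1 for the first
four pairs, which then yield K0..K7.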

5) The TWOFISH_WHIT_KEYSCHED128 produces the whitening keys K0..K7. The interface
is as follows:

entity twofish_whit_keysched128 is
port (
    in_key_twk128 : in std_logic_vector(127 downto 0);
    out_K0_twk128,
    out_K1_twk128,
    out_K2_twk128,
    out_K3_twk128,
    out_K4_twk128,
    out_K5_twk128,
    out_K6_twk128,
    out_K7_twk128 : out std_logic_vector(31 downto 0)
);
end twofish_whit_keysched128;

in_key_twk128 is where the key is connected. As above, you must not apply any big/little
endian transformation to it; that is performed within the component. The eight outputs
produce the keys. This component too can be used only when you implement a 128 bit key
size design (twk128 comes from Twofish_Whit_Keysched128).

IMPORTANT NOTICE: If this component is to be used in combination with twofish_keysched128,
care should be taken when supplying numbers to the latter. Read the notice under
twofish_keysched128.

6) The TWOFISH_ENCRYPTION_ROUND128 is the component that implements one round of
encryption. The interface is as follows:

entity twofish_encryption_round128 is
port (
    in1_ter128,
    in2_ter128,
    in3_ter128,
    in4_ter128,
    in_Sfirst_ter128,
    in_Ssecond_ter128,
    in_key_up_ter128,
    in_key_down_ter128 : in std_logic_vector(31 downto 0);
    out1_ter128,
    out2_ter128,
    out3_ter128,
    out4_ter128        : out std_logic_vector(31 downto 0)
);
end twofish_encryption_round128;

in1_ter128, in2_ter128, in3_ter128 and in4_ter128 are the four 32 bit inputs to the
cipher round. in_Sfirst_ter128 and in_Ssecond_ter128 are the two S needed for the g
functions, and in_key_up_ter128 and in_key_down_ter128 are the two round keys. Note that
the up and down names are given to the keys according to the diagram in the Twofish
specification. You don't need to worry about it, since the keys follow the same naming
convention throughout the whole design. Finally, out1_ter128, out2_ter128, out3_ter128
and out4_ter128 are the 32 bit outputs of the encryption round (ter128 comes from
Twofish_Encryption_Round128).

IMPORTANT NOTICE: the output swapping takes place IN the component. YOU HAVE TO undo
the last swap after the 16th round.
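
To show how the 128 bit components described so far could be wired together, here is a
sketch of a rolled-out chain of 16 encryption rounds that undoes the last swap at the end.
It is only an illustration under assumptions: the wrapper entity and its ports are made
up, twofish.vhd is assumed to be compiled into the library work, and the swap is assumed
to exchange the (1,2) pair of words with the (3,4) pair, as in the specification. The
whitening steps and the endian converters are left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical rolled-out core: 16 encryption rounds, no whitening.
entity twofish_rounds128_sketch is
port (
    in_key                 : in  std_logic_vector(127 downto 0);
    in1, in2, in3, in4     : in  std_logic_vector(31 downto 0);
    out1, out2, out3, out4 : out std_logic_vector(31 downto 0)
);
end twofish_rounds128_sketch;

architecture twofish_rounds128_sketch_arch of twofish_rounds128_sketch is
    type word_array is array (0 to 16) of std_logic_vector(31 downto 0);
    -- Element r feeds round r; element 16 is the output of the last round.
    signal w1, w2, w3, w4    : word_array;
    signal s_first, s_second : std_logic_vector(31 downto 0);
begin
    w1(0) <= in1;
    w2(0) <= in2;
    w3(0) <= in3;
    w4(0) <= in4;

    s_keys: entity work.twofish_S128
    port map (
        in_key_ts128      => in_key,
        out_Sfirst_ts128  => s_first,
        out_Ssecond_ts128 => s_second
    );

    rounds: for r in 0 to 15 generate
        -- Whitening keys are assumed to come from twofish_whit_keysched128,
        -- so the round keys start at 8 and 9.
        constant EVEN_INDEX : std_logic_vector(7 downto 0) :=
            std_logic_vector(to_unsigned(2*r + 8, 8));
        constant ODD_INDEX  : std_logic_vector(7 downto 0) :=
            std_logic_vector(to_unsigned(2*r + 9, 8));
        signal key_up, key_down : std_logic_vector(31 downto 0);
    begin
        keysched: entity work.twofish_keysched128
        port map (
            odd_in_tk128       => ODD_INDEX,
            even_in_tk128      => EVEN_INDEX,
            in_key_tk128       => in_key,
            out_key_up_tk128   => key_up,
            out_key_down_tk128 => key_down
        );

        enc_round: entity work.twofish_encryption_round128
        port map (
            in1_ter128         => w1(r),
            in2_ter128         => w2(r),
            in3_ter128         => w3(r),
            in4_ter128         => w4(r),
            in_Sfirst_ter128   => s_first,
            in_Ssecond_ter128  => s_second,
            in_key_up_ter128   => key_up,
            in_key_down_ter128 => key_down,
            out1_ter128        => w1(r+1),
            out2_ter128        => w2(r+1),
            out3_ter128        => w3(r+1),
            out4_ter128        => w4(r+1)
        );
    end generate rounds;

    -- Undo the swap performed inside the 16th round.
    out1 <= w3(16);
    out2 <= w4(16);
    out3 <= w1(16);
    out4 <= w2(16);
end twofish_rounds128_sketch_arch;
```

Since everything is combinational, this chain computes all 16 rounds in one long path.
Registers between the rounds would turn it into a pipelined design.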

7) The TWOFISH_DECRYPTION_ROUND128 is the component that implements one round of
decryption. The interface is as follows:

entity twofish_decryption_round128 is
port (
    in1_tdr128,
    in2_tdr128,
    in3_tdr128,
    in4_tdr128,
    in_Sfirst_tdr128,
    in_Ssecond_tdr128,
    in_key_up_tdr128,
    in_key_down_tdr128 : in std_logic_vector(31 downto 0);
    out1_tdr128,
    out2_tdr128,
    out3_tdr128,
    out4_tdr128        : out std_logic_vector(31 downto 0)
);
end twofish_decryption_round128;

As in the twofish_encryption_round128 component, the ports are quite self-explanatory
(tdr128 comes from Twofish_Decryption_Round128).

IMPORTANT NOTICE: as in twofish_encryption_round128, the output swapping takes place
inside the component. YOU HAVE TO undo the last swap after the 16th round.


Components

8) twofish_S192
9) twofish_keysched192
10) twofish_whit_keysched192
11) twofish_encryption_round192
12) twofish_decryption_round192
13) twofish_S256
14) twofish_keysched256
15) twofish_whit_keysched256
16) twofish_encryption_round256
17) twofish_decryption_round256

work exactly as their 128 bit counterparts. The only difference is the third S, which is
provided by twofish_S192 and needed by some of the other 192 bit components, and the
fourth S, which is provided by twofish_S256. For example:

entity twofish_S192 is
port (
    in_key_ts192      : in std_logic_vector(191 downto 0);
    out_Sfirst_ts192,
    out_Ssecond_ts192,
    out_Sthird_ts192  : out std_logic_vector(31 downto 0)
);
end twofish_S192;

provides a third S that is used by the Twofish encryption and decryption rounds for
192 bits. Likewise, the fourth S provided by twofish_S256 is used by the Twofish
encryption and decryption rounds for 256 bits.
Every IMPORTANT NOTICE given for the 128 bit components is valid for these components
too.


3. TESTBENCHES
==============

Testbenches for the cipher are provided for the Tables, Variable key, Variable text and
ECB/CBC Monte Carlo encryption and decryption tests. Every testbench comes with
its respective testvector file. The testvector file has been transformed into a form
that is easier to manipulate than the original form supplied by the cipher designer(s).

Every testbench produces a file with the results, which can be cross-checked
against the input testvector file (usually with "diff"), just to be certain that the
results are as expected.

Along with the transformed testvector files, the original testvector files, as provided
by the cipher designer(s), are included. That way you can check the transformed
testvector files against the originals if you want to.

Finally, some secondary circuits are provided for the testbenches to work. These
are a 128 bit register, and a mux and a demux for 128 bit input(s)/output(s).

4. MISC + TIPS
==============

You must pay attention to the whitening steps. None of the components actually
implements the input or output whitening steps. You only have the component
that produces the whitening keys.
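
A whitening step is just an XOR of the 128 bit block with four of the whitening keys, so
it can be added with very little glue logic. The sketch below is one way to do it. The
entity and its port names are made up for this example, and it assumes that the first 32
bit word of the block sits in bits 127 downto 96. Check the testbenches for the word order
actually used by the components.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical glue logic: XORs a 128 bit block with four whitening keys.
entity twofish_whitening_sketch is
port (
    in_block                   : in  std_logic_vector(127 downto 0);
    in_Ka, in_Kb, in_Kc, in_Kd : in  std_logic_vector(31 downto 0);
    out_block                  : out std_logic_vector(127 downto 0)
);
end twofish_whitening_sketch;

architecture twofish_whitening_sketch_arch of twofish_whitening_sketch is
begin
    -- Assumption: word 0 of the block is in bits 127 downto 96, so in_Ka
    -- whitens word 0, in_Kb word 1, in_Kc word 2 and in_Kd word 3.
    out_block <= in_block xor (in_Ka & in_Kb & in_Kc & in_Kd);
end twofish_whitening_sketch_arch;
```

Following the specification, encryption uses K0..K3 for the input whitening (before the
first round) and K4..K7 for the output whitening (after the last swap has been undone).
Decryption uses them the other way round: K4..K7 at the input and K0..K3 at the output.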

Each cipher implementation was designed to require as few components as
possible, so that there would be no difficulty in using them and building the complete
algorithm. The problem is that the Reed Solomon used to produce
the S keys is in a rolled-out form, because I chose not to use any form of
memory. So, if you want to implement more than one key size cipher in the
same circuit/FPGA, the design size grows considerably, and I doubt that it will
fit in a single FPGA. If you decide that you need more than one cipher
instantiation, you'll have to tweak the design of the Reed Solomon.
One example follows:

In the current implementation the Reed Solomon components are designed specifically
for the key size of the cipher, e.g. for a 128 bit key:

entity reed_solomon128 is
port (
    in_rs128          : in std_logic_vector(127 downto 0);
    out_Sfirst_rs128,
    out_Ssecond_rs128 : out std_logic_vector(31 downto 0)
);
end reed_solomon128;

and for a 192 bit key:

entity reed_solomon192 is
port (
    in_rs192          : in std_logic_vector(191 downto 0);
    out_Sfirst_rs192,
    out_Ssecond_rs192,
    out_Sthird_rs192  : out std_logic_vector(31 downto 0)
);
end reed_solomon192;

What happens is that each component takes the input key and performs the
multiplications in rolled-out form for every 64 bit input. In other words, in_rs128
is split into two 64 bit chunks, and each one is driven into its respective
multipliers. The result of the first multipliers is driven to out_Sfirst, the result
of the second to out_Ssecond. Likewise, for reed_solomon192 the result of the
third multipliers is driven to out_Sthird, and for reed_solomon256 the result of the
fourth multipliers is driven to out_Sfourth. Every multiplication needs its own
multipliers (note that it is not a single multiplier but a group of them, because it is
a matrix multiplication), so the first component needs two groups of multipliers,
the second component three groups and the third four.

If you had to implement the cipher with both 128 and 192 bit key sizes, for example,
you'd have to implement both Reed Solomon components, which total 5 groups of
multipliers. One solution would be to create a Reed Solomon that takes a single 64 bit
input and produces a single 32 bit output. For example:

entity reed_solomon is
port (
    in_rs    : in std_logic_vector(63 downto 0);
    out_S_rs : out std_logic_vector(31 downto 0)
);
end reed_solomon;


Then you would divide the key into 64 bit chunks (2 chunks for 128 bits, 3 chunks for
192 bits and 4 chunks for 256 bits) and provide them to the component
sequentially. The results of the reed_solomon could be stored in some sort of RAM.
That way you may slow down the process, but you get to implement only one group
of multipliers and you save a lot of space.
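
A possible shape for such a sequential arrangement is sketched below. It is untested and
rests on several assumptions: reed_solomon is the single-chunk component proposed above
(it does not exist in twofish.vhd), the sequencer entity with its clk, start and done
ports is made up, plain registers stand in for the RAM, and chunk i of the key is simply
stored as word i of the result. Which word ends up being Sfirst, Ssecond and so on is left
for you to decide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sequencer around the single-chunk reed_solomon component.
entity reed_solomon_seq_sketch is
generic (
    CHUNKS : positive := 2   -- 2, 3 or 4 for 128, 192 or 256 bit keys
);
port (
    clk    : in  std_logic;
    start  : in  std_logic;  -- high for one clock to begin a new run
    in_key : in  std_logic_vector(64*CHUNKS-1 downto 0);
    out_S  : out std_logic_vector(32*CHUNKS-1 downto 0);
    done   : out std_logic   -- high while idle, including before the first run
);
end reed_solomon_seq_sketch;

architecture reed_solomon_seq_sketch_arch of reed_solomon_seq_sketch is
    -- The component proposed in the text; its architecture is not shown here.
    component reed_solomon
    port (
        in_rs    : in  std_logic_vector(63 downto 0);
        out_S_rs : out std_logic_vector(31 downto 0)
    );
    end component;

    -- Number of the chunk being processed; CHUNKS means idle.
    signal index  : natural range 0 to CHUNKS := CHUNKS;
    signal chunk  : std_logic_vector(63 downto 0);
    signal s_word : std_logic_vector(31 downto 0);
begin
    -- Select the 64 bit chunk for the current step.
    select_chunk: process (index, in_key)
    begin
        chunk <= (others => '0');
        for i in 0 to CHUNKS-1 loop
            if i = index then
                chunk <= in_key(64*i+63 downto 64*i);
            end if;
        end loop;
    end process select_chunk;

    -- The single group of multipliers, shared by all chunks.
    rs: reed_solomon
    port map (
        in_rs    => chunk,
        out_S_rs => s_word
    );

    -- Store one 32 bit result per clock, then go idle.
    store: process (clk)
    begin
        if rising_edge(clk) then
            if start = '1' then
                index <= 0;
            elsif index < CHUNKS then
                for i in 0 to CHUNKS-1 loop
                    if i = index then
                        out_S(32*i+31 downto 32*i) <= s_word;
                    end if;
                end loop;
                index <= index + 1;
            end if;
        end if;
    end process store;

    done <= '1' when index = CHUNKS else '0';
end reed_solomon_seq_sketch_arch;
```

A run takes one clock per chunk after the start pulse, so a 256 bit key needs four clocks
instead of four groups of multipliers.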

The same goes for the whitening key components. In the whitening components
the function h is implemented 8 times (2 h functions for each pair of keys,
for the first 8 keys K0..7). You could follow the above example and implement
a component that accepts a 64 bit input (a key chunk; every M is 32 bits and you need
2 Ms for every h function) and produces a single 32 bit key. Thus you can
produce every key sequentially and store it in a RAM, for example.

If you want some implementation examples, you'll have to read the testbenches.
The cipher is implemented there in iterated mode, but you'll get a clear picture of how
to connect the components.
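
For orientation, an iterated arrangement for 128 bit keys could look roughly like the
sketch below: one round instance, a 128 bit state register and a round counter that also
drives the key scheduler. This is not taken from the testbenches. The entity, its ports
and the word order inside the state register are assumptions, twofish.vhd is assumed to be
compiled into the library work, and the whitening steps and endian converters are left to
the surrounding design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical iterative core: one round instance reused 16 times.
entity twofish_iter128_sketch is
port (
    clk       : in  std_logic;
    load      : in  std_logic;  -- high for one clock: latch in_block, restart
    in_key    : in  std_logic_vector(127 downto 0);
    in_block  : in  std_logic_vector(127 downto 0);  -- already whitened
    out_block : out std_logic_vector(127 downto 0);  -- last swap undone
    ready     : out std_logic
);
end twofish_iter128_sketch;

architecture twofish_iter128_sketch_arch of twofish_iter128_sketch is
    -- Assumed layout: word 1 in bits 127..96 down to word 4 in bits 31..0.
    signal state, next_state     : std_logic_vector(127 downto 0);
    -- 0..15 while working, 16 when finished (also the power-up value).
    signal round_no              : unsigned(4 downto 0) := to_unsigned(16, 5);
    signal even_index, odd_index : std_logic_vector(7 downto 0);
    signal key_up, key_down      : std_logic_vector(31 downto 0);
    signal s_first, s_second     : std_logic_vector(31 downto 0);
begin
    -- Round r uses the numbers 2r+8 and 2r+9 (whitening keys come from
    -- twofish_whit_keysched128, outside this sketch).
    even_index <= std_logic_vector(shift_left(resize(round_no, 8), 1) + 8);
    odd_index  <= std_logic_vector(shift_left(resize(round_no, 8), 1) + 9);

    s_keys: entity work.twofish_S128
    port map (
        in_key_ts128      => in_key,
        out_Sfirst_ts128  => s_first,
        out_Ssecond_ts128 => s_second
    );

    keysched: entity work.twofish_keysched128
    port map (
        odd_in_tk128       => odd_index,
        even_in_tk128      => even_index,
        in_key_tk128       => in_key,
        out_key_up_tk128   => key_up,
        out_key_down_tk128 => key_down
    );

    enc_round: entity work.twofish_encryption_round128
    port map (
        in1_ter128         => state(127 downto 96),
        in2_ter128         => state(95 downto 64),
        in3_ter128         => state(63 downto 32),
        in4_ter128         => state(31 downto 0),
        in_Sfirst_ter128   => s_first,
        in_Ssecond_ter128  => s_second,
        in_key_up_ter128   => key_up,
        in_key_down_ter128 => key_down,
        out1_ter128        => next_state(127 downto 96),
        out2_ter128        => next_state(95 downto 64),
        out3_ter128        => next_state(63 downto 32),
        out4_ter128        => next_state(31 downto 0)
    );

    -- One round per clock: 16 clocks after load the state holds the result.
    iterate: process (clk)
    begin
        if rising_edge(clk) then
            if load = '1' then
                state    <= in_block;
                round_no <= (others => '0');
            elsif round_no < 16 then
                state    <= next_state;
                round_no <= round_no + 1;
            end if;
        end if;
    end process iterate;

    ready <= '1' when round_no = 16 else '0';

    -- Undo the swap performed inside the 16th round.
    out_block <= state(63 downto 0) & state(127 downto 64);
end twofish_iter128_sketch_arch;
```

The clock period has to cover the key scheduler and one full round, since both are
combinational and sit between two register updates.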