1 |
27 |
unneback |
IJG JPEG LIBRARY: SYSTEM ARCHITECTURE
|
2 |
|
|
|
3 |
|
|
Copyright (C) 1991-1995, Thomas G. Lane.
|
4 |
|
|
This file is part of the Independent JPEG Group's software.
|
5 |
|
|
For conditions of distribution and use, see the accompanying README file.
|
6 |
|
|
|
7 |
|
|
|
8 |
|
|
This file provides an overview of the architecture of the IJG JPEG software;
|
9 |
|
|
that is, the functions of the various modules in the system and the interfaces
|
10 |
|
|
between modules. For more precise details about any data structure or calling
|
11 |
|
|
convention, see the include files and comments in the source code.
|
12 |
|
|
|
13 |
|
|
We assume that the reader is already somewhat familiar with the JPEG standard.
|
14 |
|
|
The README file includes references for learning about JPEG. The file
|
15 |
|
|
libjpeg.doc describes the library from the viewpoint of an application
|
16 |
|
|
programmer using the library; it's best to read that file before this one.
|
17 |
|
|
Also, the file coderules.doc describes the coding style conventions we use.
|
18 |
|
|
|
19 |
|
|
In this document, JPEG-specific terminology follows the JPEG standard:
|
20 |
|
|
A "component" means a color channel, e.g., Red or Luminance.
|
21 |
|
|
A "sample" is a single component value (i.e., one number in the image data).
|
22 |
|
|
A "coefficient" is a frequency coefficient (a DCT transform output number).
|
23 |
|
|
A "block" is an 8x8 group of samples or coefficients.
|
24 |
|
|
An "MCU" (minimum coded unit) is an interleaved set of blocks of size
|
25 |
|
|
determined by the sampling factors, or a single block in a
|
26 |
|
|
noninterleaved scan.
|
27 |
|
|
We do not use the terms "pixel" and "sample" interchangeably. When we say
|
28 |
|
|
pixel, we mean an element of the full-size image, while a sample is an element
|
29 |
|
|
of the downsampled image. Thus the number of samples may vary across
|
30 |
|
|
components while the number of pixels does not. (This terminology is not used
|
31 |
|
|
rigorously throughout the code, but it is used in places where confusion would
|
32 |
|
|
otherwise result.)
|
33 |
|
|
|
34 |
|
|
|
35 |
|
|
*** System features ***
|
36 |
|
|
|
37 |
|
|
The IJG distribution contains two parts:
|
38 |
|
|
* A subroutine library for JPEG compression and decompression.
|
39 |
|
|
* cjpeg/djpeg, two sample applications that use the library to transform
|
40 |
|
|
JFIF JPEG files to and from several other image formats.
|
41 |
|
|
cjpeg/djpeg are of no great intellectual complexity: they merely add a simple
|
42 |
|
|
command-line user interface and I/O routines for several uncompressed image
|
43 |
|
|
formats. This document concentrates on the library itself.
|
44 |
|
|
|
45 |
|
|
We desire the library to be capable of supporting all JPEG baseline, extended
|
46 |
|
|
sequential, and progressive DCT processes. Hierarchical processes are not
|
47 |
|
|
supported.
|
48 |
|
|
|
49 |
|
|
The library does not support the lossless (spatial) JPEG process. Lossless
|
50 |
|
|
JPEG shares little or no code with lossy JPEG, and would normally be used
|
51 |
|
|
without the extensive pre- and post-processing provided by this library.
|
52 |
|
|
We feel that lossless JPEG is better handled by a separate library.
|
53 |
|
|
|
54 |
|
|
Within these limits, any set of compression parameters allowed by the JPEG
|
55 |
|
|
spec should be readable for decompression. (We can be more restrictive about
|
56 |
|
|
what formats we can generate.) Although the system design allows for all
|
57 |
|
|
parameter values, some uncommon settings are not yet implemented and may
|
58 |
|
|
never be; nonintegral sampling ratios are the prime example. Furthermore,
|
59 |
|
|
we treat 8-bit vs. 12-bit data precision as a compile-time switch, not a
|
60 |
|
|
run-time option, because most machines can store 8-bit pixels much more
|
61 |
|
|
compactly than 12-bit.
|
62 |
|
|
|
63 |
|
|
For legal reasons, JPEG arithmetic coding is not currently supported, but
|
64 |
|
|
extending the library to include it would be straightforward.
|
65 |
|
|
|
66 |
|
|
By itself, the library handles only interchange JPEG datastreams --- in
|
67 |
|
|
particular the widely used JFIF file format. The library can be used by
|
68 |
|
|
surrounding code to process interchange or abbreviated JPEG datastreams that
|
69 |
|
|
are embedded in more complex file formats. (For example, libtiff uses this
|
70 |
|
|
library to implement JPEG compression within the TIFF file format.)
|
71 |
|
|
|
72 |
|
|
The library includes a substantial amount of code that is not covered by the
|
73 |
|
|
JPEG standard but is necessary for typical applications of JPEG. These
|
74 |
|
|
functions preprocess the image before JPEG compression or postprocess it after
|
75 |
|
|
decompression. They include colorspace conversion, downsampling/upsampling,
|
76 |
|
|
and color quantization. This code can be omitted if not needed.
|
77 |
|
|
|
78 |
|
|
A wide range of quality vs. speed tradeoffs are possible in JPEG processing,
|
79 |
|
|
and even more so in decompression postprocessing. The decompression library
|
80 |
|
|
provides multiple implementations that cover most of the useful tradeoffs,
|
81 |
|
|
ranging from very-high-quality down to fast-preview operation. On the
|
82 |
|
|
compression side we have generally not provided low-quality choices, since
|
83 |
|
|
compression is normally less time-critical. It should be understood that the
|
84 |
|
|
low-quality modes may not meet the JPEG standard's accuracy requirements;
|
85 |
|
|
nonetheless, they are useful for viewers.
|
86 |
|
|
|
87 |
|
|
|
88 |
|
|
*** Portability issues ***
|
89 |
|
|
|
90 |
|
|
Portability is an essential requirement for the library. The key portability
|
91 |
|
|
issues that show up at the level of system architecture are:
|
92 |
|
|
|
93 |
|
|
1. Memory usage. We want the code to be able to run on PC-class machines
|
94 |
|
|
with limited memory. Images should therefore be processed sequentially (in
|
95 |
|
|
strips), to avoid holding the whole image in memory at once. Where a
|
96 |
|
|
full-image buffer is necessary, we should be able to use either virtual memory
|
97 |
|
|
or temporary files.
|
98 |
|
|
|
99 |
|
|
2. Near/far pointer distinction. To run efficiently on 80x86 machines, the
|
100 |
|
|
code should distinguish "small" objects (kept in near data space) from
|
101 |
|
|
"large" ones (kept in far data space). This is an annoying restriction, but
|
102 |
|
|
fortunately it does not impact code quality for less brain-damaged machines,
|
103 |
|
|
and the source code clutter turns out to be minimal with sufficient use of
|
104 |
|
|
pointer typedefs.
|
105 |
|
|
|
106 |
|
|
3. Data precision. We assume that "char" is at least 8 bits, "short" and
|
107 |
|
|
"int" at least 16, "long" at least 32. The code will work fine with larger
|
108 |
|
|
data sizes, although memory may be used inefficiently in some cases. However,
|
109 |
|
|
the JPEG compressed datastream must ultimately appear on external storage as a
|
110 |
|
|
sequence of 8-bit bytes if it is to conform to the standard. This may pose a
|
111 |
|
|
problem on machines where char is wider than 8 bits. The library represents
|
112 |
|
|
compressed data as an array of values of typedef JOCTET. If no data type
|
113 |
|
|
exactly 8 bits wide is available, custom data source and data destination
|
114 |
|
|
modules must be written to unpack and pack the chosen JOCTET datatype into
|
115 |
|
|
8-bit external representation.
|
116 |
|
|
|
117 |
|
|
|
118 |
|
|
*** System overview ***
|
119 |
|
|
|
120 |
|
|
The compressor and decompressor are each divided into two main sections:
|
121 |
|
|
the JPEG compressor or decompressor proper, and the preprocessing or
|
122 |
|
|
postprocessing functions. The interface between these two sections is the
|
123 |
|
|
image data that the official JPEG spec regards as its input or output: this
|
124 |
|
|
data is in the colorspace to be used for compression, and it is downsampled
|
125 |
|
|
to the sampling factors to be used. The preprocessing and postprocessing
|
126 |
|
|
steps are responsible for converting a normal image representation to or from
|
127 |
|
|
this form. (Those few applications that want to deal with YCbCr downsampled
|
128 |
|
|
data can skip the preprocessing or postprocessing step.)
|
129 |
|
|
|
130 |
|
|
Looking more closely, the compressor library contains the following main
|
131 |
|
|
elements:
|
132 |
|
|
|
133 |
|
|
Preprocessing:
|
134 |
|
|
* Color space conversion (e.g., RGB to YCbCr).
|
135 |
|
|
* Edge expansion and downsampling. Optionally, this step can do simple
|
136 |
|
|
smoothing --- this is often helpful for low-quality source data.
|
137 |
|
|
JPEG proper:
|
138 |
|
|
* MCU assembly, DCT, quantization.
|
139 |
|
|
* Entropy coding (sequential or progressive, Huffman or arithmetic).
|
140 |
|
|
|
141 |
|
|
In addition to these modules we need overall control, marker generation,
|
142 |
|
|
and support code (memory management & error handling). There is also a
|
143 |
|
|
module responsible for physically writing the output data --- typically
|
144 |
|
|
this is just an interface to fwrite(), but some applications may need to
|
145 |
|
|
do something else with the data.
|
146 |
|
|
|
147 |
|
|
The decompressor library contains the following main elements:
|
148 |
|
|
|
149 |
|
|
JPEG proper:
|
150 |
|
|
* Entropy decoding (sequential or progressive, Huffman or arithmetic).
|
151 |
|
|
* Dequantization, inverse DCT, MCU disassembly.
|
152 |
|
|
Postprocessing:
|
153 |
|
|
* Upsampling. Optionally, this step may be able to do more general
|
154 |
|
|
rescaling of the image.
|
155 |
|
|
* Color space conversion (e.g., YCbCr to RGB). This step may also
|
156 |
|
|
provide gamma adjustment [ currently it does not ].
|
157 |
|
|
* Optional color quantization (e.g., reduction to 256 colors).
|
158 |
|
|
* Optional color precision reduction (e.g., 24-bit to 15-bit color).
|
159 |
|
|
[This feature is not currently implemented.]
|
160 |
|
|
|
161 |
|
|
We also need overall control, marker parsing, and a data source module.
|
162 |
|
|
The support code (memory management & error handling) can be shared with
|
163 |
|
|
the compression half of the library.
|
164 |
|
|
|
165 |
|
|
There may be several implementations of each of these elements, particularly
|
166 |
|
|
in the decompressor, where a wide range of speed/quality tradeoffs is very
|
167 |
|
|
useful. It must be understood that some of the best speedups involve
|
168 |
|
|
merging adjacent steps in the pipeline. For example, upsampling, color space
|
169 |
|
|
conversion, and color quantization might all be done at once when using a
|
170 |
|
|
low-quality ordered-dither technique. The system architecture is designed to
|
171 |
|
|
allow such merging where appropriate.
|
172 |
|
|
|
173 |
|
|
|
174 |
|
|
Note: it is convenient to regard edge expansion (padding to block boundaries)
|
175 |
|
|
as a preprocessing/postprocessing function, even though the JPEG spec includes
|
176 |
|
|
it in compression/decompression. We do this because downsampling/upsampling
|
177 |
|
|
can be simplified a little if they work on padded data: it's not necessary to
|
178 |
|
|
have special cases at the right and bottom edges. Therefore the interface
|
179 |
|
|
buffer is always an integral number of blocks wide and high, and we expect
|
180 |
|
|
compression preprocessing to pad the source data properly. Padding will occur
|
181 |
|
|
only to the next block (8-sample) boundary. In an interleaved-scan situation,
|
182 |
|
|
additional dummy blocks may be used to fill out MCUs, but the MCU assembly and
|
183 |
|
|
disassembly logic will create or discard these blocks internally. (This is
|
184 |
|
|
advantageous for speed reasons, since we avoid DCTing the dummy blocks.
|
185 |
|
|
It also permits a small reduction in file size, because the compressor can
|
186 |
|
|
choose dummy block contents so as to minimize their size in compressed form.
|
187 |
|
|
Finally, it makes the interface buffer specification independent of whether
|
188 |
|
|
the file is actually interleaved or not.) Applications that wish to deal
|
189 |
|
|
directly with the downsampled data must provide similar buffering and padding
|
190 |
|
|
for odd-sized images.
|
191 |
|
|
|
192 |
|
|
|
193 |
|
|
*** Poor man's object-oriented programming ***
|
194 |
|
|
|
195 |
|
|
It should be clear by now that we have a lot of quasi-independent processing
|
196 |
|
|
steps, many of which have several possible behaviors. To avoid cluttering the
|
197 |
|
|
code with lots of switch statements, we use a simple form of object-style
|
198 |
|
|
programming to separate out the different possibilities.
|
199 |
|
|
|
200 |
|
|
For example, two different color quantization algorithms could be implemented
|
201 |
|
|
as two separate modules that present the same external interface; at runtime,
|
202 |
|
|
the calling code will access the proper module indirectly through an "object".
|
203 |
|
|
|
204 |
|
|
We can get the limited features we need while staying within portable C.
|
205 |
|
|
The basic tool is a function pointer. An "object" is just a struct
|
206 |
|
|
containing one or more function pointer fields, each of which corresponds to
|
207 |
|
|
a method name in real object-oriented languages. During initialization we
|
208 |
|
|
fill in the function pointers with references to whichever module we have
|
209 |
|
|
determined we need to use in this run. Then invocation of the module is done
|
210 |
|
|
by indirecting through a function pointer; on most machines this is no more
|
211 |
|
|
expensive than a switch statement, which would be the only other way of
|
212 |
|
|
making the required run-time choice. The really significant benefit, of
|
213 |
|
|
course, is keeping the source code clean and well structured.
|
214 |
|
|
|
215 |
|
|
We can also arrange to have private storage that varies between different
|
216 |
|
|
implementations of the same kind of object. We do this by making all the
|
217 |
|
|
module-specific object structs be separately allocated entities, which will
|
218 |
|
|
be accessed via pointers in the master compression or decompression struct.
|
219 |
|
|
The "public" fields or methods for a given kind of object are specified by
|
220 |
|
|
a commonly known struct. But a module's initialization code can allocate
|
221 |
|
|
a larger struct that contains the common struct as its first member, plus
|
222 |
|
|
additional private fields. With appropriate pointer casting, the module's
|
223 |
|
|
internal functions can access these private fields. (For a simple example,
|
224 |
|
|
see jdatadst.c, which implements the external interface specified by struct
|
225 |
|
|
jpeg_destination_mgr, but adds extra fields.)
|
226 |
|
|
|
227 |
|
|
(Of course this would all be a lot easier if we were using C++, but we are
|
228 |
|
|
not yet prepared to assume that everyone has a C++ compiler.)
|
229 |
|
|
|
230 |
|
|
An important benefit of this scheme is that it is easy to provide multiple
|
231 |
|
|
versions of any method, each tuned to a particular case. While a lot of
|
232 |
|
|
precalculation might be done to select an optimal implementation of a method,
|
233 |
|
|
the cost per invocation is constant. For example, the upsampling step might
|
234 |
|
|
have a "generic" method, plus one or more "hardwired" methods for the most
|
235 |
|
|
popular sampling factors; the hardwired methods would be faster because they'd
|
236 |
|
|
use straight-line code instead of for-loops. The cost to determine which
|
237 |
|
|
method to use is paid only once, at startup, and the selection criteria are
|
238 |
|
|
hidden from the callers of the method.
|
239 |
|
|
|
240 |
|
|
This plan differs a little bit from usual object-oriented structures, in that
|
241 |
|
|
only one instance of each object class will exist during execution. The
|
242 |
|
|
reason for having the class structure is that on different runs we may create
|
243 |
|
|
different instances (choose to execute different modules). You can think of
|
244 |
|
|
the term "method" as denoting the common interface presented by a particular
|
245 |
|
|
set of interchangeable functions, and "object" as denoting a group of related
|
246 |
|
|
methods, or the total shared interface behavior of a group of modules.
|
247 |
|
|
|
248 |
|
|
|
249 |
|
|
*** Overall control structure ***
|
250 |
|
|
|
251 |
|
|
We previously mentioned the need for overall control logic in the compression
|
252 |
|
|
and decompression libraries. In IJG implementations prior to v5, overall
|
253 |
|
|
control was mostly provided by "pipeline control" modules, which proved to be
|
254 |
|
|
large, unwieldy, and hard to understand. To improve the situation, the
|
255 |
|
|
control logic has been subdivided into multiple modules. The control modules
|
256 |
|
|
consist of:
|
257 |
|
|
|
258 |
|
|
1. Master control for module selection and initialization. This has two
|
259 |
|
|
responsibilities:
|
260 |
|
|
|
261 |
|
|
1A. Startup initialization at the beginning of image processing.
|
262 |
|
|
The individual processing modules to be used in this run are selected
|
263 |
|
|
and given initialization calls.
|
264 |
|
|
|
265 |
|
|
1B. Per-pass control. This determines how many passes will be performed
|
266 |
|
|
and calls each active processing module to configure itself
|
267 |
|
|
appropriately at the beginning of each pass. End-of-pass processing,
|
268 |
|
|
where necessary, is also invoked from the master control module.
|
269 |
|
|
|
270 |
|
|
Method selection is partially distributed, in that a particular processing
|
271 |
|
|
module may contain several possible implementations of a particular method,
|
272 |
|
|
which it will select among when given its initialization call. The master
|
273 |
|
|
control code need only be concerned with decisions that affect more than
|
274 |
|
|
one module.
|
275 |
|
|
|
276 |
|
|
2. Data buffering control. A separate control module exists for each
|
277 |
|
|
inter-processing-step data buffer. This module is responsible for
|
278 |
|
|
invoking the processing steps that write or read that data buffer.
|
279 |
|
|
|
280 |
|
|
Each buffer controller sees the world as follows:
|
281 |
|
|
|
282 |
|
|
input data => processing step A => buffer => processing step B => output data
|
283 |
|
|
| | |
|
284 |
|
|
------------------ controller ------------------
|
285 |
|
|
|
286 |
|
|
The controller knows the dataflow requirements of steps A and B: how much data
|
287 |
|
|
they want to accept in one chunk and how much they output in one chunk. Its
|
288 |
|
|
function is to manage its buffer and call A and B at the proper times.
|
289 |
|
|
|
290 |
|
|
A data buffer control module may itself be viewed as a processing step by a
|
291 |
|
|
higher-level control module; thus the control modules form a binary tree with
|
292 |
|
|
elementary processing steps at the leaves of the tree.
|
293 |
|
|
|
294 |
|
|
The control modules are objects. A considerable amount of flexibility can
|
295 |
|
|
be had by replacing implementations of a control module. For example:
|
296 |
|
|
* Merging of adjacent steps in the pipeline is done by replacing a control
|
297 |
|
|
module and its pair of processing-step modules with a single processing-
|
298 |
|
|
step module. (Hence the possible merges are determined by the tree of
|
299 |
|
|
control modules.)
|
300 |
|
|
* In some processing modes, a given interstep buffer need only be a "strip"
|
301 |
|
|
buffer large enough to accommodate the desired data chunk sizes. In other
|
302 |
|
|
modes, a full-image buffer is needed and several passes are required.
|
303 |
|
|
The control module determines which kind of buffer is used and manipulates
|
304 |
|
|
virtual array buffers as needed. One or both processing steps may be
|
305 |
|
|
unaware of the multi-pass behavior.
|
306 |
|
|
|
307 |
|
|
In theory, we might be able to make all of the data buffer controllers
|
308 |
|
|
interchangeable and provide just one set of implementations for all. In
|
309 |
|
|
practice, each one contains considerable special-case processing for its
|
310 |
|
|
particular job. The buffer controller concept should be regarded as an
|
311 |
|
|
overall system structuring principle, not as a complete description of the
|
312 |
|
|
task performed by any one controller.
|
313 |
|
|
|
314 |
|
|
|
315 |
|
|
*** Compression object structure ***
|
316 |
|
|
|
317 |
|
|
Here is a sketch of the logical structure of the JPEG compression library:
|
318 |
|
|
|
319 |
|
|
|-- Colorspace conversion
|
320 |
|
|
|-- Preprocessing controller --|
|
321 |
|
|
| |-- Downsampling
|
322 |
|
|
Main controller --|
|
323 |
|
|
| |-- Forward DCT, quantize
|
324 |
|
|
|-- Coefficient controller --|
|
325 |
|
|
|-- Entropy encoding
|
326 |
|
|
|
327 |
|
|
This sketch also describes the flow of control (subroutine calls) during
|
328 |
|
|
typical image data processing. Each of the components shown in the diagram is
|
329 |
|
|
an "object" which may have several different implementations available. One
|
330 |
|
|
or more source code files contain the actual implementation(s) of each object.
|
331 |
|
|
|
332 |
|
|
The objects shown above are:
|
333 |
|
|
|
334 |
|
|
* Main controller: buffer controller for the subsampled-data buffer, which
|
335 |
|
|
holds the preprocessed input data. This controller invokes preprocessing to
|
336 |
|
|
fill the subsampled-data buffer, and JPEG compression to empty it. There is
|
337 |
|
|
usually no need for a full-image buffer here; a strip buffer is adequate.
|
338 |
|
|
|
339 |
|
|
* Preprocessing controller: buffer controller for the downsampling input data
|
340 |
|
|
buffer, which lies between colorspace conversion and downsampling. Note
|
341 |
|
|
that a unified conversion/downsampling module would probably replace this
|
342 |
|
|
controller entirely.
|
343 |
|
|
|
344 |
|
|
* Colorspace conversion: converts application image data into the desired
|
345 |
|
|
JPEG color space; also changes the data from pixel-interleaved layout to
|
346 |
|
|
separate component planes. Processes one pixel row at a time.
|
347 |
|
|
|
348 |
|
|
* Downsampling: performs reduction of chroma components as required.
|
349 |
|
|
Optionally may perform pixel-level smoothing as well. Processes a "row
|
350 |
|
|
group" at a time, where a row group is defined as Vmax pixel rows of each
|
351 |
|
|
component before downsampling, and Vk sample rows afterwards (remember Vk
|
352 |
|
|
differs across components). Some downsampling or smoothing algorithms may
|
353 |
|
|
require context rows above and below the current row group; the
|
354 |
|
|
preprocessing controller is responsible for supplying these rows via proper
|
355 |
|
|
buffering. The downsampler is responsible for edge expansion at the right
|
356 |
|
|
edge (i.e., extending each sample row to a multiple of 8 samples); but the
|
357 |
|
|
preprocessing controller is responsible for vertical edge expansion (i.e.,
|
358 |
|
|
duplicating the bottom sample row as needed to make a multiple of 8 rows).
|
359 |
|
|
|
360 |
|
|
* Coefficient controller: buffer controller for the DCT-coefficient data.
|
361 |
|
|
This controller handles MCU assembly, including insertion of dummy DCT
|
362 |
|
|
blocks when needed at the right or bottom edge. When performing
|
363 |
|
|
Huffman-code optimization or emitting a multiscan JPEG file, this
|
364 |
|
|
controller is responsible for buffering the full image. The equivalent of
|
365 |
|
|
one fully interleaved MCU row of subsampled data is processed per call,
|
366 |
|
|
even when the JPEG file is noninterleaved.
|
367 |
|
|
|
368 |
|
|
* Forward DCT and quantization: Perform DCT, quantize, and emit coefficients.
|
369 |
|
|
Works on one or more DCT blocks at a time. (Note: the coefficients are now
|
370 |
|
|
emitted in normal array order, which the entropy encoder is expected to
|
371 |
|
|
convert to zigzag order as necessary. Prior versions of the IJG code did
|
372 |
|
|
the conversion to zigzag order within the quantization step.)
|
373 |
|
|
|
374 |
|
|
* Entropy encoding: Perform Huffman or arithmetic entropy coding and emit the
|
375 |
|
|
coded data to the data destination module. Works on one MCU per call.
|
376 |
|
|
For progressive JPEG, the same DCT blocks are fed to the entropy coder
|
377 |
|
|
during each pass, and the coder must emit the appropriate subset of
|
378 |
|
|
coefficients.
|
379 |
|
|
|
380 |
|
|
In addition to the above objects, the compression library includes these
|
381 |
|
|
objects:
|
382 |
|
|
|
383 |
|
|
* Master control: determines the number of passes required, controls overall
|
384 |
|
|
and per-pass initialization of the other modules.
|
385 |
|
|
|
386 |
|
|
* Marker writing: generates JPEG markers (except for RSTn, which is emitted
|
387 |
|
|
by the entropy encoder when needed).
|
388 |
|
|
|
389 |
|
|
* Data destination manager: writes the output JPEG datastream to its final
|
390 |
|
|
destination (e.g., a file). The destination manager supplied with the
|
391 |
|
|
library knows how to write to a stdio stream; for other behaviors, the
|
392 |
|
|
surrounding application may provide its own destination manager.
|
393 |
|
|
|
394 |
|
|
* Memory manager: allocates and releases memory, controls virtual arrays
|
395 |
|
|
(with backing store management, where required).
|
396 |
|
|
|
397 |
|
|
* Error handler: performs formatting and output of error and trace messages;
|
398 |
|
|
determines handling of nonfatal errors. The surrounding application may
|
399 |
|
|
override some or all of this object's methods to change error handling.
|
400 |
|
|
|
401 |
|
|
* Progress monitor: supports output of "percent-done" progress reports.
|
402 |
|
|
This object represents an optional callback to the surrounding application:
|
403 |
|
|
if wanted, it must be supplied by the application.
|
404 |
|
|
|
405 |
|
|
The error handler, destination manager, and progress monitor objects are
|
406 |
|
|
defined as separate objects in order to simplify application-specific
|
407 |
|
|
customization of the JPEG library. A surrounding application may override
|
408 |
|
|
individual methods or supply its own all-new implementation of one of these
|
409 |
|
|
objects. The object interfaces for these objects are therefore treated as
|
410 |
|
|
part of the application interface of the library, whereas the other objects
|
411 |
|
|
are internal to the library.
|
412 |
|
|
|
413 |
|
|
The error handler and memory manager are shared by JPEG compression and
|
414 |
|
|
decompression; the progress monitor, if used, may be shared as well.
|
415 |
|
|
|
416 |
|
|
|
417 |
|
|
*** Decompression object structure ***
|
418 |
|
|
|
419 |
|
|
Here is a sketch of the logical structure of the JPEG decompression library:
|
420 |
|
|
|
421 |
|
|
|-- Entropy decoding
|
422 |
|
|
|-- Coefficient controller --|
|
423 |
|
|
| |-- Dequantize, Inverse DCT
|
424 |
|
|
Main controller --|
|
425 |
|
|
| |-- Upsampling
|
426 |
|
|
|-- Postprocessing controller --| |-- Colorspace conversion
|
427 |
|
|
|-- Color quantization
|
428 |
|
|
|-- Color precision reduction
|
429 |
|
|
|
430 |
|
|
As before, this diagram also represents typical control flow. The objects
|
431 |
|
|
shown are:
|
432 |
|
|
|
433 |
|
|
* Main controller: buffer controller for the subsampled-data buffer, which
|
434 |
|
|
holds the output of JPEG decompression proper. This controller's primary
|
435 |
|
|
task is to feed the postprocessing procedure. Some upsampling algorithms
|
436 |
|
|
may require context rows above and below the current row group; when this
|
437 |
|
|
is true, the main controller is responsible for managing its buffer so as
|
438 |
|
|
to make context rows available. In the current design, the main buffer is
|
439 |
|
|
always a strip buffer; a full-image buffer is never required.
|
440 |
|
|
|
441 |
|
|
* Coefficient controller: buffer controller for the DCT-coefficient data.
|
442 |
|
|
This controller handles MCU disassembly, including deletion of any dummy
|
443 |
|
|
DCT blocks at the right or bottom edge. When reading a multiscan JPEG
|
444 |
|
|
file, this controller is responsible for buffering the full image.
|
445 |
|
|
(Buffering DCT coefficients, rather than samples, is necessary to support
|
446 |
|
|
progressive JPEG.) The equivalent of one fully interleaved MCU row of
|
447 |
|
|
subsampled data is processed per call, even when the source JPEG file is
|
448 |
|
|
noninterleaved.
|
449 |
|
|
|
450 |
|
|
* Entropy decoding: Read coded data from the data source module and perform
|
451 |
|
|
Huffman or arithmetic entropy decoding. Works on one MCU per call.
|
452 |
|
|
For progressive JPEG decoding, the coefficient controller supplies the prior
|
453 |
|
|
coefficients of each MCU (initially all zeroes), which the entropy decoder
|
454 |
|
|
modifies in each scan.
|
455 |
|
|
|
456 |
|
|
* Dequantization and inverse DCT: like it says. Note that the coefficients
|
457 |
|
|
buffered by the coefficient controller have NOT been dequantized; we
|
458 |
|
|
merge dequantization and inverse DCT into a single step for speed reasons.
|
459 |
|
|
When scaled-down output is asked for, simplified DCT algorithms may be used
|
460 |
|
|
that emit only 1x1, 2x2, or 4x4 samples per DCT block, not the full 8x8.
|
461 |
|
|
Works on one DCT block at a time.
|
462 |
|
|
|
463 |
|
|
* Postprocessing controller: buffer controller for the color quantization
|
464 |
|
|
input buffer, when quantization is in use. (Without quantization, this
|
465 |
|
|
controller just calls the upsampler.) For two-pass quantization, this
|
466 |
|
|
controller is responsible for buffering the full-image data.
|
467 |
|
|
|
468 |
|
|
* Upsampling: restores chroma components to full size. (May support more
|
469 |
|
|
general output rescaling, too. Note that if undersized DCT outputs have
|
470 |
|
|
been emitted by the DCT module, this module must adjust so that properly
|
471 |
|
|
sized outputs are created.) Works on one row group at a time. This module
|
472 |
|
|
also calls the color conversion module, so its top level is effectively a
|
473 |
|
|
buffer controller for the upsampling->color conversion buffer. However, in
|
474 |
|
|
all but the highest-quality operating modes, upsampling and color
|
475 |
|
|
conversion are likely to be merged into a single step.
|
476 |
|
|
|
477 |
|
|
* Colorspace conversion: convert from JPEG color space to output color space,
|
478 |
|
|
and change data layout from separate component planes to pixel-interleaved.
|
479 |
|
|
Works on one pixel row at a time.
|
480 |
|
|
|
481 |
|
|
* Color quantization: reduce the data to colormapped form, using either an
|
482 |
|
|
externally specified colormap or an internally generated one. This module
|
483 |
|
|
is not used for full-color output. Works on one pixel row at a time; may
|
484 |
|
|
require two passes to generate a color map. Note that the output will
|
485 |
|
|
always be a single component representing colormap indexes. In the current
|
486 |
|
|
design, the output values are JSAMPLEs, so an 8-bit compilation cannot
|
487 |
|
|
quantize to more than 256 colors. This is unlikely to be a problem in
|
488 |
|
|
practice.
|
489 |
|
|
|
490 |
|
|
* Color reduction: this module handles color precision reduction, e.g.,
|
491 |
|
|
generating 15-bit color (5 bits/primary) from JPEG's 24-bit output.
|
492 |
|
|
Not quite clear yet how this should be handled... should we merge it with
|
493 |
|
|
colorspace conversion???
|
494 |
|
|
|
495 |
|
|
Note that some high-speed operating modes might condense the entire
|
496 |
|
|
postprocessing sequence to a single module (upsample, color convert, and
|
497 |
|
|
quantize in one step).
|
498 |
|
|
|
499 |
|
|
In addition to the above objects, the decompression library includes these
|
500 |
|
|
objects:
|
501 |
|
|
|
502 |
|
|
* Master control: determines the number of passes required, controls overall
|
503 |
|
|
and per-pass initialization of the other modules. This is subdivided into
|
504 |
|
|
input and output control: jdinput.c controls only input-side processing,
|
505 |
|
|
while jdmaster.c handles overall initialization and output-side control.
|
506 |
|
|
|
507 |
|
|
* Marker reading: decodes JPEG markers (except for RSTn).
|
508 |
|
|
|
509 |
|
|
* Data source manager: supplies the input JPEG datastream. The source
|
510 |
|
|
manager supplied with the library knows how to read from a stdio stream;
|
511 |
|
|
for other behaviors, the surrounding application may provide its own source
|
512 |
|
|
manager.
|
513 |
|
|
|
514 |
|
|
* Memory manager: same as for compression library.
|
515 |
|
|
|
516 |
|
|
* Error handler: same as for compression library.
|
517 |
|
|
|
518 |
|
|
* Progress monitor: same as for compression library.
|
519 |
|
|
|
520 |
|
|
As with compression, the data source manager, error handler, and progress
|
521 |
|
|
monitor are candidates for replacement by a surrounding application.
|
522 |
|
|
|
523 |
|
|
|
524 |
|
|
*** Decompression input and output separation ***
|
525 |
|
|
|
526 |
|
|
To support efficient incremental display of progressive JPEG files, the
|
527 |
|
|
decompressor is divided into two sections that can run independently:
|
528 |
|
|
|
529 |
|
|
1. Data input includes marker parsing, entropy decoding, and input into the
|
530 |
|
|
coefficient controller's DCT coefficient buffer. Note that this
|
531 |
|
|
processing is relatively cheap and fast.
|
532 |
|
|
|
533 |
|
|
2. Data output reads from the DCT coefficient buffer and performs the IDCT
|
534 |
|
|
and all postprocessing steps.
|
535 |
|
|
|
536 |
|
|
For a progressive JPEG file, the data input processing is allowed to get
|
537 |
|
|
arbitrarily far ahead of the data output processing. (This occurs only
|
538 |
|
|
if the application calls jpeg_consume_input(); otherwise input and output
|
539 |
|
|
run in lockstep, since the input section is called only when the output
|
540 |
|
|
section needs more data.) In this way the application can avoid making
|
541 |
|
|
extra display passes when data is arriving faster than the display pass
|
542 |
|
|
can run. Furthermore, it is possible to abort an output pass without
|
543 |
|
|
losing anything, since the coefficient buffer is read-only as far as the
|
544 |
|
|
output section is concerned. See libjpeg.doc for more detail.
|
545 |
|
|
|
546 |
|
|
A full-image coefficient array is only created if the JPEG file has multiple
|
547 |
|
|
scans (or if the application specifies buffered-image mode anyway). When
|
548 |
|
|
reading a single-scan file, the coefficient controller normally creates only
|
549 |
|
|
a one-MCU buffer, so input and output processing must run in lockstep in this
|
550 |
|
|
case. jpeg_consume_input() is effectively a no-op in this situation.
|
551 |
|
|
|
552 |
|
|
The main impact of dividing the decompressor in this fashion is that we must
|
553 |
|
|
be very careful with shared variables in the cinfo data structure. Each
|
554 |
|
|
variable that can change during the course of decompression must be
|
555 |
|
|
classified as belonging to data input or data output, and each section must
|
556 |
|
|
look only at its own variables. For example, the data output section may not
|
557 |
|
|
depend on any of the variables that describe the current scan in the JPEG
|
558 |
|
|
file, because these may change as the data input section advances into a new
|
559 |
|
|
scan.
|
560 |
|
|
|
561 |
|
|
The progress monitor is (somewhat arbitrarily) defined to treat input of the
|
562 |
|
|
file as one pass when buffered-image mode is not used, and to ignore data
|
563 |
|
|
input work completely when buffered-image mode is used. Note that the
|
564 |
|
|
library has no reliable way to predict the number of passes when dealing
|
565 |
|
|
with a progressive JPEG file, nor can it predict the number of output passes
|
566 |
|
|
in buffered-image mode. So the work estimate is inherently bogus anyway.
|
567 |
|
|
|
568 |
|
|
No comparable division is currently made in the compression library, because
|
569 |
|
|
there isn't any real need for it.
|
570 |
|
|
|
571 |
|
|
|
572 |
|
|
*** Data formats ***
|
573 |
|
|
|
574 |
|
|
Arrays of pixel sample values use the following data structure:
|
575 |
|
|
|
576 |
|
|
typedef something JSAMPLE; a pixel component value, 0..MAXJSAMPLE
|
577 |
|
|
typedef JSAMPLE *JSAMPROW; ptr to a row of samples
|
578 |
|
|
typedef JSAMPROW *JSAMPARRAY; ptr to a list of rows
|
579 |
|
|
typedef JSAMPARRAY *JSAMPIMAGE; ptr to a list of color-component arrays
|
580 |
|
|
|
581 |
|
|
The basic element type JSAMPLE will typically be one of unsigned char,
|
582 |
|
|
(signed) char, or short. Short will be used if samples wider than 8 bits are
|
583 |
|
|
to be supported (this is a compile-time option). Otherwise, unsigned char is
|
584 |
|
|
used if possible. If the compiler only supports signed chars, then it is
|
585 |
|
|
necessary to mask off the value when reading. Thus, all reads of JSAMPLE
|
586 |
|
|
values must be coded as "GETJSAMPLE(value)", where the macro will be defined
|
587 |
|
|
as "((value) & 0xFF)" on signed-char machines and "((int) (value))" elsewhere.
|
588 |
|
|
|
589 |
|
|
With these conventions, JSAMPLE values can be assumed to be >= 0. This helps
|
590 |
|
|
simplify correct rounding during downsampling, etc. The JPEG standard's
|
591 |
|
|
specification that sample values run from -128..127 is accommodated by
|
592 |
|
|
subtracting 128 just as the sample value is copied into the source array for
|
593 |
|
|
the DCT step (this will be an array of signed ints). Similarly, during
|
594 |
|
|
decompression the output of the IDCT step will be immediately shifted back to
|
595 |
|
|
0..255. (NB: different values are required when 12-bit samples are in use.
|
596 |
|
|
The code is written in terms of MAXJSAMPLE and CENTERJSAMPLE, which will be
|
597 |
|
|
defined as 255 and 128 respectively in an 8-bit implementation, and as 4095
|
598 |
|
|
and 2048 in a 12-bit implementation.)
|
599 |
|
|
|
600 |
|
|
We use a pointer per row, rather than a two-dimensional JSAMPLE array. This
|
601 |
|
|
choice costs only a small amount of memory and has several benefits:
|
602 |
|
|
* Code using the data structure doesn't need to know the allocated width of
|
603 |
|
|
the rows. This simplifies edge expansion/compression, since we can work
|
604 |
|
|
in an array that's wider than the logical picture width.
|
605 |
|
|
* Indexing doesn't require multiplication; this is a performance win on many
|
606 |
|
|
machines.
|
607 |
|
|
* Arrays with more than 64K total elements can be supported even on machines
|
608 |
|
|
where malloc() cannot allocate chunks larger than 64K.
|
609 |
|
|
* The rows forming a component array may be allocated at different times
|
610 |
|
|
without extra copying. This trick allows some speedups in smoothing steps
|
611 |
|
|
that need access to the previous and next rows.
|
612 |
|
|
|
613 |
|
|
Note that each color component is stored in a separate array; we don't use the
|
614 |
|
|
traditional layout in which the components of a pixel are stored together.
|
615 |
|
|
This simplifies coding of modules that work on each component independently,
|
616 |
|
|
because they don't need to know how many components there are. Furthermore,
|
617 |
|
|
we can read or write each component to a temporary file independently, which
|
618 |
|
|
is helpful when dealing with noninterleaved JPEG files.
|
619 |
|
|
|
620 |
|
|
In general, a specific sample value is accessed by code such as
|
621 |
|
|
GETJSAMPLE(image[colorcomponent][row][col])
|
622 |
|
|
where col is measured from the image left edge, but row is measured from the
|
623 |
|
|
first sample row currently in memory. Either of the first two indexings can
|
624 |
|
|
be precomputed by copying the relevant pointer.
|
625 |
|
|
|
626 |
|
|
|
627 |
|
|
Since most image-processing applications prefer to work on images in which
|
628 |
|
|
the components of a pixel are stored together, the data passed to or from the
|
629 |
|
|
surrounding application uses the traditional convention: a single pixel is
|
630 |
|
|
represented by N consecutive JSAMPLE values, and an image row is an array of
|
631 |
|
|
(# of color components)*(image width) JSAMPLEs. One or more rows of data can
|
632 |
|
|
be represented by a pointer of type JSAMPARRAY in this scheme. This scheme is
|
633 |
|
|
converted to component-wise storage inside the JPEG library. (Applications
|
634 |
|
|
that want to skip JPEG preprocessing or postprocessing will have to contend
|
635 |
|
|
with component-wise storage.)
|
636 |
|
|
|
637 |
|
|
|
638 |
|
|
Arrays of DCT-coefficient values use the following data structure:
|
639 |
|
|
|
640 |
|
|
typedef short JCOEF; a 16-bit signed integer
|
641 |
|
|
typedef JCOEF JBLOCK[DCTSIZE2]; an 8x8 block of coefficients
|
642 |
|
|
typedef JBLOCK *JBLOCKROW; ptr to one horizontal row of 8x8 blocks
|
643 |
|
|
typedef JBLOCKROW *JBLOCKARRAY; ptr to a list of such rows
|
644 |
|
|
typedef JBLOCKARRAY *JBLOCKIMAGE; ptr to a list of color component arrays
|
645 |
|
|
|
646 |
|
|
The underlying type is at least a 16-bit signed integer; while "short" is big
|
647 |
|
|
enough on all machines of interest, on some machines it is preferable to use
|
648 |
|
|
"int" for speed reasons, despite the storage cost. Coefficients are grouped
|
649 |
|
|
into 8x8 blocks (but we always use #defines DCTSIZE and DCTSIZE2 rather than
|
650 |
|
|
"8" and "64").
|
651 |
|
|
|
652 |
|
|
The contents of a coefficient block may be in either "natural" or zigzagged
|
653 |
|
|
order, and may be true values or divided by the quantization coefficients,
|
654 |
|
|
depending on where the block is in the processing pipeline. In the current
|
655 |
|
|
library, coefficient blocks are kept in natural order everywhere; the entropy
|
656 |
|
|
codecs zigzag or dezigzag the data as it is written or read. The blocks
|
657 |
|
|
contain quantized coefficients everywhere outside the DCT/IDCT subsystems.
|
658 |
|
|
(This latter decision may need to be revisited to support variable
|
659 |
|
|
quantization a la JPEG Part 3.)
|
660 |
|
|
|
661 |
|
|
Notice that the allocation unit is now a row of 8x8 blocks, corresponding to
|
662 |
|
|
eight rows of samples. Otherwise the structure is much the same as for
|
663 |
|
|
samples, and for the same reasons.
|
664 |
|
|
|
665 |
|
|
On machines where malloc() can't handle a request bigger than 64Kb, this data
|
666 |
|
|
structure limits us to rows of less than 512 JBLOCKs, or a picture width of
|
667 |
|
|
4000+ pixels. This seems an acceptable restriction.
|
668 |
|
|
|
669 |
|
|
|
670 |
|
|
On 80x86 machines, the bottom-level pointer types (JSAMPROW and JBLOCKROW)
|
671 |
|
|
must be declared as "far" pointers, but the upper levels can be "near"
|
672 |
|
|
(implying that the pointer lists are allocated in the DS segment).
|
673 |
|
|
We use a #define symbol FAR, which expands to the "far" keyword when
|
674 |
|
|
compiling on 80x86 machines and to nothing elsewhere.
|
675 |
|
|
|
676 |
|
|
|
677 |
|
|
*** Suspendable processing ***
|
678 |
|
|
|
679 |
|
|
In some applications it is desirable to use the JPEG library as an
|
680 |
|
|
incremental, memory-to-memory filter. In this situation the data source or
|
681 |
|
|
destination may be a limited-size buffer, and we can't rely on being able to
|
682 |
|
|
empty or refill the buffer at arbitrary times. Instead the application would
|
683 |
|
|
like to have control return from the library at buffer overflow/underrun, and
|
684 |
|
|
then resume compression or decompression at a later time.
|
685 |
|
|
|
686 |
|
|
This scenario is supported for simple cases. (For anything more complex, we
|
687 |
|
|
recommend that the application "bite the bullet" and develop real multitasking
|
688 |
|
|
capability.) The libjpeg.doc file goes into more detail about the usage and
|
689 |
|
|
limitations of this capability; here we address the implications for library
|
690 |
|
|
structure.
|
691 |
|
|
|
692 |
|
|
The essence of the problem is that the entropy codec (coder or decoder) must
|
693 |
|
|
be prepared to stop at arbitrary times. In turn, the controllers that call
|
694 |
|
|
the entropy codec must be able to stop before having produced or consumed all
|
695 |
|
|
the data that they normally would handle in one call. That part is reasonably
|
696 |
|
|
straightforward: we make the controller call interfaces include "progress
|
697 |
|
|
counters" which indicate the number of data chunks successfully processed, and
|
698 |
|
|
we require callers to test the counter rather than just assume all of the data
|
699 |
|
|
was processed.
|
700 |
|
|
|
701 |
|
|
Rather than trying to restart at an arbitrary point, the current Huffman
|
702 |
|
|
codecs are designed to restart at the beginning of the current MCU after a
|
703 |
|
|
suspension due to buffer overflow/underrun. At the start of each call, the
|
704 |
|
|
codec's internal state is loaded from permanent storage (in the JPEG object
|
705 |
|
|
structures) into local variables. On successful completion of the MCU, the
|
706 |
|
|
permanent state is updated. (This copying is not very expensive, and may even
|
707 |
|
|
lead to *improved* performance if the local variables can be registerized.)
|
708 |
|
|
If a suspension occurs, the codec simply returns without updating the state,
|
709 |
|
|
thus effectively reverting to the start of the MCU. Note that this implies
|
710 |
|
|
leaving some data unprocessed in the source/destination buffer (ie, the
|
711 |
|
|
compressed partial MCU). The data source/destination module interfaces are
|
712 |
|
|
specified so as to make this possible. This also implies that the data buffer
|
713 |
|
|
must be large enough to hold a worst-case compressed MCU; a couple thousand
|
714 |
|
|
bytes should be enough.
|
715 |
|
|
|
716 |
|
|
In a successive-approximation AC refinement scan, the progressive Huffman
|
717 |
|
|
decoder has to be able to undo assignments of newly nonzero coefficients if it
|
718 |
|
|
suspends before the MCU is complete, since decoding requires distinguishing
|
719 |
|
|
previously-zero and previously-nonzero coefficients. This is a bit tedious
|
720 |
|
|
but probably won't have much effect on performance. Other variants of Huffman
|
721 |
|
|
decoding need not worry about this, since they will just store the same values
|
722 |
|
|
again if forced to repeat the MCU.
|
723 |
|
|
|
724 |
|
|
This approach would probably not work for an arithmetic codec, since its
|
725 |
|
|
modifiable state is quite large and couldn't be copied cheaply. Instead it
|
726 |
|
|
would have to suspend and resume exactly at the point of the buffer end.
|
727 |
|
|
|
728 |
|
|
The JPEG marker reader is designed to cope with suspension at an arbitrary
|
729 |
|
|
point. It does so by backing up to the start of the marker parameter segment,
|
730 |
|
|
so the data buffer must be big enough to hold the largest marker of interest.
|
731 |
|
|
Again, a couple KB should be adequate. (A special "skip" convention is used
|
732 |
|
|
to bypass COM and APPn markers, so these can be larger than the buffer size
|
733 |
|
|
without causing problems; otherwise a 64K buffer would be needed in the worst
|
734 |
|
|
case.)
|
735 |
|
|
|
736 |
|
|
The JPEG marker writer currently does *not* cope with suspension. I feel that
|
737 |
|
|
this is not necessary; it is much easier simply to require the application to
|
738 |
|
|
ensure there is enough buffer space before starting. (An empty 2K buffer is
|
739 |
|
|
more than sufficient for the header markers; and ensuring there are a dozen or
|
740 |
|
|
two bytes available before calling jpeg_finish_compress() will suffice for the
|
741 |
|
|
trailer.) This would not work for writing multi-scan JPEG files, but
|
742 |
|
|
we simply do not intend to support that capability with suspension.
|
743 |
|
|
|
744 |
|
|
|
745 |
|
|
*** Memory manager services ***
|
746 |
|
|
|
747 |
|
|
The JPEG library's memory manager controls allocation and deallocation of
|
748 |
|
|
memory, and it manages large "virtual" data arrays on machines where the
|
749 |
|
|
operating system does not provide virtual memory. Note that the same
|
750 |
|
|
memory manager serves both compression and decompression operations.
|
751 |
|
|
|
752 |
|
|
In all cases, allocated objects are tied to a particular compression or
|
753 |
|
|
decompression master record, and they will be released when that master
|
754 |
|
|
record is destroyed.
|
755 |
|
|
|
756 |
|
|
The memory manager does not provide explicit deallocation of objects.
|
757 |
|
|
Instead, objects are created in "pools" of free storage, and a whole pool
|
758 |
|
|
can be freed at once. This approach helps prevent storage-leak bugs, and
|
759 |
|
|
it speeds up operations whenever malloc/free are slow (as they often are).
|
760 |
|
|
The pools can be regarded as lifetime identifiers for objects. Two
|
761 |
|
|
pools/lifetimes are defined:
|
762 |
|
|
* JPOOL_PERMANENT lasts until master record is destroyed
|
763 |
|
|
* JPOOL_IMAGE lasts until done with image (JPEG datastream)
|
764 |
|
|
Permanent lifetime is used for parameters and tables that should be carried
|
765 |
|
|
across from one datastream to another; this includes all application-visible
|
766 |
|
|
parameters. Image lifetime is used for everything else. (A third lifetime,
|
767 |
|
|
JPOOL_PASS = one processing pass, was originally planned. However it was
|
768 |
|
|
dropped as not being worthwhile. The actual usage patterns are such that the
|
769 |
|
|
peak memory usage would be about the same anyway; and having per-pass storage
|
770 |
|
|
substantially complicates the virtual memory allocation rules --- see below.)
|
771 |
|
|
|
772 |
|
|
The memory manager deals with three kinds of object:
|
773 |
|
|
1. "Small" objects. Typically these require no more than 10K-20K total.
|
774 |
|
|
2. "Large" objects. These may require tens to hundreds of K depending on
|
775 |
|
|
image size. Semantically they behave the same as small objects, but we
|
776 |
|
|
distinguish them for two reasons:
|
777 |
|
|
* On MS-DOS machines, large objects are referenced by FAR pointers,
|
778 |
|
|
small objects by NEAR pointers.
|
779 |
|
|
* Pool allocation heuristics may differ for large and small objects.
|
780 |
|
|
Note that individual "large" objects cannot exceed the size allowed by
|
781 |
|
|
type size_t, which may be 64K or less on some machines.
|
782 |
|
|
3. "Virtual" objects. These are large 2-D arrays of JSAMPLEs or JBLOCKs
|
783 |
|
|
(typically large enough for the entire image being processed). The
|
784 |
|
|
memory manager provides stripwise access to these arrays. On machines
|
785 |
|
|
without virtual memory, the rest of the array may be swapped out to a
|
786 |
|
|
temporary file.
|
787 |
|
|
|
788 |
|
|
(Note: JSAMPARRAY and JBLOCKARRAY data structures are a combination of large
|
789 |
|
|
objects for the data proper and small objects for the row pointers. For
|
790 |
|
|
convenience and speed, the memory manager provides single routines to create
|
791 |
|
|
these structures. Similarly, virtual arrays include a small control block
|
792 |
|
|
and a JSAMPARRAY or JBLOCKARRAY working buffer, all created with one call.)
|
793 |
|
|
|
794 |
|
|
In the present implementation, virtual arrays are only permitted to have image
|
795 |
|
|
lifespan. (Permanent lifespan would not be reasonable, and pass lifespan is
|
796 |
|
|
not very useful since a virtual array's raison d'etre is to store data for
|
797 |
|
|
multiple passes through the image.) We also expect that only "small" objects
|
798 |
|
|
will be given permanent lifespan, though this restriction is not required by
|
799 |
|
|
the memory manager.
|
800 |
|
|
|
801 |
|
|
In a non-virtual-memory machine, some performance benefit can be gained by
|
802 |
|
|
making the in-memory buffers for virtual arrays be as large as possible.
|
803 |
|
|
(For small images, the buffers might fit entirely in memory, so blind
|
804 |
|
|
swapping would be very wasteful.) The memory manager will adjust the height
|
805 |
|
|
of the buffers to fit within a prespecified maximum memory usage. In order
|
806 |
|
|
to do this in a reasonably optimal fashion, the manager needs to allocate all
|
807 |
|
|
of the virtual arrays at once. Therefore, there isn't a one-step allocation
|
808 |
|
|
routine for virtual arrays; instead, there is a "request" routine that simply
|
809 |
|
|
allocates the control block, and a "realize" routine (called just once) that
|
810 |
|
|
determines space allocation and creates all of the actual buffers. The
|
811 |
|
|
realize routine must allow for space occupied by non-virtual large objects.
|
812 |
|
|
(We don't bother to factor in the space needed for small objects, on the
|
813 |
|
|
grounds that it isn't worth the trouble.)
|
814 |
|
|
|
815 |
|
|
To support all this, we establish the following protocol for doing business
|
816 |
|
|
with the memory manager:
|
817 |
|
|
1. Modules must request virtual arrays (which may have only image lifespan)
|
818 |
|
|
during the initial setup phase, i.e., in their jinit_xxx routines.
|
819 |
|
|
2. All "large" objects (including JSAMPARRAYs and JBLOCKARRAYs) must also be
|
820 |
|
|
allocated during initial setup.
|
821 |
|
|
3. realize_virt_arrays will be called at the completion of initial setup.
|
822 |
|
|
The above conventions ensure that sufficient information is available
|
823 |
|
|
for it to choose a good size for virtual array buffers.
|
824 |
|
|
Small objects of any lifespan may be allocated at any time. We expect that
|
825 |
|
|
the total space used for small objects will be small enough to be negligible
|
826 |
|
|
in the realize_virt_arrays computation.
|
827 |
|
|
|
828 |
|
|
In a virtual-memory machine, we simply pretend that the available space is
|
829 |
|
|
infinite, thus causing realize_virt_arrays to decide that it can allocate all
|
830 |
|
|
the virtual arrays as full-size in-memory buffers. The overhead of the
|
831 |
|
|
virtual-array access protocol is very small when no swapping occurs.
|
832 |
|
|
|
833 |
|
|
A virtual array can be specified to be "pre-zeroed"; when this flag is set,
|
834 |
|
|
never-yet-written sections of the array are set to zero before being made
|
835 |
|
|
available to the caller. If this flag is not set, never-written sections
|
836 |
|
|
of the array contain garbage. (This feature exists primarily because the
|
837 |
|
|
equivalent logic would otherwise be needed in jdcoefct.c for progressive
|
838 |
|
|
JPEG mode; we may as well make it available for possible other uses.)
|
839 |
|
|
|
840 |
|
|
The first write pass on a virtual array is required to occur in top-to-bottom
|
841 |
|
|
order; read passes, as well as any write passes after the first one, may
|
842 |
|
|
access the array in any order. This restriction exists partly to simplify
|
843 |
|
|
the virtual array control logic, and partly because some file systems may not
|
844 |
|
|
support seeking beyond the current end-of-file in a temporary file. The main
|
845 |
|
|
implication of this restriction is that rearrangement of rows (such as
|
846 |
|
|
converting top-to-bottom data order to bottom-to-top) must be handled while
|
847 |
|
|
reading data out of the virtual array, not while putting it in.
|
848 |
|
|
|
849 |
|
|
|
850 |
|
|
*** Memory manager internal structure ***
|
851 |
|
|
|
852 |
|
|
To isolate system dependencies as much as possible, we have broken the
|
853 |
|
|
memory manager into two parts. There is a reasonably system-independent
|
854 |
|
|
"front end" (jmemmgr.c) and a "back end" that contains only the code
|
855 |
|
|
likely to change across systems. All of the memory management methods
|
856 |
|
|
outlined above are implemented by the front end. The back end provides
|
857 |
|
|
the following routines for use by the front end (none of these routines
|
858 |
|
|
are known to the rest of the JPEG code):
|
859 |
|
|
|
860 |
|
|
jpeg_mem_init, jpeg_mem_term system-dependent initialization/shutdown
|
861 |
|
|
|
862 |
|
|
jpeg_get_small, jpeg_free_small interface to malloc and free library routines
|
863 |
|
|
(or their equivalents)
|
864 |
|
|
|
865 |
|
|
jpeg_get_large, jpeg_free_large interface to FAR malloc/free in MSDOS machines;
|
866 |
|
|
else usually the same as
|
867 |
|
|
jpeg_get_small/jpeg_free_small
|
868 |
|
|
|
869 |
|
|
jpeg_mem_available estimate available memory
|
870 |
|
|
|
871 |
|
|
jpeg_open_backing_store create a backing-store object
|
872 |
|
|
|
873 |
|
|
read_backing_store, manipulate a backing-store object
|
874 |
|
|
write_backing_store,
|
875 |
|
|
close_backing_store
|
876 |
|
|
|
877 |
|
|
On some systems there will be more than one type of backing-store object
|
878 |
|
|
(specifically, in MS-DOS a backing store file might be an area of extended
|
879 |
|
|
memory as well as a disk file). jpeg_open_backing_store is responsible for
|
880 |
|
|
choosing how to implement a given object. The read/write/close routines
|
881 |
|
|
are method pointers in the structure that describes a given object; this
|
882 |
|
|
lets them be different for different object types.
|
883 |
|
|
|
884 |
|
|
It may be necessary to ensure that backing store objects are explicitly
|
885 |
|
|
released upon abnormal program termination. For example, MS-DOS won't free
|
886 |
|
|
extended memory by itself. To support this, we will expect the main program
|
887 |
|
|
or surrounding application to arrange to call self_destruct (typically via
|
888 |
|
|
jpeg_destroy) upon abnormal termination. This may require a SIGINT signal
|
889 |
|
|
handler or equivalent. We don't want to have the back end module install its
|
890 |
|
|
own signal handler, because that would pre-empt the surrounding application's
|
891 |
|
|
ability to control signal handling.
|
892 |
|
|
|
893 |
|
|
The IJG distribution includes several memory manager back end implementations.
|
894 |
|
|
Usually the same back end should be suitable for all applications on a given
|
895 |
|
|
system, but it is possible for an application to supply its own back end at
|
896 |
|
|
need.
|
897 |
|
|
|
898 |
|
|
|
899 |
|
|
*** Implications of DNL marker ***
|
900 |
|
|
|
901 |
|
|
Some JPEG files may use a DNL marker to postpone definition of the image
|
902 |
|
|
height (this would be useful for a fax-like scanner's output, for instance).
|
903 |
|
|
In these files the SOF marker claims the image height is 0, and you only
|
904 |
|
|
find out the true image height at the end of the first scan.
|
905 |
|
|
|
906 |
|
|
We could read these files as follows:
|
907 |
|
|
1. Upon seeing zero image height, replace it by 65535 (the maximum allowed).
|
908 |
|
|
2. When the DNL is found, update the image height in the global image
|
909 |
|
|
descriptor.
|
910 |
|
|
This implies that control modules must avoid making copies of the image
|
911 |
|
|
height, and must re-test for termination after each MCU row. This would
|
912 |
|
|
be easy enough to do.
|
913 |
|
|
|
914 |
|
|
In cases where image-size data structures are allocated, this approach will
|
915 |
|
|
result in very inefficient use of virtual memory or much-larger-than-necessary
|
916 |
|
|
temporary files. This seems acceptable for something that probably won't be a
|
917 |
|
|
mainstream usage. People might have to forgo use of memory-hogging options
|
918 |
|
|
(such as two-pass color quantization or noninterleaved JPEG files) if they
|
919 |
|
|
want efficient conversion of such files. (One could improve efficiency by
|
920 |
|
|
demanding a user-supplied upper bound for the height, less than 65536; in most
|
921 |
|
|
cases it could be much less.)
|
922 |
|
|
|
923 |
|
|
The standard also permits the SOF marker to overestimate the image height,
|
924 |
|
|
with a DNL to give the true, smaller height at the end of the first scan.
|
925 |
|
|
This would solve the space problems if the overestimate wasn't too great.
|
926 |
|
|
However, it implies that you don't even know whether DNL will be used.
|
927 |
|
|
|
928 |
|
|
This leads to a couple of very serious objections:
|
929 |
|
|
1. Testing for a DNL marker must occur in the inner loop of the decompressor's
|
930 |
|
|
Huffman decoder; this implies a speed penalty whether the feature is used
|
931 |
|
|
or not.
|
932 |
|
|
2. There is no way to hide the last-minute change in image height from an
|
933 |
|
|
application using the decoder. Thus *every* application using the IJG
|
934 |
|
|
library would suffer a complexity penalty whether it cared about DNL or
|
935 |
|
|
not.
|
936 |
|
|
We currently do not support DNL because of these problems.
|
937 |
|
|
|
938 |
|
|
A different approach is to insist that DNL-using files be preprocessed by a
|
939 |
|
|
separate program that reads ahead to the DNL, then goes back and fixes the SOF
|
940 |
|
|
marker. This is a much simpler solution and is probably far more efficient.
|
941 |
|
|
Even if one wants piped input, buffering the first scan of the JPEG file needs
|
942 |
|
|
a lot smaller temp file than is implied by the maximum-height method. For
|
943 |
|
|
this approach we'd simply treat DNL as a no-op in the decompressor (at most,
|
944 |
|
|
check that it matches the SOF image height).
|
945 |
|
|
|
946 |
|
|
We will not worry about making the compressor capable of outputting DNL.
|
947 |
|
|
Something similar to the first scheme above could be applied if anyone ever
|
948 |
|
|
wants to make that work.
|