The JPEG Still Picture Compression Standard

Gregory K. Wallace
Multimedia Engineering
Digital Equipment Corporation
Maynard, Massachusetts

Submitted in December 1991 for publication in IEEE Transactions on Consumer Electronics

This paper is a revised version of an article by the same title and author which appeared in the April 1991 issue of Communications of the ACM.
Abstract
For the past few years, a joint ISO/CCITT committee known as JPEG (Joint Photographic Experts Group) has been working to establish the first international compression standard for continuous-tone still images, both grayscale and color. JPEG’s proposed standard aims to be generic, to support a wide variety of applications for continuous-tone images. To meet the differing needs of many applications, the JPEG standard includes two basic compression methods, each with various modes of operation. A DCT-based method is specified for "lossy" compression, and a predictive method for "lossless" compression. JPEG features a simple lossy technique known as the Baseline method, a subset of the other DCT-based modes of operation. The Baseline method has been by far the most widely implemented JPEG method to date, and is sufficient in its own right for a large number of applications. This article provides an overview of the JPEG standard, and focuses in detail on the Baseline method.
1 Introduction
Advances over the past decade in many aspects of digital technology - especially devices for image acquisition, data storage, and bitmapped printing and display - have brought about many applications of digital imaging. However, the applications tend to be specialized due to their relatively high cost. With the possible exception of facsimile, digital images are not commonplace in general-purpose computing systems the way text and geometric graphics are. The majority of modern business and consumer usage of photographs and other types of images takes place through more traditional analog means.

The key obstacle for many applications is the vast amount of data required to represent a digital image directly. A digitized version of a single, color picture at TV resolution contains on the order of one million bytes; 35mm resolution requires ten times that amount. Use of digital images often is not viable due to high storage or transmission costs, even when image capture and display devices are quite affordable.
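To make that figure concrete, a rough back-of-the-envelope calculation (assuming a CCIR-601-like frame of 720 x 576 samples with 3 bytes per pixel; the exact numbers depend on the sampling format):

$$720 \times 576 \times 3 \approx 1.2 \times 10^{6}\ \text{bytes}$$

A 35mm frame digitized at a typical 3072 x 2048 with the same sample depth requires more than an order of magnitude more data.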
Modern image compression technology offers a possible solution. State-of-the-art techniques can compress typical images to between 1/10 and 1/50 of their uncompressed size without visibly affecting image quality. But compression technology alone is not sufficient. For digital image applications involving storage or transmission to become widespread in today’s marketplace, a standard image compression method is needed to enable interoperability of equipment from different manufacturers.
The CCITT recommendation for today’s ubiquitous Group 3 fax machines [17] is a dramatic example of how a standard compression method can enable an important image application. The Group 3 method, however, deals with bilevel images only and does not address photographic image compression.
For the past few years, a standardization effort known by the acronym JPEG, for Joint Photographic Experts Group, has been working toward establishing the first international digital image compression standard for continuous-tone (multilevel) still images, both grayscale and color. The "joint" in JPEG refers to a collaboration between CCITT and ISO. JPEG convenes officially as the ISO committee designated JTC1/SC2/WG10, but operates in close informal collaboration with CCITT SGVIII. JPEG will be both an ISO Standard and a CCITT Recommendation. The text of both will be identical.
Photovideotex, desktop publishing, graphic arts, color facsimile, newspaper wirephoto transmission, medical imaging, and many other continuous-tone image applications require a compression standard in order to develop significantly beyond their present state. JPEG has undertaken the ambitious task of developing a general-purpose compression standard to meet the needs of almost all continuous-tone still-image applications.

If this goal proves attainable, not only will individual applications flourish, but exchange of images across application boundaries will be facilitated. This latter feature will become increasingly important as more image applications are implemented on general-purpose computing systems, which are themselves becoming increasingly interoperable and internetworked. For applications which require specialized VLSI to meet their compression and decompression speed requirements, a common method will provide economies of scale not possible within a single application.
This article gives an overview of JPEG’s proposed image-compression standard. Readers without prior knowledge of JPEG or compression based on the Discrete Cosine Transform (DCT) are encouraged to study first the detailed description of the Baseline sequential codec, which is the basis for all of the DCT-based decoders. While this article provides many details, many more are necessarily omitted. The reader should refer to the ISO draft standard [2] before attempting implementation.
Some of the earliest industry attention to the JPEG proposal has been focused on the Baseline sequential codec as a motion image compression method - of the "intraframe" class, where each frame is encoded as a separate image. This class of motion image coding, while providing less compression than "interframe" methods like MPEG, has greater flexibility for video editing. While this paper focuses only on JPEG as a still picture standard (as ISO intended), it is interesting to note that JPEG is likely to become a "de facto" intraframe motion standard as well.
2 Background: Requirements and Selection Process
JPEG’s goal has been to develop a method for continuous-tone image compression which meets the following requirements:
1) be at or near the state of the art with regard to compression rate and accompanying image fidelity, over a wide range of image quality ratings, and especially in the range where visual fidelity to the original is characterized as "very good" to "excellent"; also, the encoder should be parameterizable, so that the application (or user) can set the desired compression/quality tradeoff;

2) be applicable to practically any kind of continuous-tone digital source image (i.e. for most practical purposes not be restricted to images of certain dimensions, color spaces, pixel aspect ratios, etc.) and not be limited to classes of imagery with restrictions on scene content, such as complexity, range of colors, or statistical properties;

3) have tractable computational complexity, to make feasible software implementations with viable performance on a range of CPUs, as well as hardware implementations with viable cost for applications requiring high performance;
4) have the following modes of operation:

• Sequential encoding: each image component is encoded in a single left-to-right, top-to-bottom scan;

• Progressive encoding: the image is encoded in multiple scans for applications in which transmission time is long, and the viewer prefers to watch the image build up in multiple coarse-to-clear passes;

• Lossless encoding: the image is encoded to guarantee exact recovery of every source image sample value (even though the result is low compression compared to the lossy modes);

• Hierarchical encoding: the image is encoded at multiple resolutions so that lower-resolution versions may be accessed without first having to decompress the image at its full resolution.

In June 1987, JPEG conducted a selection process based on a blind assessment of subjective picture quality, and narrowed 12 proposed methods to three. Three informal working groups formed to refine them, and in January 1988, a second, more rigorous selection process [19] revealed that the "ADCT" proposal [11], based on the 8x8 DCT, had produced the best picture quality.
At the time of its selection, the DCT-based method was only partially defined for some of the modes of operation. From 1988 through 1990, JPEG undertook the sizable task of defining, documenting, simulating, testing, validating, and simply agreeing on the plethora of details necessary for genuine interoperability and universality. Further history of the JPEG effort is contained in [6, 7, 9, 18].
3 Architecture of the Proposed Standard

The proposed standard contains the four "modes of operation" identified previously. For each mode, one or more distinct codecs are specified. Codecs within a mode differ according to the precision of source image samples they can handle or the entropy coding method they use. Although the word codec (encoder/decoder) is used frequently in this article, there is no requirement that implementations must include both an encoder and a decoder. Many applications will have systems or devices which require only one or the other.
The four modes of operation and their various codecs have resulted from JPEG’s goal of being generic and from the diversity of image formats across applications. The multiple pieces can give the impression of undesirable complexity, but they should actually be regarded as a comprehensive "toolkit" which can span a wide range of continuous-tone image applications. It is unlikely that many implementations will utilize every tool -- indeed, most of the early implementations now on the market (even before final ISO approval) have implemented only the Baseline sequential codec.
The Baseline sequential codec is inherently a rich and sophisticated compression method which will be sufficient for many applications. Getting this minimum JPEG capability implemented properly and interoperably will provide the industry with an important initial capability for exchange of images across vendors and applications.
4 Processing Steps for DCT-Based Coding

Figures 1 and 2 show the key processing steps which are the heart of the DCT-based modes of operation. The figures illustrate the special case of single-component (grayscale) image compression. The reader can grasp the essentials of DCT-based compression by thinking of it as essentially compression of a stream of 8x8 blocks of grayscale image samples. Color image compression can then be approximately regarded as compression of multiple grayscale images, which are either compressed entirely one at a time, or are compressed by alternately interleaving 8x8 sample blocks from each in turn.
For DCT sequential-mode codecs, which include the Baseline sequential codec, the simplified diagrams indicate how single-component compression works in a fairly complete way. Each 8x8 block is input, makes its way through each processing step, and yields output in compressed form into the data stream. For DCT progressive-mode codecs, an image buffer exists prior to the entropy coding step, so that an image can be stored and then parceled out in multiple scans with successively improving quality. For the hierarchical mode of operation, the steps shown are used as building blocks within a larger framework.
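As a concrete illustration of this stream-of-blocks view, here is a minimal sketch (not from the standard) of how an encoder front end might group a grayscale image into 8x8 blocks in the sequential scan order. It assumes a numpy array whose dimensions are multiples of 8; extension of partial edge blocks (e.g. by replicating the last row and column) is left to the application.

```python
import numpy as np

def to_blocks(image):
    """Group a grayscale image into a stream of 8x8 sample blocks in
    left-to-right, top-to-bottom (sequential scan) order.

    Assumes dimensions are multiples of 8; a real encoder must extend
    partial edge blocks itself, e.g. by replicating rows/columns."""
    h, w = image.shape
    return (image.reshape(h // 8, 8, w // 8, 8)
                 .swapaxes(1, 2)
                 .reshape(-1, 8, 8))
```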
4.1 8x8 FDCT and IDCT
At the input to the encoder, source image samples are grouped into 8x8 blocks, shifted from unsigned integers with range $[0, 2^P - 1]$ to signed integers with range $[-2^{P-1}, 2^{P-1} - 1]$, and input to the Forward DCT (FDCT). At the output from the decoder, the Inverse DCT (IDCT) outputs 8x8 sample blocks to form the reconstructed image. The following equations are the idealized mathematical definitions of the 8x8 FDCT and 8x8 IDCT:

$$F(u,v) = \frac{1}{4} C(u)C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16} \qquad (1)$$

$$f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)C(v)F(u,v) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16} \qquad (2)$$

where $C(u), C(v) = 1/\sqrt{2}$ for $u, v = 0$; $C(u), C(v) = 1$ otherwise.

The DCT is related to the Discrete Fourier Transform (DFT). Some simple intuition for DCT-based compression can be obtained by viewing the FDCT as a harmonic analyzer and the IDCT as a harmonic synthesizer. Each 8x8 block of source image samples is effectively a 64-point discrete signal which is a function of the two spatial dimensions x and y. The FDCT takes such a signal as its input and decomposes it into 64 orthogonal basis signals. Each contains one of the 64 unique two-dimensional (2D) "spatial frequencies" which comprise the input signal’s "spectrum." The output of the FDCT is the set of 64 basis-signal amplitudes or "DCT coefficients" whose values are uniquely determined by the particular 64-point input signal.

The DCT coefficient values can thus be regarded as the relative amounts of the 2D spatial frequencies contained in the 64-point input signal. The coefficient with zero frequency in both dimensions is called the "DC coefficient" and the remaining 63 coefficients are called the "AC coefficients." Because sample values typically vary slowly from point to point across an image, the FDCT processing step lays the foundation for achieving data compression by concentrating most of the signal in the lower spatial frequencies. For a typical 8x8 sample block from a typical source image, most of the spatial frequencies have zero or near-zero amplitude and need not be encoded.
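The following sketch evaluates equation (1) directly, together with the level shift described above. It is a minimal illustration, not an implementation strategy; as discussed below, practical codecs use fast factorizations [16].

```python
import numpy as np

def fdct_8x8(samples, P=8):
    """Direct evaluation of the 8x8 FDCT of equation (1), preceded by
    the level shift from unsigned [0, 2^P - 1] to signed integers."""
    f = samples.astype(np.float64) - 2.0 ** (P - 1)   # level shift
    n = np.arange(8)
    # cos_tab[u, x] = cos((2x + 1) * u * pi / 16)
    cos_tab = np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / 16)
    c = np.where(n == 0, 1 / np.sqrt(2), 1.0)         # C(u), C(v)
    # F(u,v) = 1/4 C(u)C(v) sum_x sum_y f(x,y) cos(..u..) cos(..v..)
    return 0.25 * np.outer(c, c) * (cos_tab @ f @ cos_tab.T)
```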
At the decoder the IDCT reverses this processing step. It takes the 64 DCT coefficients (which at that point have been quantized) and reconstructs a 64-point output image signal by summing the basis signals. Mathematically, the DCT is a one-to-one mapping for 64-point vectors between the image and the frequency domains. If the FDCT and IDCT could be computed with perfect accuracy and if the DCT coefficients were not quantized as in the following description, the original 64-point signal could be exactly recovered. In principle, the DCT introduces no loss to the source image samples; it merely transforms them to a domain in which they can be more efficiently encoded.
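A companion sketch of equation (2), using fdct_8x8 from the sketch above, demonstrates this in-principle losslessness: without quantization, the round trip recovers the input up to the floating-point accuracy of this particular implementation.

```python
import numpy as np

def idct_8x8(F, P=8):
    """Direct evaluation of the 8x8 IDCT of equation (2), followed by
    the inverse level shift back to the unsigned sample range."""
    n = np.arange(8)
    cos_tab = np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / 16)
    c = np.where(n == 0, 1 / np.sqrt(2), 1.0)
    f = 0.25 * (cos_tab.T @ (np.outer(c, c) * F) @ cos_tab)
    return f + 2.0 ** (P - 1)

# Round trip without quantization: recovery is exact up to
# floating-point error.
rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8))
assert np.allclose(idct_8x8(fdct_8x8(block)), block)
```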
Some properties of practical FDCT and IDCT implementations raise the issue of what precisely should be required by the JPEG standard. A fundamental property is that the FDCT and IDCT equations contain transcendental functions. Consequently, no physical implementation can compute them with perfect accuracy. Because of the DCT’s application importance and its relationship to the DFT, many different algorithms by which the FDCT and IDCT may be approximately computed have been devised [16]. Indeed, research in fast DCT algorithms is ongoing and no single algorithm is optimal for all implementations. What is optimal in software for a general-purpose CPU is unlikely to be optimal in firmware for a programmable DSP and is certain to be suboptimal for dedicated VLSI.
Even in light of the finite precision of the DCT inputs and outputs, independently designed implementations of the very same FDCT or IDCT algorithm which differ even minutely in the precision by which they represent cosine terms or intermediate results, or in the way they sum and round fractional values, will eventually produce slightly different outputs from identical inputs. To preserve freedom for innovation and customization within implementations, JPEG has chosen to specify neither a unique FDCT algorithm nor a unique IDCT algorithm in its proposed standard. This makes compliance somewhat more difficult to confirm, because two compliant encoders (or decoders) generally will not produce identical outputs given identical inputs. The JPEG standard will address this issue by specifying an accuracy test as part of its compliance tests for all DCT-based encoders and decoders; this is to ensure against crudely inaccurate cosine basis functions which would degrade image quality.
Figure 1. DCT-Based Encoder Processing Steps (source image data → FDCT → Quantizer → Entropy Encoder → compressed image data, with Table Specifications)

Figure 2. DCT-Based Decoder Processing Steps (compressed image data → Entropy Decoder → Dequantizer → IDCT → reconstructed image data, with Table Specifications)
For each DCT-based mode of operation, the JPEG proposal specifies separate codecs for images with 8-bit and 12-bit (per component) source image samples. The 12-bit codecs, needed to accommodate certain types of medical and other images, require greater computational resources to achieve the required FDCT or IDCT accuracy. Images with other sample precisions can usually be accommodated by either an 8-bit or 12-bit codec, but this must be done outside the JPEG standard. For example, it would be the responsibility of an application to decide how to fit or pad a 6-bit sample into the 8-bit encoder’s input interface, how to unpack it at the decoder’s output, and how to encode any necessary related information.

4.2 Quantization
After output from the FDCT, each of the 64 DCT coefficients is uniformly quantized in conjunction with a 64-element Quantization Table, which must be specified by the application (or user) as an input to the encoder. Each element can be any integer value from 1 to 255, which specifies the step size of the quantizer for its corresponding DCT coefficient. The purpose of quantization is to achieve further compression by representing DCT coefficients with no greater precision than is necessary to achieve the desired image quality. Stated another way, the goal of this processing step is to discard information which is not visually significant. Quantization is a many-to-one mapping, and therefore is fundamentally lossy. It is the principal source of lossiness in DCT-based encoders.
Quantization is defined as division of each DCT coefficient by its corresponding quantizer step size, followed by rounding to the nearest integer:

$$F^{Q}(u,v) = \mathrm{IntegerRound}\left(\frac{F(u,v)}{Q(u,v)}\right) \qquad (3)$$
This output value is normalized by the quantizer step size. Dequantization is the inverse function, which in this case means simply that the normalization is removed by multiplying by the step size, which returns the result to a representation appropriate for input to the IDCT:

$$F^{Q'}(u,v) = F^{Q}(u,v) \times Q(u,v) \qquad (4)$$
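A minimal sketch of equations (3) and (4) follows. The table shown is the example luminance Quantization Table that appears for information in the draft standard’s annex [2]; as the text above notes, any application-chosen table of integers in [1, 255] may be substituted.

```python
import numpy as np

# Example luminance Quantization Table from the informational annex
# of the draft standard [2]; applications may supply their own.
Q_LUMINANCE = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(F, Q=Q_LUMINANCE):
    """Equation (3): divide by the step size, round to nearest integer."""
    return np.rint(F / Q).astype(np.int32)

def dequantize(FQ, Q=Q_LUMINANCE):
    """Equation (4): multiply back by the step size before the IDCT."""
    return (FQ * Q).astype(np.float64)
```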
When the aim is to compress the image as much as possible without visible artifacts, each step size ideally should be chosen as the perceptual threshold or "just noticeable difference" for the visual contribution of its corresponding cosine basis function. These thresholds are also functions of the source image characteristics, display characteristics, and viewing distance. For applications in which these variables can be reasonably well defined, psychovisual experiments can be performed to determine the best thresholds. The experiment described in [12] has led to a set of Quantization Tables for CCIR-601 [4] images and displays. These have been used experimentally by JPEG members and will appear in the ISO standard as a matter of information, but not as a requirement.

4.3 DC Coding and Zig-Zag Sequence
After quantization, the DC coefficient is treated separately from the 63 AC coefficients. The DC coefficient is a measure of the average value of the 64 image samples. Because there is usually strong correlation between the DC coefficients of adjacent 8x8 blocks, the quantized DC coefficient is encoded as the difference from the DC term of the previous block in the encoding order (defined in the following), as shown in Figure 3. This special treatment is worthwhile, as DC coefficients frequently contain a significant fraction of the total image energy.
Figure 3. Preparation of Quantized Coefficients for Entropy Coding (differential DC encoding, $DIFF = DC_i - DC_{i-1}$, and the zig-zag ordering of the 8x8 coefficient block)
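To make Figure 3 concrete, the sketch below (an illustrative reading of the figure, not code from the standard) computes the DC difference and lists the AC coefficients in zig-zag order. The zig-zag index order is generated from its defining anti-diagonal walk rather than hard-coded; `blocks` is a hypothetical iterable of quantized 8x8 coefficient arrays in encoding order, and the DC predictor starts at zero as at the beginning of a scan.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in zig-zag order, generated by walking
    the anti-diagonals s = row + col with alternating direction."""
    order = []
    for s in range(2 * n - 1):
        rows = range(s, -1, -1) if s % 2 == 0 else range(s + 1)
        for r in rows:
            c = s - r
            if r < n and c < n:
                order.append((r, c))
    return order

def prepare_blocks(blocks):
    """For each quantized block: DIFF = DC_i - DC_{i-1} (predictor
    initialized to 0), then the 63 AC coefficients in zig-zag order."""
    zz = zigzag_order()
    pred = 0
    out = []
    for b in blocks:
        dc = b[0, 0]
        out.append((dc - pred, [b[r, c] for (r, c) in zz[1:]]))
        pred = dc
    return out
```

As the next section explains, this ordering places low-frequency coefficients (which are more likely to be nonzero) before high-frequency ones, which helps the subsequent entropy coding step.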