This is a collection of frequently asked question about the MPEG compression standard. It is organized as an hypertext in HTML format to be easily extensible and upgradable.
Many sources contributed to this list.
If you wish to contribute, correct any mistake or just send your comments and impressions please contact :
MPEG (Moving Pictures Experts Group) is a group of people that meet under ISO (the International Standards Organization) to generate standards for digital video (sequences of images in time) and audio compression. In particular, they define a compressed bit stream, which implicitly defines a decompressor. However, the compression algorithms are up to the individual manufacturers, and that is where proprietary advantage is obtained within the scope of a publicly available international standard. MPEG meets roughly four times a year for roughly a week each time. In between meetings, a great deal of work is done by the members, so it doesn't all happen at the meetings. The work is organized and planned at the meetings. MPEG itself is a nickname. The official name is: ISO/IEC JTC1 SC29 WG11.
ISO: International Organization for Standardization IEC: International Electro-technical Commission JTC1: Joint Technical Committee 1 SC29: Sub-committee 29 WG11: Work Group 11 (moving pictures with... uh, audio)
Does it have anything to do with JPEG ?
Well, it sounds the same, and they are part of the same subcommittee of ISO along with JBIG and MHEG, and they usually meet at the same place at the same time. However, they are different sets of people with few or no common individual members, and they have different charters and requirements.
JPEG is for still image compression. JBIG is for binary image compression (like faxes), and MHEG is for multi-media data standards (like integrating stills, video, audio, text, etc.).
The most fundamental difference between MPEG and JPEG is MPEG's use of block-based motion compensated prediction (MCP), a general method falling into the temporal DPCM, category.
The second most fundamental difference is in the target application. JPEG adopts a general purpose philosophy: independence from color space (up to 255 components per frame) and quantization tables for each component. Extended modes in JPEG include two sample precisions (8 and 12 bit sample accuracy), combinations of frequency progressive, spatially progressive, and amplitude progressive scanning modes. Color independence is made possible thanks to down-loadable Huffman tables.
Since MPEG is targeted for a set of specific applications, there is only one color space (4:2:0 YCbCr), one sample precision (8 bits), and one scanning mode (sequential). Luminance and chrominance share quantization tables. The range of sampling dimensions are more limited as well. MPEG adds adaptive quantization at the macroblock (16 x 16 pixel area) layer. This permits both smoother bit rate control and more perceptually uniform quantization throughout the picture and image sequence. Adaptive quantization is part of the JPEG-2 charter. MPEG variable length coding tables are non-downloadable, and are therefore optimized for a limited range of compression ratios appropriate for the target applications.
The local spatial decorrelation methods in MPEG and JPEG are very similar. Picture data is block transform coded with the two-dimensional orthonormal 8x8 DCT. The resulting 63 AC transform coefficients are mapped in a zig-zag pattern to statistically increase the runs of zeros. Coefficients of the vector are then uniformly scalar quantized, run-length coded, and finally the run-length symbols are variable length coded using a canonical (JPEG) or modified Huffman (MPEG) scheme. Global frame redundancy is reduced by 1-D DPCM, of the block DC coefficients, followed by quantization and variable length entropy coding.
How do MPEG and H.261 differ ?
H.261 was targeted for teleconferencing applications where motion is naturally more limited. Motion vectors are restricted to a range of +/- 15 pixels. Accuracy is reduced since H.261 motion vectors are restricted to integer-pel accuracy. Other syntactic differences include: no B-pictures, different quantization method.
H.261 is also known as P*64. "P" is an integer number meant to represent multiples of 64kbit/sec. In the end, this nomenclature probably won't be used as many services other than video will adopt the philosophy of arbitrary B channel (64kbit) bitrate scalability.
Is H.261 the de facto teleconferencing standard ?
Not exactly. To date, about seventy percent of the industrial teleconferencing hardware market is controlled by PictureTel of Mass. The second largest market controller is Compression Labs of Silicon Valley. PictureTel hardware includes compatibility with H.261 as a lowest common denominator, but when in communication with other PictureTel hardware, it can switch to a mode superior at low bit rates (less than 300kbits/sec). In fact, over 2/3 of all teleconferencing is done at two-times switched 56 channel (~P = 2) bandwidth. Long distance ISDN ain't cheap. In each direction, video and audio are coded at an aggregate of 112 kbits/sec (2*56 kbits/sec).
The PictureTel proprietary compression algorithm is acknowledged to be a combination of spatial pyramid, lattice vector quantizer, and an unidentified entropy coding method. Motion compensation is considerably more refined and sophisticated than the 16x16 integer-pel block method specified in H.261.
The Compression Labs proprietary algorithm also offers significant improvement over H.261 when linked to other CLI hardware.
Currently, ITU-TS (International Telecommunications Union--Teleconferencing Sector), formerly CCITT, is quietly defining an improvement to H.261 with the participation of industry vendors.
What is the reasoning behind MPEG syntax symbols ?
Here are some of the Whys and Wherefores of MPEG symbols:
These 32-bit byte-aligned codes provide a mechanism for cheaply searching coded bitstreams for commencement of various layers of video without having to actually parse or decode. Start codes also provide a mechanism for resynchronization in the presence of bit errors.
(CBP --not to be confused with Constrained Parameters!) When the frame prediction is particularly good, the displaced frame difference (DFD, or prediction error) tends to be small, often with entire block energy being reduced to zero after quantization. This usually happens only at low bit rates. Coded block patterns prevent the need for transmitting EOB symbols in those zero coded blocks.
Each intra coded block has a DC coefficient. Inter coded blocks (prediction error or DFD) naturally do not since the prediction error is the first derivative of the video signal. With coded block patterns signalling all possible non-coded block patterns, the dct_coef_first mechanism assigns a different meaning to the VLC codeword that would otherwise represent EOB as the first coefficient.
Saves unnecessary run-length codes. At optimal bitrates, there tends to be few AC coefficients concentrated in the early stages of the zig-zag vector. In MPEG-1, the 2-bit length of EOB implies that there is an average of only 3 or 4 non-zero AC coefficients per block. In MPEG-2 Intra (I) pictures, with a 4-bit EOB code, this number is between 9 and 16 coefficients. Since EOB is required for all coded blocks, its absence can signal that a syntax error has occurred in the bitstream.
A genuine pain for VLSI implementations, macroblock stuffing was introduced to maintain smoother, constant bitrate control in MPEG-1. However, with normalized complexity measures and buffer management performed on a a priori (pre-frame, pre-slice, and pre-macroblock) basis in the MPEG-2 encoder test model, the need for such localized smoothing evaporated. Stuffing can be achieved through virtually unlimited slice start code padding if required. A good rule of thumb: if you find yourself often using stuffing more than once per slice, you probably don't have a very good rate control algorithm. Anyway, macroblock stuffing is now illegal in MPEG-2.
The VLC tables in MPEG are not Huffman tables in the true sense of Huffman coding, but are more like the tables used in Group 3 fax. They are entropy constrained, that is, non-downloadable and optimized for a limited range of bit rates (sweet spots). With the acception of a few codewords, the larger tables were carried over from the H.261 standard of 1990. MPEG-2 added an "Intra table". Note that the dct_coefficient tables assume positive/negative coefficient pmf symmetry.
How would you explain MPEG to the data compression expert ?
What are the implementation requirements ?
MPEG pushes the limit of economical VLSI technology (but you get what you pay for in terms of picture quality or compaction efficiency)
You don't join MPEG. You have to participate in ISO as part of a national delegation. How you get to be part of the national delegation is up to each nation. I only know the U.S., where you have to attend the corresponding ANSI meetings to be able to attend the ISO meetings. Your company or institution has to be willing to sink some bucks into travel since, naturally, these meetings are held all over the world. (For example, Paris, Santa Clara, Kurihama Japan, Singapore, Haifa Israel, Rio de Janeiro, London, etc.)
What is the evolution of standard documents ?
In chronological order:
Voting members ballot on the creation of a new standards project.
Project Leader manages the development of a Working Draft.
Consensus is achieved on a Committee Draft.
National bodies vote on a Draft International Standard.
ISO publishes the International Standard.
MPEG is a draft ISO standard. It's exact name is ISO CD 11172. The draft consists of three parts: System, Video, and Audio. The System part (11172-1) deals with synchronization and multiplexing of audio-visual information, while the Video (11172-2) and Audio part (11172-3) address the video and the audio compression techniques respectively. Part 4, Conformance Testing, is currently a CD. You may order it from your national standards body (e.g. ANSI in the USA) or buy it from other companies like :
What are the important themes of MPEG ?
How do you tell a MPEG-1 bitstream from a MPEG-2 bitstream ?
All MPEG-2 bitstreams must have certain extension headers that *immediately* follow MPEG-1 headers. At the highest layer, for example, the MPEG-1 style sequence_header() is followed by sequence_extension() which is exclusive to MPEG-2. Some extension headers are specific to MPEG-2 profiles. For example, sequence_scalable_extension() is not allowed in Main Profile.
A simple program need only scan the coded bitstream for byte-aligned start codes to determine whether the stream is MPEG-1 or MPEG-2.
What is the precision of MPEG samples ?
By definition, MPEG samples have no more and no less than 8-bits uniform sample precision (256 quantization levels). For luminance (which is unsigned) data, black corresponds to level 0, white is level 255. However, in CCIR-610 recommendation chromaticy, levels 0 through 14 and 236 through 255 are reserved for blanking signal excursions. MPEG currently has no such clipped excursion restrictions.
What is the best compression ratio for MPEG ?
The MPEG sweet spot is about 1.2 bits/pel Intra and .35 bits/pel inter. Experimentation has shown that intra frame coding with the familiar DCT-Quantization-Entropy hybrid algorithm achieves optimal performance at about an average of 1.2 bits/sample or about 6:1 compression ratio. Below this point, artifacts become noticeable.
If the encoder did its job properly, and the user specified a proper balance between sample rate and bitrate, there shouldn't be any visible artifacts. However, in sub-optimal systems, you can look for:
Are there single chip MPEG encoder ?
Yes, the C-Cube CL-4000 is the only single-chip, real-time encoder that can process true MPEG-1 SIF rate video.
Single chip for +/- 15 pel motion estimation at SIF rates (352x240x30 Hz) Two chips for +/- 32 pel at SIF rates (hierarchical) 5 or 6 chips for MPEG-2 at CCIR-610 rates (704 x 480 x 30 Hz) Highly microcoded architecture. Can code both H.261 and JPEG. Implements high picture quality microcode programs. [more details from CICC'93 and HotChips '93 conference to be included]
IBM and SGS-Thomson plan to introduce more hard-wired, multi-chip solutions in 1994.
What about MPEG-1 decoder chips ?
By implication of MPEG-2 Conformance requirements, all MPEG-2 decoders are required to decode MPEG-1 bitstreams as well. These chips, however, are strictly MPEG-1:
To date, only Layer I and Layer II have been implemented in dedicated (ASIC) silicon:
Will there be an MPEG video tape format ?
There is a consortium of companies (Philips, JVC, Sony, Matushista, et al) developing a metal particle based 6 millimeter consumer digital video tape format. It will initially use more JPEG-like independent frame compression for cheap encoding of source analog (NTSC, PAL) video. The consequence of course is less efficient use of bandwidth ( 25 Mbit/sec for the same quality achieved at 6 Mbit/sec with MPEG). Pre-compressed video from broadcast sources will be directly recorded to tape and "passed-through" as a coded bitstream to the video decompression "box" upon playback.
Is so-and-so really MPEG compliant ?
At the very least, there are two areas of conformance/compliance in MPEG: 1. Compliant bitstreams 2. compliant decoders. Technically speaking, video bitstreams consisting entirely of I-frames (such as those generated by Xing software) are syntactically compliant with the MPEG specification. The I-frame sequence is simply a subset of the full syntax. Compliant bitstreams must obey the range limits (e.g. motion vectors limited to +/-128, frame sizes, frame rates, etc.) and syntax rules (e.g. all slices must commence and terminate with a non-skipped macroblock, no gaps between slices, etc.).
Decoders, however, cannot escape true conformance. For example, a decoder that cannot decode P or B frames are *not* legal MPEG. Likewise, full arithmetic precision must be obeyed before any decoder can be called "MPEG compliant." The IDCT, inverse quantizer, and motion compensated prediction must meet the specification requirements... which are fairly rigid (e.g. no more than 1 least significant bit of error between reference and test decoders). Real-time conformance is more complicated to measure than arithmetic precision, but it is reasonable to expect that decoders that skip frames on reasonable bitstreams are not likely to be considered compliant.
What are some journals on related MPEG topics ?
Which performances should I expect from MPEG boards ?
The OptiVision, along with products from Optibase and Scientific Atlanta do real time compression and storage to disk. The cheap video boards, at best, can only do 30 fps with about 160 x 120 windows. Nobody can do 352 x 240 in real time without the right hardware. The SA product is about $30K list and the Optibase somewhere around $20K for the board set.
A board from Optivision that can do the MPEG conversion off line. Even this is costly (about $2,000) to get it done in any decent time frame.
If you believe that $20,000 is high, AT&T at the Western Cable Show 1993 demonstrated a real time MPEG-2 compression system at $90,000.
The market for these real time systems is very real; it is the satellite uplink and cable television market. Nominal compression ratios are running about 200:1 for MPEG-1 in the Optibase product. For broadcast quality, compression ratios are lower. Even here, you have to be careful. 200:1 really means "take a 640 x 480 image, sub-sample it to 320x240 (throwing out data to get 4:1 compression), then compress it 50:1 doing MPEG".
FrameRate Labs is about ready to release a board that does 640 x 240 real time capture and storage to disk without any compression or dropped frames; it will compress offline. This is brute force but far cheaper than a $20,000 solution. If you need real-time all day long, talk to Scientific-Atlanta, Optibase or OptiVision. If you need real-time for a brief-time with dropped frames, use the low-end boards like Video Spigot, etc. If you need real-time for a brief-time without loss of data, FrameRate Labs might have a solution.
The low end board manufacturers label their products real-time 30 FPS and then, in the next sentence, they claim to be able to capture an image 640 x 480. But, they never say these things in the same sentence.
Are there any MPEG FTP or WWW sites ?
There are now many anonymous FTP site with MPEG programs or movies. A site archiving most of the public domain programs and documents about the MPEG standard (and also other compression techniques) may be found at ftp.crs4.it