The MPEG standard

This is a collection of frequently asked questions about the MPEG compression standard. It is organized as a hypertext in HTML format to be easily extensible and upgradable.

Many sources contributed to this list.

If you wish to contribute, correct any mistake, or just send your comments and impressions, please contact:

Luigi.Filippini@crs4.it

What is MPEG ?

MPEG (Moving Picture Experts Group) is a group of people that meets under ISO (the International Organization for Standardization) to generate standards for digital video (sequences of images in time) and audio compression. In particular, they define a compressed bit stream, which implicitly defines a decompressor. However, the compression algorithms are up to the individual manufacturers, and that is where proprietary advantage is obtained within the scope of a publicly available international standard. MPEG meets roughly four times a year for roughly a week each time. In between meetings, a great deal of work is done by the members, so it doesn't all happen at the meetings. The work is organized and planned at the meetings. MPEG itself is a nickname. The official name is: ISO/IEC JTC1 SC29 WG11.

  ISO: International Organization for Standardization
  IEC: International Electrotechnical Commission
  JTC1: Joint Technical Committee 1
  SC29: Sub-committee 29
  WG11: Work Group 11 (moving pictures with... uh, audio)

Does it have anything to do with JPEG ?

Well, it sounds the same, and they are part of the same subcommittee of ISO along with JBIG and MHEG, and they usually meet at the same place at the same time. However, they are different sets of people with few or no common individual members, and they have different charters and requirements.

JPEG is for still image compression. JBIG is for binary image compression (like faxes), and MHEG is for multi-media data standards (like integrating stills, video, audio, text, etc.).

The most fundamental difference between MPEG and JPEG is MPEG's use of block-based motion compensated prediction (MCP), a general method falling into the temporal DPCM category.

The second most fundamental difference is in the target application. JPEG adopts a general purpose philosophy: independence from color space (up to 255 components per frame) and quantization tables for each component. Extended modes in JPEG include two sample precisions (8 and 12 bit sample accuracy), combinations of frequency progressive, spatially progressive, and amplitude progressive scanning modes. Color independence is made possible thanks to down-loadable Huffman tables.

Since MPEG is targeted for a set of specific applications, there is only one color space (4:2:0 YCbCr), one sample precision (8 bits), and one scanning mode (sequential). Luminance and chrominance share quantization tables. The range of sampling dimensions is more limited as well. MPEG adds adaptive quantization at the macroblock (16 x 16 pixel area) layer. This permits both smoother bit rate control and more perceptually uniform quantization throughout the picture and image sequence. Adaptive quantization is part of the JPEG-2 charter. MPEG variable length coding tables are non-downloadable, and are therefore optimized for a limited range of compression ratios appropriate for the target applications.

The local spatial decorrelation methods in MPEG and JPEG are very similar. Picture data is block transform coded with the two-dimensional orthonormal 8x8 DCT. The resulting 63 AC transform coefficients are mapped in a zig-zag pattern to statistically increase the runs of zeros. Coefficients of the vector are then uniformly scalar quantized, run-length coded, and finally the run-length symbols are variable length coded using a canonical (JPEG) or modified Huffman (MPEG) scheme. Global frame redundancy is reduced by 1-D DPCM of the block DC coefficients, followed by quantization and variable length entropy coding.
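As a rough illustration of the zig-zag scan and run-length steps described above, here is a minimal Python sketch. It assumes the 8x8 block has already been transformed and quantized, and the function names (`zigzag_order`, `run_level_tokens`) are made up for this example. It produces MPEG-style (run, level) pairs only; the variable length code tables of either standard are not reproduced.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    # Positions are grouped by anti-diagonal (row + col). Odd diagonals are
    # walked with the row increasing, even diagonals with the column increasing.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_level_tokens(block):
    """Turn the 63 AC coefficients of a quantized block into (run, level) pairs."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    tokens, run = [], 0
    for coeff in scan[1:]:          # scan[0] is the DC term, coded separately
        if coeff == 0:
            run += 1                # extend the current run of zeros
        else:
            tokens.append((run, coeff))
            run = 0
    tokens.append("EOB")            # end-of-block marker; trailing zeros are dropped
    return tokens


# Hypothetical quantized block: a DC term plus three non-zero AC terms.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][2] = 12, 5, -3, 1
print(run_level_tokens(block))      # [(0, 5), (0, -3), (9, 1), 'EOB']
```

The long run of zeros before the last coefficient is exactly what the zig-zag ordering is meant to encourage: low-frequency terms come first, and the mostly zero high-frequency tail collapses into the end-of-block marker.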

Frame -(MCP)-> 8x8 spatial block -(DCT)-> 8x8 frequency block -(ZZ)-> Zig-zag scan -(Q)-> quantization -(RLC)-> run-length coding -(VLC)-> variable length coding.

The similarities have made it possible for the development of hard-wired silicon that can code both standards. Even microcoded architectures can better optimize through hardwired instruction primitives or functional blocks.

There are many additional minor differences. They include:
  1. DCT and quantization precision in MPEG is 9-bits since the macroblock difference operation expands the 8-bit signal precision by one bit.
  2. Quantization in MPEG-1 forces quantized coefficients to become odd values (oddification).
  3. JPEG run-length coding produces run-size tokens (run of zeros, non-zero coefficient magnitude) whereas MPEG produces fully concatenated run-level tokens that do not require magnitude differential bits.
  4. DC values in MPEG-1 are limited to 8-bit precision (a constant step size of 8), whereas JPEG DC precision can occupy all possible 11-bits. MPEG-2, however, re-introduced extra DC precision.
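
The 11-bit and 8-bit figures in item 4 can be checked with a little arithmetic. The sketch below assumes an orthonormal 8x8 DCT applied to unsigned 8-bit intra samples with no level shift; the helper name `max_dc` is invented for the example.

```python
def max_dc(sample_bits=8, n=8):
    """Largest DC term of an orthonormal n x n DCT of unsigned samples."""
    max_sample = (1 << sample_bits) - 1
    # For an orthonormal 2-D DCT the DC term is (sum of all n*n samples) / n.
    return (n * n * max_sample) // n


full = max_dc()       # full-precision DC range, as JPEG can carry
coarse = full // 8    # the same range after a constant step size of 8

print(full, full.bit_length())      # 2040 11
print(coarse, coarse.bit_length())  # 255 8
```

So a constant step size of 8 throws away the three least significant bits of the DC term, which is the precision MPEG-2 later made available again.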

How do MPEG and H.261 differ ?

H.261 was targeted for teleconferencing applications where motion is naturally more limited. Motion vectors are restricted to a range of +/- 15 pixels. Accuracy is reduced since H.261 motion vectors are restricted to integer-pel accuracy. Other syntactic differences include: no B-pictures, different quantization method.

H.261 is also known as P*64. "P" is an integer number meant to represent multiples of 64kbit/sec. In the end, this nomenclature probably won't be used as many services other than video will adopt the philosophy of arbitrary B channel (64kbit) bitrate scalability.

Is H.261 the de facto teleconferencing standard ?

Not exactly. To date, about seventy percent of the industrial teleconferencing hardware market is controlled by PictureTel of Mass. The second largest market controller is Compression Labs of Silicon Valley. PictureTel hardware includes compatibility with H.261 as a lowest common denominator, but when in communication with other PictureTel hardware, it can switch to a mode superior at low bit rates (less than 300kbits/sec). In fact, over 2/3 of all teleconferencing is done at two-times switched 56 channel (~P = 2) bandwidth. Long distance ISDN ain't cheap. In each direction, video and audio are coded at an aggregate of 112 kbits/sec (2*56 kbits/sec).

The PictureTel proprietary compression algorithm is acknowledged to be a combination of spatial pyramid, lattice vector quantizer, and an unidentified entropy coding method. Motion compensation is considerably more refined and sophisticated than the 16x16 integer-pel block method specified in H.261.

The Compression Labs proprietary algorithm also offers significant improvement over H.261 when linked to other CLI hardware.

Currently, ITU-TS (International Telecommunication Union, Telecommunication Standardization Sector), formerly CCITT, is quietly defining an improvement to H.261 with the participation of industry vendors.

What is the reasoning behind MPEG syntax symbols ?

Here are some of the Whys and Wherefores of MPEG symbols:

How would you explain MPEG to the data compression expert ?

What are the implementation requirements ?

MPEG pushes the limit of economical VLSI technology (but you get what you pay for in terms of picture quality or compaction efficiency).

Video               Typical decoder      Total    DRAM bus width
Profile             transistor count     DRAM     @ speed
------------------  -------------------  -------  ----------------
MPEG-1 CPB          0.4 to 0.75 million  4 Mbit   16 bits @ 80 ns
MPEG-1 601          0.8 to 1.1 million   16 Mbit  64 bits @ 80 ns
MPEG-2 MP@ML        0.9 to 1.5 million   16 Mbit  64 bits @ 80 ns
MPEG-2 MP@High1440  2 to 3 million       64 Mbit  N/A

70 or 80ns DRAM speed is a measure of the shortest period in which words can be transferred across the bus. In the case of MPEG-1 SIF, 80ns implies (1/80ns)(16bits) or about 25 MBytes/sec of bandwidth.

Lack of cheap memory (DRAM) utilization is where the original DVI algorithm made a costly mistake. PAL required expensive VRAM/SRAM chips (a static RAM cell requires 6 transistors compared to 1 transistor for a DRAM cell). Fast page mode DRAM (which has slower throughput than SRAM and requires near-contiguous address mapping) is viable for MPEG due almost exclusively to the block nature of the algorithm and syntax (DRAM memory locations are broken into rows and columns).
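
The bandwidth figure quoted above is one transfer per DRAM cycle times the bus width. A small Python sketch of that calculation follows; the function name is invented for the example, and it assumes an idealized bus that moves one full word every cycle with no refresh or page-miss overhead.

```python
def dram_bandwidth_mbytes_per_s(bus_bits, cycle_ns):
    """Peak bandwidth of a bus moving one word of bus_bits every cycle_ns."""
    transfers_per_s = 1e9 / cycle_ns            # e.g. 12.5 million/s at 80 ns
    return transfers_per_s * bus_bits / 8 / 1e6  # bits -> bytes -> MBytes


print(dram_bandwidth_mbytes_per_s(16, 80))  # 25.0  (16-bit bus, 80 ns)
print(dram_bandwidth_mbytes_per_s(64, 80))  # 100.0 (64-bit bus, 80 ns)
```

Under the same assumption, the 64-bit configurations in the table would peak at four times the bandwidth of the 16-bit one.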

How do I join MPEG ?

You don't join MPEG. You have to participate in ISO as part of a national delegation. How you get to be part of the national delegation is up to each nation. I only know the U.S., where you have to attend the corresponding ANSI meetings to be able to attend the ISO meetings. Your company or institution has to be willing to sink some bucks into travel since, naturally, these meetings are held all over the world. (For example, Paris, Santa Clara, Kurihama Japan, Singapore, Haifa Israel, Rio de Janeiro, London, etc.)

What is the evolution of standard documents ?

In chronological order:

How do I get the documents ?

MPEG is a draft ISO standard. Its exact name is ISO CD 11172. The draft consists of three parts: System, Video, and Audio. The System part (11172-1) deals with synchronization and multiplexing of audio-visual information, while the Video (11172-2) and Audio (11172-3) parts address the video and the audio compression techniques respectively. Part 4, Conformance Testing, is currently a CD. You may order it from your national standards body (e.g. ANSI in the USA) or buy it from other companies like:

ISO Sales
Case Postale 56
CH-1211 Geneve 20
Switzerland

ANSI
Attn: Sales
11 West 42nd Street
New York, NY 10036
phone 212-642-4900

Phillips Business Information
7811 Montrose Rd
Potomac, MD 20854
phone +1 301 424-3338, (800) OMNICOM
fax +1 301 309-3847

Global Engineering Documents
For inquiries within the US:
1990 M Street NW, Suite 400
Washington, DC 20036
800-854-7179 (Voice)
202-331-0960 (Fax)
For inquiries from outside the US:
2805 McGaw Avenue
Irvine, CA 92714
+1-714-261-1455

Beuth Verlag
Postfach 1145
W-1000 Berlin 30
Germany

What are the important themes of MPEG ?

  1. Application specific. MPEG does not solve everybody's application needs, but offers a syntax that is a good solution for most. MPEG does not, for example, decorrelate energies situated 1/256th of a pixel between a non-linear combination of 1000 frames. The syntax was designed to occupy an optimum between cost and quality ... in other words, between computational complexity (VLSI area, memory size and bandwidth) and compaction (compression) efficiency.
  2. The DCT and Huffman algorithms are some of the least significant aspects of the standard, and yet somehow receive the most press coverage.
  3. In the encoding algorithm, you can do what you want as long as the bitstreams produced are compliant. There is a huge difference in picture quality between, for example, the test model and real-world proprietary implementations of encoding.

How do you tell an MPEG-1 bitstream from an MPEG-2 bitstream ?

All MPEG-2 bitstreams must have certain extension headers that *immediately* follow MPEG-1 headers. At the highest layer, for example, the MPEG-1 style sequence_header() is followed by sequence_extension() which is exclusive to MPEG-2. Some extension headers are specific to MPEG-2 profiles. For example, sequence_scalable_extension() is not allowed in Main Profile.

A simple program need only scan the coded bitstream for byte-aligned start codes to determine whether the stream is MPEG-1 or MPEG-2.
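
A minimal sketch of such a scan, in Python, might look like the following. It assumes a raw video elementary stream (not a multiplexed systems stream) held in memory, and it relies on the start code values 0x000001B3 for sequence_header() and 0x000001B5 for extension headers. The function name and the return strings are invented for the example.

```python
START_CODE_PREFIX = b"\x00\x00\x01"
SEQUENCE_HEADER_CODE = b"\x00\x00\x01\xb3"
EXTENSION_START_CODE_BYTE = 0xB5


def classify_video_stream(data: bytes) -> str:
    """Guess MPEG-1 vs MPEG-2 from the start code after the sequence header."""
    pos = data.find(SEQUENCE_HEADER_CODE)
    if pos < 0:
        return "unknown"                 # no sequence header found
    # Start codes are byte-aligned, so a plain byte search is enough.
    nxt = data.find(START_CODE_PREFIX, pos + 4)
    if nxt < 0 or nxt + 3 >= len(data):
        return "unknown"                 # stream ends before the next start code
    # In MPEG-2 an extension header immediately follows the sequence header.
    if data[nxt + 3] == EXTENSION_START_CODE_BYTE:
        return "MPEG-2"
    return "MPEG-1"


# Hand-made test buffers: a sequence header code, some filler header bytes,
# then either a group start code (B8) or an extension start code (B5).
filler = b"\x16\x00\xf0\x15\xff\xff\xe0\xa0"
print(classify_video_stream(SEQUENCE_HEADER_CODE + filler + b"\x00\x00\x01\xb8"))  # MPEG-1
print(classify_video_stream(SEQUENCE_HEADER_CODE + filler + b"\x00\x00\x01\xb5"))  # MPEG-2
```

This only looks at the first sequence header; a more careful tool would also check that the extension found is really a sequence_extension() and not some other extension type.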

What is the precision of MPEG samples ?

By definition, MPEG samples have no more and no less than 8 bits of uniform sample precision (256 quantization levels). For luminance (which is unsigned) data, black corresponds to level 0 and white to level 255. However, under the CCIR-601 recommendation, levels 0 through 15 and 236 through 255 are reserved for blanking signal excursions. MPEG currently has no such clipped excursion restrictions.

What is the best compression ratio for MPEG ?

The MPEG sweet spot is about 1.2 bits/pel Intra and .35 bits/pel inter. Experimentation has shown that intra frame coding with the familiar DCT-Quantization-Entropy hybrid algorithm achieves optimal performance at about an average of 1.2 bits/sample or about 6:1 compression ratio. Below this point, artifacts become noticeable.
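
The relation between bits per sample and compression ratio is simple division, sketched here in Python. It assumes 8-bit source samples and treats both quoted figures as bits per coded sample, which is a simplification; the function name is made up for the example.

```python
def compression_ratio(coded_bits_per_sample, source_bits_per_sample=8):
    """Ratio of source precision to the average coded bits per sample."""
    return source_bits_per_sample / coded_bits_per_sample


print(round(compression_ratio(1.2), 2))   # 6.67  (intra)
print(round(compression_ratio(0.35), 2))  # 22.86 (inter)
```

The intra result is the "about 6:1" figure quoted above, under these assumptions.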

What about MPEG artifacts ?

If the encoder did its job properly, and the user specified a proper balance between sample rate and bitrate, there shouldn't be any visible artifacts. However, in sub-optimal systems, you can look for:

  Gibbs phenomenon/Ringing/Aliasing (too few AC bits, not enough pre-filtering)
  Blockiness (not considering your neighbors before quantizing)
  Posterization (too few DC bits)
  Checkerboards (DCT eigenimages as a result of too few AC coefficients)
  Color bleeding (not considering color in encoder cost model)

Are there single-chip MPEG encoders ?

Yes, the C-Cube CL-4000 is the only single-chip, real-time encoder that can process true MPEG-1 SIF rate video.

  Single chip for +/- 15 pel motion estimation at SIF rates (352x240x30 Hz)
  Two chips for +/- 32 pel at SIF rates (hierarchical)
  5 or 6 chips for MPEG-2 at CCIR-601 rates (704 x 480 x 30 Hz)
  Highly microcoded architecture. Can code both H.261 and JPEG.
  Implements high picture quality microcode programs.
  [more details from CICC'93 and HotChips '93 conference to be included]

IBM and SGS-Thomson plan to introduce more hard-wired, multi-chip solutions in 1994.

What about MPEG-1 decoder chips ?

By implication of MPEG-2 Conformance requirements, all MPEG-2 decoders are required to decode MPEG-1 bitstreams as well. These chips, however, are strictly MPEG-1:

  C-Cube CL-450: SIF rates. Single-chip. Has on-board CPU.
  SGS-Thomson 3400: SIF rates. Single-chip. Hardwired.
  Motorola MCD250: SIF rates. Single-chip.
  LSI 641172: CCIR-601 rates. Single-chip. Systems packet decoder on-chip.

What about audio chips ?

To date, only Layer I and Layer II have been implemented in dedicated (ASIC) silicon:

Motorola MCD260

Texas Instruments TI 320AV110
  hardwired (with systems parsing)
  operates in free format (arbitrary sample rate)
  120 pin PQFP package
  Serial data port
  Part of technology exchange with C-Cube

LSI Logic L64111
  hardwired w/CPU with on-chip systems parsing
  Serial data port
  100-pin PQFP

GCA/ASCII ?

Crystal Semiconductor CS4920
  on-chip, 2 channel 16-bit digital-to-analog converter (DAC)
  16 MIPS, 24-bit DSP
  programmable clock manager
  44-pin PLCC package
  Programmable architecture. For example, can download Layer II MPEG-1 audio or Dolby AC-2
  $38 each in large quantities

Dolby AC-3 MPEG NY disclosure claimed to be less computationally intensive

Zoran, GI working on own DSP-like dedicated chips.

Will there be an MPEG video tape format ?

There is a consortium of companies (Philips, JVC, Sony, Matsushita, et al) developing a metal particle based 6 millimeter consumer digital video tape format. It will initially use more JPEG-like independent frame compression for cheap encoding of source analog (NTSC, PAL) video. The consequence of course is less efficient use of bandwidth (25 Mbit/sec for the same quality achieved at 6 Mbit/sec with MPEG). Pre-compressed video from broadcast sources will be directly recorded to tape and "passed-through" as a coded bitstream to the video decompression "box" upon playback.

Is so-and-so really MPEG compliant ?

At the very least, there are two areas of conformance/compliance in MPEG: 1. Compliant bitstreams 2. compliant decoders. Technically speaking, video bitstreams consisting entirely of I-frames (such as those generated by Xing software) are syntactically compliant with the MPEG specification. The I-frame sequence is simply a subset of the full syntax. Compliant bitstreams must obey the range limits (e.g. motion vectors limited to +/-128, frame sizes, frame rates, etc.) and syntax rules (e.g. all slices must commence and terminate with a non-skipped macroblock, no gaps between slices, etc.).

Decoders, however, cannot escape true conformance. For example, a decoder that cannot decode P or B frames is *not* legal MPEG. Likewise, full arithmetic precision must be obeyed before any decoder can be called "MPEG compliant." The IDCT, inverse quantizer, and motion compensated prediction must meet the specification requirements... which are fairly rigid (e.g. no more than 1 least significant bit of error between reference and test decoders). Real-time conformance is more complicated to measure than arithmetic precision, but it is reasonable to expect that decoders that skip frames on reasonable bitstreams are not likely to be considered compliant.

What are some journals on related MPEG topics ?

  IEEE Multimedia [first edition Spring 1994]
  IEEE Transactions on Consumer Electronics
  IEEE Transactions on Broadcasting
  IEEE Transactions on Circuits and Systems for Video Technology
  Advanced Electronic Imaging
  Electronic Engineering Times (EE Times)
  IEEE Int'l Conference on Acoustics, Speech, and Signal Processing (ICASSP)
  International Broadcasting Convention (IBC)
  Society of Motion Pictures and Television Engineers (SMPTE)
  SPIE conference on Visual Communications and Image Processing
  SPIE conference on Video Compression for Personal Computers (to be held Feb 1994 in San Jose)

What performance should I expect from MPEG boards ?

The OptiVision product, along with products from Optibase and Scientific Atlanta, does real time compression and storage to disk. The cheap video boards, at best, can only do 30 fps with about 160 x 120 windows. Nobody can do 352 x 240 in real time without the right hardware. The SA product is about $30K list and the Optibase somewhere around $20K for the board set.

There is also a board from Optivision that can do the MPEG conversion off line. Even this is costly (about $2,000) to get it done in any decent time frame.

If you believe that $20,000 is high, AT&T at the Western Cable Show 1993 demonstrated a real time MPEG-2 compression system at $90,000.

The market for these real time systems is very real; it is the satellite uplink and cable television market. Nominal compression ratios are running about 200:1 for MPEG-1 in the Optibase product. For broadcast quality, compression ratios are lower. Even here, you have to be careful. 200:1 really means "take a 640 x 480 image, sub-sample it to 320x240 (throwing out data to get 4:1 compression), then compress it 50:1 doing MPEG".
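
The "200:1" arithmetic in the quote above can be written out as a short Python sketch: the nominal ratio is the spatial sub-sampling factor multiplied by the ratio achieved by the coder itself. The function name is invented for the example.

```python
def nominal_ratio(src_w, src_h, sub_w, sub_h, coding_ratio):
    """Advertised ratio = spatial sub-sampling factor x coder compression ratio."""
    subsample_factor = (src_w * src_h) / (sub_w * sub_h)
    return subsample_factor * coding_ratio


# 640 x 480 sub-sampled to 320 x 240 (4:1), then coded at 50:1.
print(nominal_ratio(640, 480, 320, 240, 50))  # 200.0
```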

FrameRate Labs is about ready to release a board that does 640 x 240 real time capture and storage to disk without any compression or dropped frames; it will compress offline. This is brute force but far cheaper than a $20,000 solution. If you need real-time all day long, talk to Scientific-Atlanta, Optibase or OptiVision. If you need real-time for a brief-time with dropped frames, use the low-end boards like Video Spigot, etc. If you need real-time for a brief-time without loss of data, FrameRate Labs might have a solution.

The low end board manufacturers label their products real-time 30 FPS and then, in the next sentence, they claim to be able to capture an image 640 x 480. But, they never say these things in the same sentence.

Are there any MPEG FTP or WWW sites ?

There are now many anonymous FTP sites with MPEG programs or movies. A site archiving most of the public domain programs and documents about the MPEG standard (and also other compression techniques) may be found at ftp.crs4.it