At a meeting hosted in New York by Columbia University, the Moving Picture Experts Group (MPEG) completed the definition of MPEG-2 Video, MPEG-2 Audio, and MPEG-2 Systems. MPEG therefore confirmed that it is on schedule to produce, by November 1993, Committee Drafts of all three parts of the MPEG-2 Standard, for balloting by its member countries.
To ensure that a harmonized solution to the widest range of applications is achieved, MPEG, an ISO/IEC working group designated ISO/IEC JTC1/SC29/WG11, is working jointly with the ITU-TS Study Group 15 Experts Group for ATM Video Coding. MPEG also collaborates with representatives from other parts of ITU-TS, and from EBU, ITU-RS, SMPTE, and the North American HDTV community.
MPEG-1 was optimized for CD-ROM and other applications at about 1.5 Mbit/sec. Video was strictly non-interlaced (i.e. progressive). International co-operation had worked so well for MPEG-1 that the committee began to address applications at broadcast TV sample rates, using the CCIR-601 recommendation (720 samples/line by 480 lines per frame by 30 frames per second... or about 15.5 million samples/sec including 4:2:0 chroma) as the reference.
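As a quick check of the sample-rate arithmetic, here is a minimal sketch assuming 4:2:0 chroma (two chroma planes, each a quarter the size of luma) and exactly 30 frames/sec (real NTSC runs at 29.97). The function name is ours. With 704 active samples per line instead of 720, the figure comes out closer to 15.2 million.

```python
def samples_per_second(width, height, fps, chroma_fraction):
    """Total luma + chroma samples per second.

    chroma_fraction is the combined size of the Cb and Cr planes relative
    to luma: 0.5 for 4:2:0, 1.0 for 4:2:2.
    """
    return width * height * fps * (1 + chroma_fraction)


rate_420 = samples_per_second(720, 480, 30, 0.5)
rate_422 = samples_per_second(720, 480, 30, 1.0)
rate_704 = samples_per_second(704, 480, 30, 0.5)  # 704 active samples/line

print(f"720 x 480, 4:2:0: {rate_420 / 1e6:.2f} Msamples/s")  # 15.55
print(f"720 x 480, 4:2:2: {rate_422 / 1e6:.2f} Msamples/s")  # 20.74
print(f"704 x 480, 4:2:0: {rate_704 / 1e6:.2f} Msamples/s")  # 15.21
```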
Unfortunately, today's TV scanning pattern is interlaced. This introduces a duality in block coding: do local redundancy areas (blocks) exist exclusively in a field or a frame... (or a particle or wave) ? The answer of course is that some blocks are one or the other at different times, depending on motion activity.
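MPEG-2 lets the encoder settle this per macroblock (the dct_type flag in frame pictures). One way to decide is to compare how strongly adjacent lines agree in frame order versus within each field. The sketch below is a simple energy heuristic of our own, not the Test Model's exact criterion, and assumes a 16x16 luma macroblock given as a list of rows.

```python
def mean_line_diff(rows):
    """Mean squared difference between vertically adjacent rows (per row pair)."""
    pairs = list(zip(rows, rows[1:]))
    total = sum((a - b) ** 2 for upper, lower in pairs for a, b in zip(upper, lower))
    return total / len(pairs)


def choose_dct_mode(mb):
    """Pick 'frame' or 'field' DCT for a 16x16 luma macroblock.

    mb is a list of 16 rows of pixel values. Even rows belong to the top
    field, odd rows to the bottom field.
    """
    frame_cost = mean_line_diff(mb)                    # rows in frame order
    field_cost = (mean_line_diff(mb[0::2])             # top field only
                  + mean_line_diff(mb[1::2])) / 2      # bottom field only
    return "field" if field_cost < frame_cost else "frame"


still = [[2 * y] * 16 for y in range(16)]                      # smooth ramp, no motion
combed = [[0 if y % 2 == 0 else 200] * 16 for y in range(16)]  # fields disagree
print(choose_dct_mode(still))   # frame
print(choose_dct_mode(combed))  # field
```

A static area correlates best line-to-line in the frame, while motion between fields produces "combing" that only the field ordering avoids.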
The additional man years of experimentation and implementation between MPEG-1 and MPEG-2 improved the method of block-based transform coding.
What are the typical MPEG-2 bitrates and picture quality ?
Here are some examples of typical frame sizes in bits :
When will an MPEG-2 decoder chip be available ?
Several chips will be sampling in late 1993. For reasons of economy of scale in the cable TV application, all are single-chip (not including DRAM and host CPU/controller) implementations. They are:
Where will we see MPEG in everyday life ?
Just about wherever you see video today.
What did MPEG-2 add to MPEG-1 in terms of syntax/algorithm ?
Here is a brief summary:
More aspect ratios. A minor, yet necessary part of the syntax.
Horizontal and vertical dimensions are now required to be a multiple of 16 in frame coded pictures, and the vertical dimension must be a multiple of 32 in field coded pictures.
Syntax can now signal frame sizes as large as 16383 x 16383.
Syntax signals source video type (NTSC, PAL, SECAM, MAC, component) to help post-processing and display.
Source video color primaries (709, 170M, 240M, D65, etc.) and opto-electronic transfer characteristics (709, 624-4M, 170M, etc.) can be indicated.
Four scalable modes [see scalable section below]
All MPEG-2 motion vectors have half-pel accuracy.
DC precision can be user-selected as 8, 9, 10, or 11 bits.
Concealment motion vectors were added to I-pictures to increase robustness against bit errors, since I-pictures are the most critical and sensitive pictures in a group of pictures.
A non-linear macroblock quantization factor that gives a wider step size range, from 0.5 to 56, than in MPEG-1 (1 to 31).
New Intra-VLC table for dct_next_coefficient (AC run-level events) that is more geared towards I-frame probability distribution. EOB is 4 bits. The old tables are still included.
Alternate scanning pattern that (supposedly) improves entropy coding performance over the original Zig-Zag scan used in H.261, JPEG, and MPEG-1. The extra scanning pattern is geared towards interlaced video.
Syntax to signal the 3:2 pulldown process (repeat_first_field flag)
Syntax flag to signal chrominance post processing type (4:2:0 to 4:2:2 upsampling conversion)
Progressive and interlaced frame coding
Syntax to signal source composite video characteristics useful in post-processing operations. (v-axis, field sequence, sub_carrier, phase, burst_amplitude, etc.)
Pan & scanning syntax that tells decoder how to, for example, window a 4:3 image within a wider 16:9 aspect ratio image. Vertical pan offset has 1/16th pixel accuracy.
Macroblock stuffing is now illegal in MPEG-2 (hurray!!)
Two line modes (interlaced and progressive) for DCT operation.
Now only one run-level escape code (24 bits) instead of the single-escape (20-bit) and double-escape (28-bit) codes in MPEG-1.
Improved mismatch control in quantization over the original oddification method in MPEG-1. MPEG-2 now specifies adding or subtracting one to the 63rd AC coefficient (the last coefficient of the block) whenever the sum of all reconstructed (dequantized) coefficients is even, so that the sum always ends up odd.
Many additional prediction modes (16x8 MC, field MC, Dual Prime) and, correspondingly, macroblock modes.
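Two of the quantization items above can be made concrete. The sketch below assumes the non-linear quantiser_scale table of the final MPEG-2 video specification (selected by q_scale_type = 1) and its mismatch-control rule, as we understand them; the helper names are hypothetical. Dividing the MPEG-2 scale by 2 gives the MPEG-1-equivalent step size, which is where the 0.5 to 56 range comes from.

```python
# quantiser_scale for quantiser_scale_code 1..31 when q_scale_type == 1
NONLINEAR_QSCALE = [
    1, 2, 3, 4, 5, 6, 7, 8,
    10, 12, 14, 16, 18, 20, 22, 24,
    28, 32, 36, 40, 44, 48, 52, 56,
    64, 72, 80, 88, 96, 104, 112,
]


def quantiser_scale(code, q_scale_type):
    """Map a 5-bit quantiser_scale_code to the MPEG-2 quantiser_scale."""
    if not 1 <= code <= 31:
        raise ValueError("quantiser_scale_code must be in 1..31")
    return NONLINEAR_QSCALE[code - 1] if q_scale_type else 2 * code


def mismatch_control(coeffs):
    """Apply mismatch control to 64 reconstructed coefficients.

    coeffs are dequantized, saturated DCT coefficients in raster order, so
    index 63 is F[7][7], the last AC coefficient.
    """
    out = list(coeffs)
    if sum(out) % 2 == 0:                    # only an even sum is modified
        out[63] += -1 if out[63] % 2 else 1  # toggle the LSB of F[7][7]
    return out


# Step range in MPEG-1-equivalent units (MPEG-2 scale / 2)
linear = [quantiser_scale(c, 0) / 2 for c in range(1, 32)]
nonlinear = [quantiser_scale(c, 1) / 2 for c in range(1, 32)]
print(min(linear), max(linear))        # 1.0 31.0
print(min(nonlinear), max(nonlinear))  # 0.5 56.0
```

Because the toggle always lands on the same high-frequency coefficient, encoder and decoder IDCTs drift apart far more slowly than with no correction, while the visible effect of the one-unit change is tiny.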
What are the scalable modes of MPEG-2 ?
Scalable video is permitted only in the Main+ and Next profiles. Currently, there are four scalable modes in the MPEG-2 toolkit. These modes break MPEG-2 video into different layers (base, middle, and high layers), mostly for the purpose of prioritizing video data. For example, the high priority channel (bitstream) can be coded with extra error correction information and transmitted with a lower bit error rate (i.e. a higher carrier-to-noise ratio or signal strength) than the lower priority channel.
Another purpose of scalability is complexity division. For example, in HDTV, the high priority bitstream (720 x 480) can be decoded under noise conditions where the lower priority bitstream (1440 x 960) cannot. This is "graceful" degradation. By the same division, however, a standard TV set need only decode the 720 x 480 channel, thus requiring a less expensive decoder than a TV set wishing to display 1440 x 960. This is simulcasting.
A brief summary of the MPEG-2 video scalability modes:
Spatial scalability: Useful in simulcasting, and for feasible software decoding of the lower-resolution base layer. This spatial domain method codes a base layer at lower sampling dimensions (i.e. "resolution") than the upper layers. The upsampled reconstructed lower (base) layers are then used as prediction for the higher layers.
Data partitioning: Similar to JPEG's frequency progressive mode, except that the slice layer indicates the maximum number of block transform coefficients contained in the particular bitstream (known as the "priority break point"). Data partitioning is a frequency domain method that breaks the block of 64 quantized transform coefficients into two bitstreams. The first, higher priority bitstream contains the more critical lower frequency coefficients and side information (such as DC values and motion vectors). The second, lower priority bitstream carries the higher frequency AC data.
SNR scalability: Similar to the point transform in JPEG, SNR scalability is a spatial domain method where channels are coded at identical sample rates, but with differing picture quality (through quantization step sizes). The higher priority bitstream contains base layer data that can be added to a lower priority refinement layer to construct a higher quality picture.
Temporal scalability: A temporal domain method useful in, e.g., stereoscopic video. The first, higher priority bitstream codes video at a lower frame rate, and the intermediate frames can be coded in a second bitstream using the first bitstream's reconstruction as prediction. In stereoscopic video, for example, the left video channel can be predicted from the right channel.
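Data partitioning and SNR scalability both operate directly on transform coefficients, so they are easy to sketch. The code below is a simplified illustration, not the standard's syntax: data partitioning is shown as a plain split of one block's zig-zag-ordered coefficients at a break point (the actual priority break point semantics are more involved), and SNR scalability as a coarse base quantization refined by a finer quantization of the base layer's error. Step sizes and function names are assumptions.

```python
def partition(zigzag_coeffs, breakpoint):
    """Data partitioning: split one block's coefficients (zig-zag order)."""
    return zigzag_coeffs[:breakpoint], zigzag_coeffs[breakpoint:]


def merge(base, extra=None, size=64):
    """Rebuild a block; if the low-priority partition is lost, zero-fill it."""
    coeffs = list(base) + list(extra or [])
    return coeffs + [0] * (size - len(coeffs))


def quantize(values, step):
    return [round(v / step) for v in values]


def dequantize(levels, step):
    return [lev * step for lev in levels]


def snr_encode(coeffs, base_step, enh_step):
    """SNR scalability: coarse base levels, then finer levels of the base error."""
    base = quantize(coeffs, base_step)
    error = [c - r for c, r in zip(coeffs, dequantize(base, base_step))]
    return base, quantize(error, enh_step)


def snr_decode(base, base_step, enh=None, enh_step=None):
    """Base-only decoders skip the refinement; full decoders add it."""
    rec = dequantize(base, base_step)
    if enh is not None:
        rec = [r + e for r, e in zip(rec, dequantize(enh, enh_step))]
    return rec


zz = list(range(64))                  # stand-in coefficients in zig-zag order
high_part, low_part = partition(zz, 6)
print(merge(high_part)[:8])           # [0, 1, 2, 3, 4, 5, 0, 0]

block = [100, -37, 12, 5, -3, 1, 0, 0]
base, enh = snr_encode(block, base_step=16, enh_step=4)
print(snr_decode(base, 16))           # [96, -32, 16, 0, 0, 0, 0, 0]
print(snr_decode(base, 16, enh, 4))   # [100, -36, 12, 4, -4, 0, 0, 0]
```

In both cases a decoder that receives only the high priority data still gets a usable (blurrier or noisier) block; the second layer only ever refines it.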
What is the TM rate control and adaptive quantization technique ?
The Test Model was not, by any stretch of the imagination, meant to be the show-stopping, best set of algorithms. It was designed to exercise the syntax, verify proposals, and test the *relative* performance of proposals in a way that could be duplicated by co-experimenters in a timely fashion. Otherwise there would be more endless debates about model interpretation than actual time spent in verification. [MPEG-2 Test model is frozen as v5b] The MPEG-2 Test Model (TM) rate control method offers a dramatic improvement over the Simulation Model (SM) method used for MPEG-1. TM's improvements are due to more sophisticated pre-analysis and post-analysis routines. Rate control and adaptive quantization are divided into three steps:
Step 1, Bit Allocation. In Complexity Estimation, the global complexity measures assign relative weights to each picture type. These weights (Xi, Xp, Xb) are reflected by the typical coded frame size of I, P, and B pictures (see typical frame size section). I pictures are assigned the largest weight since they have the greatest stability factor in an image sequence. B pictures are assigned the smallest weight since B data does not propagate into other frames through the prediction process.
Picture Target Setting allocates target bits for a frame based on the frame type and the remaining number of frames of that same type in the Group of Pictures (GOP).
Step 2, Rate Control. Rate control attempts to adjust bit allocation if there is a significant difference between the target bits (anticipated bits) and actual coded bits for a block of data.
Step 3, Adaptive Quantization. This step recomputes the macroblock quantization factor according to the activity of the block relative to the normalized activity of the frame.
The effect of this step is to roughly assign a constant number of bits per macroblock (this results in more perceptually uniform picture quality).
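The three steps can be sketched as follows, using the equations and constants of Test Model 5 (TM5) as we recall them: initial complexities proportional to 160, 60 and 42 (times bit_rate/115), Kp = 1.0, Kb = 1.4, a reaction parameter r = 2 x bit_rate / picture_rate, and a normalized activity between 0.5 and 2. Block activity itself (TM5 derives it from the variance of the macroblock's luminance sub-blocks) is taken as given. Treat this as an illustration rather than a conforming TM implementation; the function names and the example GOP (1 I, 4 P, 10 B pictures) are ours.

```python
KP, KB = 1.0, 1.4  # TM5 constant ratios between I/P and I/B quantizers


def initial_complexity(bit_rate):
    """Starting global complexity weights (Xi, Xp, Xb)."""
    return tuple(k * bit_rate / 115 for k in (160, 60, 42))


def update_complexity(bits, avg_quant):
    """After coding a picture, its type's complexity becomes bits x average quantizer."""
    return bits * avg_quant


def picture_target(ptype, R, Np, Nb, X, bit_rate, picture_rate):
    """Step 1: target bits for the next picture of type 'I', 'P' or 'B'.

    R is the remaining GOP budget; Np and Nb count the remaining P and B pictures.
    """
    Xi, Xp, Xb = X
    if ptype == "I":
        t = R / (1 + Np * Xp / (Xi * KP) + Nb * Xb / (Xi * KB))
    elif ptype == "P":
        t = R / (Np + Nb * KP * Xb / (KB * Xp))
    else:
        t = R / (Nb + Np * KB * Xp / (KP * Xb))
    return max(t, bit_rate / (8 * picture_rate))  # lower bound per picture


def reference_quant(d0, bits_so_far, target, mb_index, mb_count, bit_rate, picture_rate):
    """Step 2: quantizer from virtual-buffer fullness before macroblock mb_index (0-based)."""
    r = 2 * bit_rate / picture_rate
    fullness = d0 + bits_so_far - target * mb_index / mb_count
    return fullness * 31 / r


def mquant(q_ref, act, avg_act):
    """Step 3: modulate the reference quantizer by normalized spatial activity."""
    n_act = (2 * act + avg_act) / (act + 2 * avg_act)  # between 0.5 and 2
    return max(1, min(31, round(q_ref * n_act)))


bit_rate, picture_rate = 4_000_000, 30
X = initial_complexity(bit_rate)
R = bit_rate * 15 / picture_rate                               # 15-picture GOP budget
ti = picture_target("I", R, 4, 10, X, bit_rate, picture_rate)  # 1 I, 4 P, 10 B
print(round(ti))                                               # 457143
print(mquant(10, act=5, avg_act=5), mquant(10, act=0, avg_act=5))  # 10 5
```

Flat (low-activity) macroblocks get a finer quantizer and busy ones a coarser one, which is where the perceptual uniformity comes from.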
MPEG is developing the MPEG-2 Video Standard, which specifies the coded bit stream for high-quality digital video. As a compatible extension, MPEG-2 Video builds on the completed MPEG-1 Video Standard (ISO/IEC IS 11172-2), by supporting interlaced video formats and a number of other advanced features, including features to support HDTV.
As a generic International Standard, MPEG-2 Video is being defined in terms of extensible profiles, each of which will support the features needed by an important class of applications. At the March MPEG meeting in Sydney, the MPEG-2 Main Profile was defined to support digital video transmission in the range of about 2 to 15 Mbits/sec over cable, satellite, and other broadcast channels, as well as for Digital Storage Media (DSM) and other communications applications. Building on this success at the New York meeting, MPEG experts from participating countries in Asia, Australia, Europe, and North America further defined parameters of the Main Profile and Simple Profile suitable for supporting HDTV formats.
MPEG experts also extended the features of the Main Profile by defining a hierarchical/scalable profile. This profile aims to support applications such as compatible terrestrial TV/HDTV, packet-network video systems, backward-compatibility with existing standards (MPEG-1 and H.261), and other applications for which multi-level coding is required. For example, such a system could give the consumer the option of using either a small portable receiver to decode standard definition TV, or a larger fixed receiver to decode HDTV from the same broadcast signal.
The technical definition of MPEG-2 Video has been completed. This was a critical milestone, and shows that MPEG-2 Video is on schedule for a Committee Draft in November 1993.
What are MPEG-2 VIDEO Main Profile and Main Level ?
MPEG-2 Video Main Level is analogous to MPEG-1's Constrained Parameters Bitstream (CPB), with sampling limits at CCIR-601 parameters (720 x 480 x 30 Hz). Profiles limit syntax (i.e. algorithms), whereas Levels limit parameters (sample rates, frame dimensions, coded bitrates, etc.). Together, Video Main Profile and Main Level (abbreviated as MP@ML) normalize complexity within feasible limits of 1994 VLSI technology (0.5 micron), yet still meet the needs of the majority of application users.
At what bitrates is MPEG-2 video optimal ?
The Test subgroup has defined a few examples :
How does MPEG video really compare to TV, VHS, laserdisc ?
VHS picture quality can be achieved for source film video at about 1 million bits per second (with proprietary encoding methods). It is very difficult to objectively compare MPEG to VHS. The response curve of VHS places -3 dB at around 2 MHz of analog luminance bandwidth (equivalent to 200 samples/line). VHS chroma is considerably less dense in the horizontal direction than MPEG source video (compare 80 samples/line to 176!). From a sampling density perspective, VHS is superior only in the vertical direction (480 lines compared to 240)... but when taking into account interfield magnetic tape crosstalk and the TV monitor Kell factor, not by all that much. VHS is prone to timing errors (which can be improved with time base correctors), whereas digital video is fully discretized. Pre-recorded VHS is typically recorded at very high duplication speeds (5 to 15 times real time playback), which leads to further shortfalls for the format that has been with us since 1977.
Broadcast NTSC quality can be approximated at about 3 Mbit/sec, and PAL quality at about 4 Mbit/sec. Of course, sports sequences with complex spatial-temporal activity need more like 5 and 6 Mbit/sec, respectively.
Laserdisc is a tough one to compare. Disc is composite video (NTSC or PAL) with up to 425 TVL (or 567 samples/line) response. Thus it could be said laserdisc has 567 x 480 x 30 Hz "resolution". The carrier-to-noise ratio is typically better than 48 dB. Timing is excellent. Yet some of the clean characteristics of laserdisc can be achieved at 1.15 Mbit/sec (SIF rates), especially for those areas of medium detail (low spatial activity) in the presence of uniform motion. This is why some people say MPEG-1 video at 1.15 Mbit/sec looks almost as good as Laserdisc or Super VHS.
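The analog-to-samples conversions used in this comparison follow two rules of thumb: Nyquist (two samples per cycle of the highest frequency passed) over the active line time, and scaling TV lines, which are counted per picture height, by the 4:3 aspect ratio. The 52.6 microsecond active NTSC line is an assumed round figure; with it, 2 MHz works out to roughly 210 samples/line, in line with the approximate VHS figure above.

```python
ACTIVE_LINE_S = 52.6e-6  # assumed NTSC active line duration (seconds)


def samples_per_line_from_bandwidth(bandwidth_hz, active_line_s=ACTIVE_LINE_S):
    """Nyquist: two samples per cycle of the highest frequency passed."""
    return 2 * bandwidth_hz * active_line_s


def samples_per_line_from_tvl(tv_lines, aspect_ratio=4 / 3):
    """TV lines are measured per picture height; scale to picture width."""
    return tv_lines * aspect_ratio


print(round(samples_per_line_from_bandwidth(2.0e6)))  # VHS luma: 210
print(round(samples_per_line_from_tvl(425)))           # laserdisc: 567
```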
Regardless of the above figures, those clever proprietary encoding algorithms can push these bitrates even lower.
Why does film do so well with MPEG ?
Several reasons, really:
What are some pre-processing enhancements ?
This method maps interlaced video from a higher sampling rate (e.g. 720 x 480) into a lower-rate, progressive format (352 x 240). The most basic algorithm measures the variance between two fields, and if the variance is small enough, uses an average of both fields to form a frame macroblock. Otherwise, a field area from one field (of the same parity) is selected. More clever algorithms are much more complex than this, and may involve median filtering and multirate/multidimensional tools.
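The basic algorithm can be sketched for one block of co-sited field samples. This assumes the "variance between two fields" is measured as the mean squared difference between the fields and that the threshold is a tuning parameter; horizontal decimation (720 to 352) is left out. The function name and threshold are hypothetical.

```python
def merge_fields(top, bottom, threshold):
    """Form one progressive block (at field height) from two co-sited field blocks.

    top and bottom are equal-size lists of rows. If the fields agree (little
    motion), average them; otherwise keep the top field alone.
    """
    diffs = [a - b for rt, rb in zip(top, bottom) for a, b in zip(rt, rb)]
    msd = sum(d * d for d in diffs) / len(diffs)  # mean squared field difference
    if msd <= threshold:
        return [[(a + b) / 2 for a, b in zip(rt, rb)] for rt, rb in zip(top, bottom)]
    return [list(row) for row in top]


still_top = [[10, 12], [14, 16]]
still_bot = [[11, 13], [15, 17]]
moving_bot = [[90, 95], [80, 85]]
print(merge_fields(still_top, still_bot, threshold=4))   # [[10.5, 12.5], [14.5, 16.5]]
print(merge_fields(still_top, moving_bot, threshold=4))  # [[10, 12], [14, 16]]
```

Averaging in still areas reduces noise for free; falling back to a single field in moving areas avoids the double images that averaging two displaced fields would produce.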
A common method in still image coding is to pre-smooth the image before compression encoding. For example, if pre-analysis of a frame indicates that serious artifacts will arise if the picture were to be coded in its current condition, a pre-anti-aliasing filter can be applied. This can be as simple as having a smoothing severity proportional to the image activity. The pre-filter can be global (same smoothing factor for the whole image) or locally adaptive. More complex methods will again use multirate/multidimensional tools.
Most detail is contained in the lower harmonics anyway. Sharp cut-off filters are not widely used, so the "320 x 480 potential" of VHS is never truly realized.
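A global version of activity-proportional pre-smoothing can be sketched as follows. The choices here are assumptions for illustration: a separable [1, 2, 1]/4 low-pass filter, image variance as the activity measure, and a hypothetical activity_ceiling parameter at which smoothing reaches full strength.

```python
def _clamp(v, lo, hi):
    return max(lo, min(v, hi))


def blur_121(img):
    """Separable [1, 2, 1] / 4 low-pass filter with edge replication."""
    h, w = len(img), len(img[0])
    tmp = [[(img[y][_clamp(x - 1, 0, w - 1)] + 2 * img[y][x]
             + img[y][_clamp(x + 1, 0, w - 1)]) / 4 for x in range(w)]
           for y in range(h)]
    return [[(tmp[_clamp(y - 1, 0, h - 1)][x] + 2 * tmp[y][x]
              + tmp[_clamp(y + 1, 0, h - 1)][x]) / 4 for x in range(w)]
            for y in range(h)]


def variance(img):
    vals = [v for row in img for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def adaptive_presmooth(img, activity_ceiling):
    """Blend toward the blurred image in proportion to global activity."""
    alpha = min(variance(img) / activity_ceiling, 1.0)  # 0 = untouched, 1 = fully blurred
    smoothed = blur_121(img)
    return [[(1 - alpha) * a + alpha * b for a, b in zip(ra, rb)]
            for ra, rb in zip(img, smoothed)]


flat = [[50] * 4 for _ in range(4)]
checker = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]
print(adaptive_presmooth(flat, 1000.0) == flat)                            # True
print(variance(adaptive_presmooth(checker, 1000.0)) < variance(checker))  # True
```

A locally adaptive variant would compute alpha per block instead of once per image.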
Why use "advanced" pre-filtering techniques ?
Think of the DCT and quantizer as an A/D converter. Think of the pre-filter as the required anti-alias prefilter found before every A/D. The big difference of course is that the DCT quantizer assigns a varying number of bits per sample (transform coefficient).
Judging by the normalized activity measured in the pre-analysis stage of video encoding, and the target buffer size status, you have a fairly good idea of how many bits can be spared for the target macroblock, for instance.
Other pre-filtering techniques mostly take into account: texture patterns, masking, edges, and motion activity. Many additional advanced techniques can be applied at different intermediate layers of video encoding (picture, slice, macroblock, block, etc.).
What are some advanced encoding methods ?
[Thomson patent]
This is true for any syntax element, really. Signalling a macroblock quantization factor or a large motion vector differential can cost more than making up the difference with extra quantized DFD (prediction error) bits. The optimum can be found with, for example, a Lagrangian process. In summary, in any compression system with side information, there is an optimum point between signalling overhead (e.g. prediction parameters) and prediction error.
Borrowing from the concept that the DCT is simply a filter bank, a technique that seems to be gaining popularity is basis vector shaping. Usually this is combined with the quantization stage since the two are tied closely together in a rate-distortion sense. The idea is to use the basis vector shaping as a cheap alternative to pre-filtering by combining the more desirable data adaptive properties of pre-filtering/pre-processing into the transformation process... yet still reconstruct a picture in the decoder using the standard IDCT that looks reasonably like the source. Some more clever schemes will apply windowing. [Warning: watch out for eigenimage/basis vector orthogonality.]
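A minimal sketch of that Lagrangian trade-off: each candidate coding choice is scored as J = D + lambda x R (distortion plus lambda times bits, side information included), and the lowest cost wins. The two candidates and their numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    distortion: float  # e.g. sum of squared error after reconstruction
    bits: int          # side information (vectors, mquant) + residual bits


def best_candidate(candidates, lam):
    """Minimize the Lagrangian cost J = D + lambda * R."""
    return min(candidates, key=lambda c: c.distortion + lam * c.bits)


options = [
    Candidate("large MV differential", distortion=400, bits=60),
    Candidate("predicted MV, larger DFD", distortion=700, bits=35),
]
print(best_candidate(options, lam=5).name)   # large MV differential
print(best_candidate(options, lam=20).name)  # predicted MV, larger DFD
```

A small lambda (bits are cheap, as at high rates) favours spending bits on the accurate vector; a large lambda (bits are scarce) favours the cheaper signalling even at higher distortion.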
Enhancements are applied after the DCT (and possibly quantization) stage to the transform coefficients. This borrows from the concept: if you don't like the (quantized) transformed results, simply reshape them into something you do like.
This method is similar to the original intent behind color subcarrier phase alternation by field in the NTSC analog TV standard: for stationary areas, noise does not "hang" in one location, but dances about the image over time to give a more uniform effect. Distribution makes it more difficult for the eye to "catch on" to trouble spots (due to the latent temporal response curve of human vision). Simple encoder models tend to do this naturally but will not solve all situations.
Scene changes
(Non-linear) interpolation methods (Wu-Gersho), convex hull projections, some ICASSP '93 papers, etc.
Post-processing makes judging decoder output for conformance testing near impossible.
MPEG is developing the MPEG-2 Audio Standard for low bitrate coding of multichannel audio. MPEG-2 Audio coding will supply up to five full bandwidth channels (left, right, center, and two surround channels), plus an additional low frequency enhancement channel, and/or up to seven commentary/multilingual channels. The MPEG-2 Audio Standard will also extend the stereo and mono coding of the MPEG-1 Audio Standard (ISO/IEC IS 11172-3) to half sampling-rates (16 kHz, 22.05 kHz, and 24 kHz), for improved quality for bitrates at or below 64 kbits/s, per channel.
MPEG produced an updated version of the MPEG-2 Audio Working Draft, and is on track for achieving a Committee Draft specification by the November MPEG meeting.
The MPEG-2 Audio multichannel coding Standard will provide backward-compatibility with the existing MPEG-1 Audio Standard (ISO/IEC IS 11172-3). Together with ITU-RS, MPEG is organizing formal subjective testing of the proposed MPEG-2 multichannel audio codecs and up to three non-backward-compatible (NBC) codecs. The NBC codecs are included in order to determine whether an NBC mode should be introduced as an addendum to the standard. If the results show clear evidence that an NBC mode improves the performance, a formal call for NBC proposals will be issued by MPEG, with a view to incorporate these features in the audio syntax.
MPEG-2 audio attempts to maintain as much compatibility with MPEG-1 audio syntax as possible, while adding discrete surround-sound channels to the original MPEG-1 limit of 2 channels (Left, Right or matrix center and difference). The main channels (Left, Right) in MPEG-2 audio will remain backwards compatible, whereas new coding methods and syntax will be used for the surround channels.
A total of 5.1 channels are included that consist of the two main channels (L,R), two side/rear, center, and a 100 Hz special effects channel (hence the ".1" in "5.1").
At this time, non-backwards compatible (NBC) schemes are being considered as an amendment to the MPEG-2 audio standard. One such popular system is Dolby AC-3.
MPEG is developing the MPEG-2 Systems Standard to specify coding formats for multiplexing audio, video, and other data into a form suitable for transmission or storage. There are two data stream formats defined: the Transport Stream, which can carry multiple programs simultaneously, and which is optimized for use in applications where data loss may be likely, and the Program Stream, which is optimized for multimedia applications, for performing systems processing in software, and for MPEG-1 compatibility.
Both streams are designed to support a large number of known and anticipated applications, and they retain a significant amount of flexibility such as may be required for such applications, while providing interoperability between different device implementations. The Transport Stream is well suited for transmission of digital television and video telephony over fiber, satellite, cable, ISDN, ATM, and other networks, and also for storage on digital video tape and other devices. It is expected to find widespread use for such applications in the very near future.
The Program Stream is similar to the MPEG-1 Systems standard (ISO/IEC 11172-1). It includes extensions to support new and future applications. Both the Transport Stream and Program Stream are built on a common Packetized Elementary Stream packet structure, facilitating common video and audio decoder implementations and stream type conversions. This is well-suited for use over a wide variety of networks with ATM/AAL and alternative transports. In New York, MPEG completed definitions of the features, syntax, and semantics of the Transport and Program Streams, enabling product designers to proceed. Among other items, the Transport Stream packet length was fixed at 188 bytes, including the 4-byte header. This length is suited for use with ATM networks, as well as a wide variety of other transmission and storage systems.
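For reference, the 4-byte header at the front of each 188-byte Transport Stream packet can be unpacked as below. The field layout follows the final MPEG-2 Systems specification (ISO/IEC 13818-1) as we understand it; adaptation fields and payload parsing are omitted, and the function name is ours. Note also that 188 bytes is exactly four 47-byte ATM AAL1 cell payloads, which is one way the length fits ATM.

```python
from dataclasses import dataclass

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


@dataclass
class TsHeader:
    transport_error_indicator: bool
    payload_unit_start_indicator: bool
    transport_priority: bool
    pid: int                           # 13-bit packet identifier
    transport_scrambling_control: int  # 2 bits
    adaptation_field_control: int      # 2 bits
    continuity_counter: int            # 4 bits


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the fixed 4-byte header of one Transport Stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("Transport Stream packets are 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TsHeader(
        transport_error_indicator=bool(b1 & 0x80),
        payload_unit_start_indicator=bool(b1 & 0x40),
        transport_priority=bool(b1 & 0x20),
        pid=((b1 & 0x1F) << 8) | b2,
        transport_scrambling_control=(b3 >> 6) & 0x3,
        adaptation_field_control=(b3 >> 4) & 0x3,
        continuity_counter=b3 & 0x0F,
    )


packet = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
hdr = parse_ts_header(packet)
print(hdr.pid, hdr.payload_unit_start_indicator, hdr.continuity_counter)  # 256 True 7
```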
What about the Grand Alliance ?
The Grand Alliance was formed in May 1993 by seven organizations (AT&T, GI, MIT, Philips, Sarnoff, Thomson, Zenith) to evaluate technologies and to decide on key elements that will be at the heart of the best of the best HDTV system.
The video compression and transport technologies selected by the Grand Alliance are based on the proposed MPEG-2 standards. The scanning formats selected are focused primarily on computer-friendly progressive scanning, while offering an interlaced mode important to some broadcasters.
They have already agreed to use the MPEG-2 Video and Systems syntax, including B-pictures. Both interlaced (1440 x 960 x 30 Hz) and progressive (1280 x 720 x 60 Hz) modes will be supported. The Alliance must then settle upon a modulation (QAM, VSB, OFDM), convolution (MS or Viterbi), and error correction (RSPC, RSFC) specification.
The audio technology selected is a six-channel, compact-disc-quality digital surround sound system. The last major technical decision, the broadcast and cable transmission subsystem, is expected in early 1994 following testing of competing technologies.