The primary goals of MPEG encoding algorithms are (1) to create fast and efficient motion vector search techniques and (2) to find "good" encoding parameters that balance encoding speed, compression, and quality.
Figure 2. Berkeley Parallel MPEG-1 Encoder Organization.
The Berkeley encoder accelerates encoding by parallelizing in the temporal direction. The overall organization of the encoder is shown in Figure 2. The Master server is the overall coordinator. It allocates groups of frames to the Slave servers for encoding. File sharing works best if the directories containing the video files are NFS-mounted and uniformly accessible to all processors (alternatively, the Master can send the files out through sockets). The various activities are coordinated by interprocess communication through sockets. The Decode server addresses the issue described next.
The choice between using the original uncompressed frames and using the decoded frames (the "decompressed compressed" frames) as reference frames has implications for both performance and picture quality. Decoded frames are the better reference frames, since the decoder will not have access to the original frames. The Decode server decodes compressed frames so that they can serve as references for other frames. However, using decoded frames severely limits the parallelism of the algorithm. Suppose slave 1 is trying to encode a P-frame, and the I-frame that this P-frame references is being encoded by slave 0. Slave 1 must then wait for slave 0 to finish encoding that I-frame before it can be decoded and used as a reference for the P-frame. In some situations, this effect can serialize the work of all the slaves.
However, the degradation in picture quality from using the original frames as references is hardly noticeable, so the Decode server can often be eliminated. This also leads to faster execution of the encoder.
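The serialization effect described above can be illustrated with a small timing model. This is only a sketch under simplifying assumptions that are not part of the Berkeley encoder: the sequence contains only I- and P-frames, every frame takes the same time to encode, decoding time is ignored, and frames are assigned to slaves in contiguous blocks. The function name and parameters are hypothetical.

```python
def finish_times(pattern, n_slaves, use_decoded_refs, cost=1.0):
    """Return the time at which each frame finishes encoding.

    pattern          -- string of 'I' and 'P' frame types (no B-frames, by assumption)
    n_slaves         -- number of slaves; frames are split into contiguous blocks
    use_decoded_refs -- if True, a P-frame cannot start until the previous frame
                        has been encoded (so that it could be decoded as a reference)
    cost             -- assumed uniform encoding time per frame
    """
    n = len(pattern)
    block = -(-n // n_slaves)          # ceiling division: frames per slave
    finish = [0.0] * n
    slave_free = [0.0] * n_slaves      # time at which each slave becomes idle
    for i, frame_type in enumerate(pattern):
        slave = i // block
        start = slave_free[slave]
        if use_decoded_refs and frame_type == "P" and i > 0:
            # Wait for the reference frame, possibly held by another slave.
            start = max(start, finish[i - 1])
        finish[i] = start + cost
        slave_free[slave] = finish[i]
    return finish


# One I-frame followed by seven P-frames, spread over four slaves.
print(max(finish_times("IPPPPPPP", 4, use_decoded_refs=False)))  # 2.0
print(max(finish_times("IPPPPPPP", 4, use_decoded_refs=True)))   # 8.0
```

In this model, original-frame references let the four slaves finish in two time units, while decoded-frame references force all eight frames to be encoded one after another. When each slave's block starts with its own I-frame (for example `"IPPPIPPP"` on two slaves), the two modes take the same time.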
After successfully encoding their allocated frames, the Slave servers write the encoded data to individual files, one per frame. File sizes depend on the raw image, but in general I-frames are the largest, followed by P-frames and then B-frames. Finally, the Combine server combines the individual files created by the Slaves into a single file containing a contiguous MPEG sequence.
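The combining step can be sketched as a plain concatenation of the per-frame files. This is a minimal illustration, not the Combine server's actual code: the function name is hypothetical, the caller is assumed to supply the files already in the order they must appear in the output, and anything else a valid MPEG stream needs (such as stream-level headers) is left out.

```python
import shutil
from pathlib import Path


def combine_frame_files(frame_paths, out_path):
    """Append each per-frame file, in the order given, to one output file.

    Returns the size of the combined file in bytes.
    """
    with open(out_path, "wb") as out:
        for path in frame_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)   # stream copy, no full read into memory
    return Path(out_path).stat().st_size
```

Because the files are copied as streams, the size of an individual frame file (large I-frames included) does not affect memory use.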
Scheduling.
There is a trade-off between load balancing and scheduling overhead, so it is preferable to allocate a group of frames to a processor rather than a single frame. Also, if encoding requires a reference frame that is not allocated to that processor, the processor must nevertheless read that frame. Grouping the frames reduces this penalty. In a parallel encoder, using decoded frames for reference is particularly expensive if the decoded frame also has to be passed to another processor. Pathological cases of this kind can serialize the entire encoder!
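The effect of grouping on extra reference-frame reads can be counted directly. The sketch below assumes the usual MPEG reference structure (a P-frame references the previous I- or P-frame, and a B-frame references the nearest I- or P-frame on each side) and assumes frames are allocated in fixed-size contiguous groups. The function names and the frame pattern are illustrative only.

```python
def reference_indices(pattern):
    """For each frame in pattern ('I', 'P', 'B'), list the frames it references."""
    anchors = [i for i, t in enumerate(pattern) if t in "IP"]
    refs = []
    for i, frame_type in enumerate(pattern):
        prev_anchor = max((a for a in anchors if a < i), default=None)
        next_anchor = min((a for a in anchors if a > i), default=None)
        if frame_type == "I":
            candidates = ()                      # intra-coded: no references
        elif frame_type == "P":
            candidates = (prev_anchor,)
        else:                                    # 'B'
            candidates = (prev_anchor, next_anchor)
        refs.append([a for a in candidates if a is not None])
    return refs


def extra_reads(pattern, group_size):
    """Count reference frames a processor must read from outside its own group."""
    extra = set()
    for i, frame_refs in enumerate(reference_indices(pattern)):
        group = i // group_size
        for ref in frame_refs:
            if ref // group_size != group:
                extra.add((group, ref))          # each group reads a frame once
    return len(extra)


pattern = "IBBPBBPBBPBB"
for size in (1, 3, 6, 12):
    print(size, extra_reads(pattern, size))      # 17, 6, 2, 0 extra reads
```

For this 12-frame pattern, allocating single frames costs 17 extra reads, while groups of 6 cost only 2. Larger groups, however, leave fewer units to balance across processors, which is the trade-off noted above.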
The Berkeley encoder uses a dynamic scheduling scheme that first allocates a fixed number of frames to each processor and measures how long each processor takes. Based on this measurement, each processor is allocated as many frames as necessary to achieve approximately equal execution times. The time taken for this allocation is measured in turn, and if an imbalance is found, a new workload allocation is calculated for the next phase.
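One way to compute such an allocation is to hand out frames in proportion to each processor's measured encoding rate. The sketch below is an assumption about how this could be done, not the Berkeley encoder's actual scheduler; the function name and its parameters are hypothetical.

```python
def next_allocation(frames_done, seconds, total_frames):
    """Split total_frames among processors in proportion to measured speed.

    frames_done  -- frames each processor encoded in the previous phase
    seconds      -- time each processor took for those frames
    total_frames -- number of frames to hand out in the next phase
    """
    rates = [f / s for f, s in zip(frames_done, seconds)]   # frames per second
    total_rate = sum(rates)
    shares = [total_frames * r / total_rate for r in rates]
    alloc = [int(share) for share in shares]                # round down first
    leftover = total_frames - sum(alloc)
    # Give the remaining frames to the largest fractional shares.
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: shares[i] - alloc[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


# Three processors each encoded 4 frames, taking 1 s, 2 s, and 4 s.
print(next_allocation([4, 4, 4], [1.0, 2.0, 4.0], 70))      # [40, 20, 10]
```

With these measurements, each processor is expected to need about 10 seconds for its share. Repeating the measurement after each phase lets the allocation follow changes in processor load.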
Other Choices.
The Berkeley algorithm also makes several choices in its search schemes and exposes other knobs specific to MPEG encoding that affect, among other things, picture quality. See the Users Guide for details (in PostScript or MIF).