Streaming Media Middleware is more than Streaming Media

Lawrence A. Rowe

Computer Science Division - EECS
University of California
Berkeley, CA 94720-1776, USA
+1 (510) 642-5117
Rowe@BMRC.Berkeley.EDU
http://www.bmrc.berkeley.edu/~larry

Abstract

Middleware for streaming media requires services other than media capture, encoding/decoding, network transmission, and presentation. Specifically, most streaming media applications are distributed, so they require the services being developed to support client/server and peer-to-peer applications. They also require multicast application services such as soft-state announce/listen protocols, reliable multicast protocols, and publish/subscribe multicast protocols. Some applications require dynamic user-interface definition and support for multimedia authoring and media processing.

Keywords

Streaming media, multicast applications, Internet webcasting.

Introduction

Streaming media middleware is more than code to support streaming media, that is, routines to capture, encode, decode, and play media; to frame, send, and receive network packets; to convert between media representations; and to play continuous media. While these abstractions are important, most streaming media applications require many more services. Example applications are distributed collaboration (e.g., video conferencing, multiplayer games, and IP telephony), Internet webcasting, and multimedia authoring. Some of the services required by these applications are general-purpose distributed programming services (e.g., directory, process management, and communication services), and others are specific to multimedia applications (e.g., media recording, indexing, playback, and editing).

This paper discusses the middleware services used in an Internet webcasting production system being developed at U.C. Berkeley and uses this system to illustrate the middleware services needed for a particular class of distributed streaming media applications.

Section 2 briefly describes the architecture of the webcast production system. The next section lists the middleware services required to implement this application and discusses other services that are needed to support the development of multimedia applications.

Internet Webcasting

The Berkeley Multimedia, Interfaces, and Graphics (MIG) Seminar is a regularly scheduled seminar that has been webcast worldwide on the Internet since January 1995. The seminar and webcast are a test bed for research on webcasting. In the early days the webcast was composed of one audio and one video stream transmitted on the Internet Mbone using the Mbone tools (i.e., sdr, vat, and vic). A manually controlled camera, which was aimed at the speaker or projector output (e.g., overhead transparencies, computer-based presentation, or VCR), and a wireless mic were used to capture the seminar. The webcast was produced by running the Mbone tools on a computer located in the classroom.

Many changes have been made over the last six years to improve webcast quality, increase the audience, and reduce the cost and effort required to produce the webcast. Quality improvements include: 1) webcasting two streams (e.g., speaker and content), 2) adding more cameras and mics to the classroom (e.g., audience camera and mics, wide-angle stage camera, etc.), 3) incorporating video effects processing and stored material playback, and 4) experimenting with remote speakers and questions.

We added multiple transmissions of the webcast to allow more viewers to watch the seminar and to experiment with higher quality (i.e., higher bit rate) webcasts. Over the years we have had difficulty delivering the webcast to viewers who were not at universities or industrial research laboratories because few commercial organizations support multicast on their internal networks. Moreover, the transition from the DVMRP-based Mbone to a PIM-based Mbone further reduced the audience. To solve both problems, we added a RealNetworks (RN) transmission. Although RN provides a lower quality webcast, it is designed to work through security firewalls and run on any platform. The development of experimental high-speed networks such as Internet2 allowed us to produce higher quality versions of the seminar that required more bandwidth. This past semester three transmissions were produced: 1) a low bit rate Mbone webcast (200 Kbps), 2) a medium bit rate Mbone webcast (800 Kbps), and 3) an RN webcast (50/100/250 Kbps multi-rate). During the last seminar of the semester we experimented with a new production-quality TV streaming system called RTPtv [9]. RTPtv sends full-sized 60 field/second 4:2:2 video streams encoded using MJPEG and stereo 16 kHz Linear 16 audio streams, and these streams required an aggregate bandwidth of 15 Mbps.
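The back-of-the-envelope calculation below puts the 15 Mbps RTPtv figure in perspective by comparing it with the raw bit rate of the media. It rests on three assumptions that are not stated above:

- a "full-sized" frame is 720x480 (NTSC), with 60 fields/s equal to 30 interlaced frames/s;
- samples are 8 bits;
- packet header overhead is ignored.

The implied compression ratio is therefore illustrative only.

```python
# Back-of-the-envelope bit rates for the RTPtv experiment.
# Assumptions (not from the paper): 720x480 NTSC frames, 8-bit samples,
# 60 fields/s interlaced = 30 full frames/s, no packet overhead.

WIDTH, HEIGHT = 720, 480          # assumed "full-sized" NTSC frame
FRAMES_PER_SEC = 60 / 2           # 60 fields/s -> 30 interlaced frames/s
BITS_PER_PIXEL_422 = 8 + 8        # 4:2:2: one luma sample plus, on average, one chroma sample per pixel

AUDIO_RATE_HZ = 16_000            # 16 kHz sampling
AUDIO_BITS = 16                   # Linear 16 (16-bit linear PCM)
AUDIO_CHANNELS = 2                # stereo

AGGREGATE_BPS = 15_000_000        # reported aggregate bandwidth (15 Mbps)


def raw_video_bps() -> float:
    """Uncompressed 4:2:2 video bit rate."""
    return WIDTH * HEIGHT * BITS_PER_PIXEL_422 * FRAMES_PER_SEC


def audio_bps() -> int:
    """Linear 16 audio is uncompressed, so raw and transmitted rates are equal."""
    return AUDIO_RATE_HZ * AUDIO_BITS * AUDIO_CHANNELS


def implied_mjpeg_ratio() -> float:
    """Raw video rate divided by the bandwidth left for video after the audio."""
    video_budget = AGGREGATE_BPS - audio_bps()
    return raw_video_bps() / video_budget


if __name__ == "__main__":
    print(f"raw 4:2:2 video: {raw_video_bps() / 1e6:.1f} Mbps")
    print(f"stereo audio:    {audio_bps() / 1e3:.0f} Kbps")
    print(f"implied MJPEG compression: about {implied_mjpeg_ratio():.1f}:1")
```

Under these assumptions, the results are:

- raw video is about 166 Mbps;
- the audio is 512 Kbps;
- MJPEG therefore compresses the video roughly 11:1 to fit the 15 Mbps aggregate.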

In the early days, the seminar averaged 35-50 remote viewers. These viewers were synchronous, meaning that they watched the seminar as it happened. During the transition from DVMRP to PIM, the number of remote viewers dropped to fewer than ten. The addition of the RN transmission and the development of Internet2 raised the average number of viewers to over 150. However, most viewers watch the webcast asynchronously.1

1 This result mirrors what happened with the lecture webcasting system we developed at Berkeley [22]. During this past semester fifteen classes were webcast, including several large undergraduate classes, and over 19,000 lectures were played each month. Approximately 90% of the lectures were played asynchronously with the heaviest usage just before examinations.

Over the years we have developed software and systems to reduce the time and cost of producing a webcast and to improve the production (e.g., media quality and interaction). Some examples are:

Figure 1 shows the webcast production system architecture. The Si processes represent different a/v sources for the webcast (e.g., cameras). The Studio Mbone network cloud represents two multicast sessions (i.e., audio and video) used to send streams between the webcast production processes. The Webcast Mbone network cloud represents the session carrying the streams that will be webcast. The Studio and Webcast Mbone streams use high bandwidth media formats that are convenient for production processing. Lines with a single shaft denote one stream. Lines with double shafts denote multiple streams (e.g., all Studio Mbone streams are read by dc/vd). Lastly, lines with dashed shafts represent unicast transmissions that might use UDP, TCP, or HTTP as supported by the RN transport protocol. The figure illustrates a webcast with three transmissions: 1) low bit rate Mbone, 2) high bit rate Mbone, and 3) RN. tgw copies packets from the Webcast Mbone to the appropriate destination network and modifies them if necessary. In practice, dc/vd outputs the webcast into the Internet2 Mbone and tgw joins that session to receive the webcast streams.
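The role tgw plays in Figure 1 has three steps: receive packets from one session, optionally modify them, and send them toward another network. This can be sketched as a minimal UDP relay.

This is not tgw's implementation. The `rewrite` hook is a hypothetical stand-in for whatever per-destination modification is needed. `open_multicast_receiver` shows only the standard socket options for joining an IPv4 multicast group.

```python
import socket
import struct
from typing import Callable, Optional, Tuple

# Hypothetical hook: returns the (possibly modified) packet, or None to drop it.
Rewrite = Callable[[bytes], Optional[bytes]]


def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket that has joined an IPv4 multicast group on any interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # struct ip_mreq: multicast group address, then local interface (INADDR_ANY).
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock


def relay_once(src: socket.socket, dst: socket.socket,
               dst_addr: Tuple[str, int], rewrite: Rewrite = lambda p: p) -> bool:
    """Copy one datagram from src to dst_addr, applying rewrite. True if it was sent."""
    packet, _sender = src.recvfrom(65535)
    out = rewrite(packet)
    if out is None:               # the hook chose to drop this packet
        return False
    dst.sendto(out, dst_addr)
    return True


def relay_forever(src: socket.socket, dst: socket.socket,
                  dst_addr: Tuple[str, int], rewrite: Rewrite = lambda p: p) -> None:
    """Forward packets indefinitely, as a long-running gateway would."""
    while True:
        relay_once(src, dst, dst_addr, rewrite)
```

In use, `src` would come from `open_multicast_receiver` with the Webcast Mbone session address, which is not given in the text. `dst` would be an ordinary UDP socket aimed at the destination network. A real gateway would also handle many sessions at once (e.g., with `selectors`) and rate-adapt or transcode streams rather than copy them.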

[Figure 1]

Figure 1. Webcast Production System Architecture

This figure shows the general architecture, but the system is composed of many processes that run on different computers. For example, Figure 2 shows only the part of the system related to the Director's Console and two capture computers in a studio classroom. The shaded box is an analog routing switch to which all audio and video devices in the room are connected. A separate embedded computer, manufactured by AMX, controls the switch.

[Figure 2]

Figure 2. Process Architecture for a Subset of the Webcast Production System

The AMX control computer has an RS232 interface that allows the webcasting system to control a/v equipment. The AMX Server process provides an RPC interface so other processes can execute equipment commands. The rvc and vat processes capture and transmit video and audio streams, respectively. The dc/vd process represents the Director's Console and the Virtual Director. The Virtual Director is actually composed of several processes that interact with dc and other webcast processes. The SDS Server implements the Service Discovery Service (SDS) that the webcasting system uses to locate and connect to webcast production services.
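The AMX Server's job is to expose equipment commands over RPC and forward them to the switch controller's RS232 port. The sketch below illustrates that job. The original uses TclDP [20]; this sketch swaps in Python's standard `xmlrpc` modules for illustration. The following pieces are hypothetical:

- the `route` method;
- the command text it sends;
- the `write_serial` callback, which a real server would back with an actual serial-port driver.

```python
import threading
from typing import Callable
from xmlrpc.server import SimpleXMLRPCServer


class AMXServer:
    """Hypothetical RPC facade over an AMX-style routing-switch controller."""

    def __init__(self, write_serial: Callable[[bytes], None]):
        # write_serial stands in for the RS232 link to the controller.
        self._write_serial = write_serial
        self._lock = threading.Lock()   # serialize commands from concurrent clients

    def route(self, source: int, destination: int) -> bool:
        """Connect switch input `source` to output `destination`.

        The command format below is invented for illustration; the real
        controller protocol is not described in the text.
        """
        command = f"ROUTE {source} {destination}\r".encode("ascii")
        with self._lock:
            self._write_serial(command)
        return True


def serve(amx: AMXServer, host: str = "127.0.0.1", port: int = 0) -> SimpleXMLRPCServer:
    """Start the RPC server in a background thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    server.register_instance(amx)       # exposes public methods such as route()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client such as dc/vd would then call `xmlrpc.client.ServerProxy(url).route(3, 1)`. In deployment the server would listen on a fixed, well-known port rather than port 0, so that clients can find it without a lookup.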

Lastly, the Broadcast Manager (bmgr) is run when a webcast is initiated. Webcast configurations (e.g., what processes should be started on what machines) are stored in a database. The processes are launched either when a scheduled webcast is to be initiated or in response to a request entered by a producer/director.
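The sketch below shows one way to picture a stored webcast configuration and the launch step bmgr performs from it. The following are all assumptions for illustration; the paper does not specify the database schema or the launch mechanism:

- the record layout;
- the example host names;
- the use of `ssh` as the remote launcher.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessSpec:
    program: str                                    # e.g. "dc", "rvc", or "vat"
    host: str                                       # machine on which to start it
    args: List[str] = field(default_factory=list)   # program arguments, if any


@dataclass
class WebcastConfig:
    name: str
    processes: List[ProcessSpec]


def launch_plan(config: WebcastConfig, remote_shell: str = "ssh") -> List[List[str]]:
    """Return one argv per process; a launcher could pass each to subprocess.Popen."""
    return [[remote_shell, p.host, p.program, *p.args] for p in config.processes]


# Hypothetical configuration resembling the studio subset in Figure 2.
MIG_SEMINAR = WebcastConfig(
    name="mig-seminar",
    processes=[
        ProcessSpec("dc", "director.example.edu"),
        ProcessSpec("rvc", "capture1.example.edu"),
        ProcessSpec("vat", "capture2.example.edu"),
    ],
)
```

Separating the plan from its execution keeps the stored configuration declarative. The same record can then serve both a scheduled webcast and a producer's manual request, the two launch triggers described above. A real manager would also need to monitor and stop the processes it starts.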

Different line styles are used in the figure to denote different actions. The line styles are:

Notice that all processes except the AMX and SDS Servers are launched by the bmgr. These two processes run continuously and listen at well-known addresses for requests to execute operations.2

2 The AMX Server should be configured as a launch-on-demand process using inetd or a similar service, but the TclDP tools we use for RPC do not support this mechanism [20]. Moreover, the AMX Server was the first service developed for the webcasting system, and it is used by several other applications (e.g., the work on location-based services for mobile clients [10]), so we must maintain the current implementation.

An example will clarify the implementation of the system. Suppose a producer/director wants to start a webcast. He or she runs the bmgr, selects a webcast configuration, and launches the processes. In the example above, the dc/vd, rvc, and vat processes are launched on the specified hosts. The rvc and vat processes register with the Service Discovery Service (SDS). dc/vd contacts the SDS Server in response to commands entered by the director to locate the desired webcast services. We use a dynamic service discovery protocol rather than the configuration database because different webcasts produced from the same room(s) might use different services.

The SDS Server responds to dc/vd with the information needed to contact the service process (i.e., host, port, and protocol). Each service returns data and code required to control the service in response to an open service request from dc/vd. In particular, media services return a Tcl command string that will instantiate interface widgets into a window allocated by dc/vd to provide the director interactive controls for the service. For example, the interface widget for rvc above includes a pulldown menu to change the video source by issuing a command to the routing switcher.
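The register/lookup exchange between the media services, the SDS Server, and dc/vd can be sketched as an in-memory registry. The sketch assumes soft-state registrations: a service must re-register periodically or its entry expires, in the spirit of the announce/listen protocols mentioned in the abstract. The following are hypothetical, and the actual SDS protocol is not reproduced here:

- the class and method names;
- the TTL value;
- the example service names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ServiceAddress:
    host: str
    port: int
    protocol: str          # how to talk to the service (illustrative values only)


class ServiceRegistry:
    """Soft-state registry: an entry disappears unless re-registered within ttl seconds."""

    def __init__(self, ttl: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[ServiceAddress, float]] = {}  # name -> (address, expiry)

    def register(self, name: str, address: ServiceAddress) -> None:
        # Registering again simply refreshes the expiry time.
        self._entries[name] = (address, self._clock() + self._ttl)

    def _expire(self) -> None:
        now = self._clock()
        for name in [n for n, (_, exp) in self._entries.items() if exp <= now]:
            del self._entries[name]

    def lookup(self, name: str) -> Optional[ServiceAddress]:
        """Return host, port, and protocol for a live service, or None."""
        self._expire()
        entry = self._entries.get(name)
        return entry[0] if entry else None

    def services(self) -> List[str]:
        """Names of all services whose registrations have not expired."""
        self._expire()
        return sorted(self._entries)
```

Soft state means that a crashed capture process simply ages out of the registry, with no explicit deregistration. The injectable `clock` keeps that expiry behavior easy to test.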

This section described various features of the webcast production system being developed and used at U.C. Berkeley. The next section discusses middleware services required to implement the system.

Middleware Services

The middleware services used to implement the webcast production system include many services other than the services required to capture, encode/decode, transmit, store/fetch, and process media. These services can be decomposed into three categories: 1) distributed programming, 2) user-interfaces, and 3) media services. This section describes each category in turn.

Distributed Programming

Many conventional distributed programming services are required for distributed streaming media applications. These services include at least the following.

Some of these services are traditional distributed programming services, but an important feature of the webcast production system is the dynamic nature of both processes and services. We are continuously improving the system, which means prototyping new features and testing them in the production system. It is essential that we be able to define new classes, objects, and methods dynamically, that is, while the system is running. Moreover, the use of a scripting language, like Tcl, has been a significant research advantage. It is relatively easy to experiment with new specification languages and to implement and test new services and features. While the primary concern of many distributed programming systems is performance, the webcasting system, like all streaming media applications, requires some components to be efficient (e.g., media coding/decoding) and some components to be flexible and easy to implement (e.g., rule-driven control interfaces).
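The requirement to define new classes, objects, and methods while the system is running can be illustrated in any language with a dynamic object model. The system itself uses Tcl; the Python sketch below is only an analogy. The `MediaService` class and the `set_quality` and `describe` methods are hypothetical. The sketch adds a method to a class after an instance already exists, then creates a new subclass, all without restarting the process.

```python
from typing import Any, Callable, Dict


class MediaService:
    """Hypothetical base class for a running media service."""

    def __init__(self, name: str):
        self.name = name
        self.settings: Dict[str, Any] = {}


def add_method(cls: type, name: str, fn: Callable) -> None:
    """Attach fn to cls; existing instances see it on their next attribute lookup."""
    setattr(cls, name, fn)


def define_class(name: str, base: type, methods: Dict[str, Callable]) -> type:
    """Create a new subclass at run time, as a prototype feature might require."""
    return type(name, (base,), dict(methods))


if __name__ == "__main__":
    svc = MediaService("rvc")                 # an object that already exists

    def set_quality(self, q: int) -> int:     # new behavior defined later
        self.settings["quality"] = q
        return q

    add_method(MediaService, "set_quality", set_quality)
    svc.set_quality(75)                       # the old instance gains the method

    Tracker = define_class("TrackingService", MediaService,
                           {"describe": lambda self: f"tracking:{self.name}"})
    print(Tracker("camera-1").describe())
```

Statically compiled object systems typically require a rebuild and restart for either step. Avoiding that restart is what makes it practical to test prototypes in the production system.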

User Interfaces

The user interfaces used in the webcast production system are essentially client/server interfaces. The original Mbone tools (i.e., sdr, vat, vic, etc.), on which much of the system depends through Mash [17] and the Open Mash Consortium [19], bundled a user interface with code to capture, encode, transmit, receive, decode, and display streaming media. The development of distributed collaboration systems, such as the Access Grid [1], the Virtual Rooms Videoconferencing System [29], and the Berkeley Webcast Production System, has led to the recognition that client/server tools that can be managed either by a GUI or by a remote program are needed. Consequently, the development of client/server tools is an important change being made by the Open Mash Consortium.

The development of client/server interfaces means that a client interface must be configured dynamically to provide the controls needed by the remote service. As described above for dc/vd, a client program must be able to configure the interface dynamically when given widget definitions and function bindings by the service. The prototyping nature of the research work implies that statically compiling all interface abstractions into the client applications is impractical. Scripting languages and interpretive languages (e.g., Visual Basic, Java, and Lisp) are natural candidates for such development. However, the toolkits and run-time environments must support dynamic definition and instantiation of classes and objects.
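The client side of this arrangement, building controls at run time from a description supplied by the service, can be sketched without committing to a particular GUI toolkit. Here the service sends a declarative list of widget definitions, each bound to a remote command name. The client turns that list into a table of callable controls.

The following are hypothetical: the spec format, the `send` transport callback, and the `set_source` and `freeze` commands. dc/vd instead receives executable Tcl from the service, as described earlier.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical transport that delivers (command, arguments) to the remote service.
Send = Callable[[str, List[Any]], Any]


@dataclass
class Control:
    label: str
    kind: str                     # e.g. "menu" or "button"
    choices: List[str]
    invoke: Callable[..., Any]    # calls the remote command bound to this control


def build_panel(spec: List[Dict[str, Any]], send: Send) -> Dict[str, Control]:
    """Instantiate controls from a service-supplied spec, binding each to a remote command."""
    panel: Dict[str, Control] = {}
    for item in spec:
        command = item["command"]
        choices = list(item.get("choices", []))

        # Keyword-only defaults capture this item's values; a plain closure
        # would see only the last loop iteration's command and choices.
        def invoke(*args: Any, _cmd: str = command, _choices: List[str] = choices) -> Any:
            if _choices and args and args[0] not in _choices:
                raise ValueError(f"{args[0]!r} is not one of {_choices}")
            return send(_cmd, list(args))

        panel[item["name"]] = Control(item.get("label", item["name"]), item["kind"], choices, invoke)
    return panel
```

A GUI layer would then map each `Control` to a concrete widget, for example a pulldown menu for `kind == "menu"`. The client never needs to know in advance which services exist or what they can do.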

Media Services

The theme of this paper is that streaming media middleware is more than streaming media. Nevertheless, streaming media services beyond those provided by typical toolkits are required. The following lists some examples of desired services.

It is easy to see that much work remains to provide appropriate middleware tools for distributed streaming media applications.

Summary

This paper argues that distributed streaming media applications require many services other than the media capture, coding/decoding, sending/receiving, and playback services found in a conventional streaming media toolkit. Too often the focus in these toolkits is to support the most popular media codecs and relatively simple applications (e.g., on-demand playback). Most interactive, distributed streaming media applications require significant support for distributed computing. Moreover, multimedia applications that involve media editing and processing (e.g., content analysis and query) are poorly served by current toolkits.

References

  1. Access Grid, June 2001. URL: http://www.accessgrid.org.
  2. E. Amir, S. McCanne, and H. Zhang, An Application-level Video Gateway, Proceedings of The Third Annual ACM International Multimedia Conference, November 1995.
  3. E. Amir, S. McCanne, and R. Katz, An Active Service Framework and its Application to Real-time Multimedia Transcoding, Proceedings of ACM SIGCOMM '98, Vancouver, British Columbia, September 1998.
  4. M. Atkinson, F. Bancilhon, D. DeWitt, K. Dittrich, D. Maier, and S. Zdonik. The Object-oriented Database System Manifesto, Proc. of the First Int. Conf. on Deductive and Object-Oriented Databases, Kyoto, Japan, 1990.
  5. K. Birman, et al., Middleware Support for Distributed Multimedia and Collaborative Computing, Software - Practice and Experience, Vol. 29, No. 14, 1999, pp 1285-1312.
  6. J. Boreczky, et al., An Interactive Comic Book Presentation for Exploring Video, Human Factors in Computing Systems ACM SIGCHI Proceedings, The Hague, Holland, April 2000.
  7. G.J. Brown and M. Cooke, Computational Auditory Scene Analysis, Computer Speech and Language, Vol. 8, August 1994, pp 297-336.
  8. Foveal Systems, AutoAuditorium, November 1999. URL: http://www.autoauditorium.com/.
  9. M. Delco, Production Quality Television over the Internet, Technical Report, Berkeley Multimedia Research Center, U.C. Berkeley, June 2001. URL: http://bmrc.berkeley.edu/papers/2001/161/.
  10. T.D. Hodes and R.H. Katz, Composable Ad hoc Location-based Services for Heterogeneous Mobile Clients, ACM Wireless Networks Journal, special issue on mobile computing: selected papers from MobiCom '97, Vol. 5, No. 5, October 1999, pp. 411-427.
  11. Q. Liu, et al., Automating Camera Management for Lecture Room Environments, Human Factors in Computing Systems ACM SIGCHI Proceedings, Seattle, WA, March-April 2001.
  12. E. Machnicki, AMXd Server, Web Page, Berkeley Multimedia Research Center, U.C. Berkeley, September 2000. URL: http://bmrc.berkeley.edu/~machnick/amxd/index.html.
  13. E. Machnicki and L.A. Rowe, Virtual Director: Automating a Webcast, to appear Multimedia Computing and Networking 2002, Proc. IS&T/SPIE Symposium on Electronic Imaging: Science & Technology, January 2002.
  14. R. Malpani and L.A. Rowe, Floor Control for Large-Scale MBone Seminars, Proceedings of The Fifth Annual ACM International Multimedia Conference, Seattle, WA, November 1997, pp 155-163.
  15. K. Mayer-Patel, A Parallel Software-Only Video Effects Processing System, Ph.D. Dissertation, Computer Science Division - EECS, U.C. Berkeley, 1999.
  16. K. Mayer-Patel and L.A. Rowe, A Multicast Control Scheme For Parallel Software-only Video Effects Processing, Proceedings of The Seventh Annual ACM International Multimedia Conference, October 1999.
  17. S. McCanne, et al., Toward a Common Infrastructure for Multimedia-Networking Middleware, Proc. Seventh Intl. Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV '97), St. Louis, Missouri, May 1997.
  18. S. Mukhopadhyay and B. Smith, Passive Capture and Structuring of Lectures, Proceedings of The Seventh Annual ACM International Multimedia Conference, October 1999.
  19. Open Mash Consortium, March 1999. URL: http://www.openmash.org/.
  20. M. Perham, B.C. Smith, T. Jánosi, I. Lam, Redesigning Tcl-DP, Proceedings of the Fifth Annual Tcl/Tk Workshop, Boston MA, July 1997.
  21. S. Raman and S. McCanne, Scalable Data Naming for Application Level Framing in Reliable Multicast, Proceedings of The Sixth Annual ACM International Multimedia Conference, Bristol, UK, September 1998.
  22. L.A. Rowe, et al., BIBS: A Lecture Webcasting System, Technical Report, Berkeley Multimedia Research Center, U.C. Berkeley, June 2001. URL: http://bmrc.berkeley.edu/papers/bibs-report.html.
  23. A. Schuett, Active Services for Archive Applications, Ph.D. Dissertation, Computer Science Division - EECS, U.C. Berkeley, 2000.
  24. H. Schulzrinne, A. Rao, and R. Lanphier, Real Time Streaming Protocol (RTSP), RFC 2326, Internet Engineering Task Force, April 1998. URL: http://www.ietf.org/rfc/rfc2326.txt?number=2326.
  25. R. Sessions, COM and DCOM: Microsoft's Vision for Distributed Objects. John Wiley & Sons, New York, NY, 1997.
  26. L. Teodosio and W. Bender. Salient video stills: Content and context preserved. Proceedings of The First ACM International Conference on Multimedia, Anaheim, CA, August 1993.
  27. D. Wu, A. Swan, and L.A. Rowe, An Internet MBone Broadcast Management System, Multimedia Computing and Networking 1999, Proc. IS&T/SPIE Symposium on Electronic Imaging: Science & Technology, San Jose, CA, January 1999.
  28. S. Vinoski, CORBA: Integrating Diverse Applications Within Distributed Heterogeneous Environments, IEEE Communications Magazine, Vol. 14, No. 2, February 1997.
  29. Virtual Rooms Videoconferencing System, June 2001. URL: http://www.vrvs.org/About.
  30. Tai-Ping Yu, D. Wu, K. Mayer-Patel, and L.A. Rowe, dc: A Live Webcast Control System, Multimedia Computing and Networking 2001, Proc. IS&T/SPIE Symposium on Electronic Imaging: Science & Technology, San Jose, CA, January 2001.
  31. H. J. Zhang, C. Y. Low, S. W. Smoliar and J. H. Wu, Video Parsing, Retrieval and Browsing: an Integrated and Content-based Solution, Proceedings of The Third Annual ACM International Multimedia Conference, San Francisco, November 1995.

Copyright © 2001 ACM.