This is retired content. This content is outdated and is no longer being maintained. It is provided as a courtesy for individuals who are still using these technologies. This content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.
Microsoft Corporation
June 2000
Summary: This paper discusses the multimedia streaming capabilities of the Windows Media™ components included in Microsoft Windows CE DirectX Platform Adaptation Kit 1.2 (or DXPAK), and how they differ from those in other versions of Microsoft Windows. (14 printed pages)
Overview
The digital revolution has taken the consumer electronics space
by storm. Digital cable set top boxes offering hundreds of channels
are quickly replacing older, analog cable boxes. Portable digital
audio players offer multiple hours of music playback at a much
higher sound quality and a fraction of the size of a portable tape
player. The overall change in consumer electronics devices from
analog to digital is both a radically new phenomenon and a natural
evolution. Because of this evolution, it is now generally accepted
that the PC will no longer be the only source for digital
multimedia. Many of these new consumer devices entertain consumers by
playing digital music, movies, TV, or other multimedia content. The
amount of data needed to store digital content, even when
compressed using the best available coding algorithms such as
Microsoft Windows Media™ Audio, is very large. For example, when
compressed at 120 Kbits/sec, digital audio requires 3.6 MB to store
a four-minute song. A two-hour movie requires 270 MB as digital
video at 300 Kbits/sec, and 3.6 GB at 4 Mbits/sec.

Large media files can be managed either by storing them on the
device or by streaming them over a network.
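The storage figures above follow from multiplying the bit rate by the playing time and dividing by eight. The following C++ sketch reproduces them. The function name `StorageBytes` is made up for illustration, and decimal units (1 Kbit = 1,000 bits, 1 MB = 1,000,000 bytes) are assumed because that convention matches the numbers quoted in the text.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to store a constant-bit-rate stream (hypothetical helper).
//   bits_per_second: encoded bit rate, in bits per second
//   seconds:         playing time, in seconds
// Assumes decimal units, so 120 Kbits/sec is passed as 120000.
std::uint64_t StorageBytes(std::uint64_t bits_per_second, std::uint64_t seconds) {
    // Guard against overflow of the 64-bit product.
    assert(seconds == 0 || bits_per_second <= UINT64_MAX / seconds);
    return bits_per_second * seconds / 8;  // 8 bits per byte
}
```

For example, `StorageBytes(120000, 4 * 60)` yields 3,600,000 bytes, which is the 3.6 MB quoted for a four-minute song, and `StorageBytes(4000000, 2 * 60 * 60)` yields 3,600,000,000 bytes, the 3.6 GB quoted for a two-hour movie.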
Network streaming can also be used for processing media data
created on the server in real-time and never stored as a single
file. Sending the next relatively small amount of data needed for
immediate processing and playback (and not the complete data set)
works because multimedia content is a series of digital data
without strong long-term temporal coupling. In other words, a block
of sound values in a song or pixels in a video frame can be
processed and displayed independently, at least when they are
separated by sufficient time. This allows a stream of multimedia
data to be broken up into temporal groups of independent data.
These data are then encoded, transmitted, and played back in
temporal order, independently of the preceding and succeeding
groups of data, which are displayed at different times. This
method of sending blocks of time-ordered, temporally decoupled data
to the client device is called streaming. The client device needs
only to buffer enough packets of data to allow for server, network,
or client side interruptions or irregularities in the creation,
transmission, and time-correct display of the streamed data.

How does Windows CE fit into this picture? By providing Windows
Media components, Windows CE DirectX Platform Adaptation Kit 1.2
(or DXPAK) enables many of the rich multimedia playback and
streaming capabilities found on PCs, but does so with smaller, more
configurable components. These components run on many of the
high-performance CPUs supported by Windows CE (x86, MIPS R4300 and
compatible, and SH4, available now with DXPAK 1.1; ARM, StrongARM,
and integer MIPS planned for DXPAK 1.2). In addition, the
modularity of Windows Media components for Windows CE gives you
flexibility in choosing which components your platform uses. When
building an operating system image for your hardware using Platform
Builder, you can decide whether you want a particular DirectX or
user interface component, communications protocol, or file system.
This kind of flexibility allows you to ship only those technologies
you are actually using on your platform, saving space and reducing
complexity.

Windows CE 3.0 with the DirectX Platform Adaptation Kit 1.2 (or
DXPAK) provides a complete solution for developing the next "killer"
consumer appliance or application. It is a robust, powerful
real-time operating system that now provides a rich set of
components for enabling digital multimedia devices.

Technical Fundamentals

The Microsoft DirectShow portion of Microsoft DirectX provides
the foundation for all multimedia services on Windows CE. It is
possible to provide a rich multimedia application using solely
DirectShow, and in fact many companies are doing just that, but it
is not the only way to proceed. A communication structure has been
built on top of DirectShow to make the application developer's job
easier. How everything fits together is illustrated in the
following diagram:
Figure 1. Communication structure based on DirectShow
The user sees the top layer of this diagram: the web browser or
other application. The application either has the Windows Media
Player (WMP) control embedded within it, or communicates via COM
with the DirectShow interfaces. In either case, DirectShow manages
the flow of data from the source to the hardware. The application
developer is most concerned with the second and third layers of
information flow. The driver developer is most concerned with the
bottom layers. What follows is a look at the sections the
application developer needs to understand: the WMP control, WMT,
and DirectShow.

Windows Media Player Control

Recognizing the importance of multimedia to web content,
Microsoft created the Windows Media Player (WMP) control. This
technology enables the Windows Media Player to exist as a Microsoft
ActiveX control inside a web page along with other content. The WMP
control is a versatile tool for presenting local and streaming
multimedia files. It supports playback of nearly all major media
file formats, including Windows Media format files (ASF, ASX, WMA,
WMV, WVX, WM, and WMX files), Moving Picture Experts Group formats
(MPG and MPE files), audio formats such as MP3, MIDI, WAV, and
AIFF, and AVI multimedia files. All these file formats can be
streamed from locally stored files using just the WMP control, and
when the control is combined with the Microsoft Windows Media
Technologies, streaming over networks is also supported. DirectX
Platform Adaptation Kit 1.2 (or DXPAK) supports the Windows Media
Player 6.4 version of the controls.

Placing the WMP Control in a Web Page

The OBJECT tag is used to embed ActiveX objects into a Web page.
The following example shows how to use the OBJECT tag to insert the
Windows Media Player (WMP) control.

The ID attribute of the OBJECT tag specifies a name for the WMP
object, for later use in scripting. The CLASSID attribute is
required for Internet Explorer to create the object on the page,
and should always be the string listed in the preceding example.
The TYPE attribute indicates to the browser that the embedded
OBJECT is an ActiveX object. The optional WIDTH and HEIGHT
attributes set the size of the window used for the WMP object. The
STYLE attribute enables you to position the object window anywhere
on the Web page.

All WMP functionality is exposed to the Microsoft JScript web
scripting language. There is currently no support for any other
scripting language.

Note that the CODEBASE attribute is conspicuously absent from the
OBJECT tag. The CODEBASE attribute contains a Uniform Resource
Locator (URL) pointing to a location where the WMP control can be
downloaded if it is unavailable on a user's system. This
functionality is not supported on Windows CE. The WMP control for
Windows CE must be included in your OS image if it is going to be
used by your application.

The PARAM tags have two attributes: the first is the name of the
property being set, and the second specifies the value of that
property. The PARAM tags initialize the WMP object with specified
data when it is created. In this example, the first PARAM tag sets
the FileName property to the URL
http://example.microsoft.com/media/sample.asf, defining which file
the WMP control will play. The value could also be a path to a
local media file, such as C:\sample.asx. The remaining PARAM tags
specify that the playback controls and status bar should both be
visible for this object. Any of these elements could be hidden
instead, enabling you to customize the appearance of the WMP
control and user interface items.

After you have created the object and specified a valid file
name, you should see the WMP control on your Web page.

There are three other HTML tags that are used to include audio
and video in Web pages: the embed object tag <EMBED>, the image
tag <IMG>, and the anchor tag <A>.

The <EMBED> tag was created by Netscape to support browser
plug-ins. Netscape does not support embedding objects with the
<OBJECT> tag, so the <EMBED> tag should be used if you are trying
to maintain compatibility with Netscape browsers.

Although the <IMG> tag can be used to include video clips in a Web
page, its use is limited to certain media types (MPEG, QT, and AVI
files). The <IMG> tag does not provide access to any of the WMP
control parameters and does not work at all with audio media. If
you need the full functionality of the WMP control, the use of the
<IMG> tag is not recommended.

The <A> tag can also be used to create links to media. Media is
played either by a helper application determined by the media type
or by the browser. The <A> tag cannot be used to embed media in a
Web page as with the <OBJECT> or <EMBED> tags.

One key point to keep in mind is that the WMP control is distinct
from the Windows Media Player application. As a result, not all
Internet media content can be handled by the WMP control. An
example of this is content accessed with the <A> tag. Normally,
when a user is browsing the Internet with a PC and clicks on a link
to play a media file, the Windows Media Player application appears
on the desktop and control over the media content is passed from
the browser to it. If you are trying to build a device or
application that supports all existing Internet media content, you
must make sure the control recognizes and correctly handles all the
different ways to deliver content over the Internet.

The WMPHLPR sample included with DXPAK 1.2 provides an example of
how to enable the WMP control to play back media accessed via the
<A> tag. Using this sample, when a user clicks on a media file
link, the browser navigates to a page that hosts the control. The
media file is passed to the control and playback begins from within
the browser. This behavior is quite useful for set-top boxes,
Internet portal devices, or any other device with either limited
memory or a desire to run completely within a single window.

The Windows Media Player control for Windows CE contains a few
differences from the version of the control that is available for
x86-based PCs. The driving force behind including just a subset of
the desktop WMP control's features is to provide a smaller, robust
control that encapsulates the key features of the WMP control
required by embedded devices. Accordingly, the WMP control for
Windows CE supports only a subset of the properties, methods, and
events of the desktop control. The omitted properties, methods,
and events have no practical value for non-PC devices such as
set-top boxes and audio jukeboxes. There is no support for backward
compatibility with the Microsoft NetShow player control, since all
of the NetShow functionality has been encapsulated in the WMP
control. Certain UI elements such as the context menu, Display
panel, Closed Captioning panel, and Go To Bar are not supported,
but can be authored for a Web page with scripting.

The Microsoft PowerPoint (PPT) streaming or hotspots ASF
authoring features are not supported. With URL flipping, it is
still possible to have the WMP control playing media in one frame
while displaying slides or other graphics in another frame.
Clickable hotspots can turn images or video clips into hyperlinks
or script locations, and can also be implemented with the proper
usage of URL scripting commands embedded at certain times in an ASF
file.

Windows Media Technologies

Windows Media Technologies (WMT) is a set of COM interfaces and
codecs that support a broad range of server and client applications
that stream audio, video, and script commands as a continuous flow
of data.

Today, Windows CE 3.0 with the DirectX Platform Adaptation Kit
1.2 (or DXPAK) provides Windows Media Technologies version 4.1
components. These components support client playback using advanced
Windows Media formats and world-class codecs, such as Windows Media
Audio, Microsoft MPEG-4 video, and Sipro ACELP.net low bit-rate
speech.

What follows is a look at the formats, features, protocols, and
codecs supported by WMT on Windows CE. Occasionally, the Windows CE
implementation of WMT differs from other versions of Windows. When
this is the case, the differences and their workarounds are
discussed.

Windows Media Formats

To store and stream data, WMT uses the Advanced Streaming Format
(ASF). ASF is an application-level multimedia transmission file
format (as opposed to a wire or transmission control format) for
arranging and organizing synchronized multimedia data. ASF supports
media data delivery over a wide variety of networks, network
bandwidths, and protocols. It is optimized for streaming multimedia
packets over both low bit-rate and broadband networks.

Windows CE also supports the Advanced Stream Redirector v3 (ASX)
and Windows Media Station (NSC) metafiles. The ASX metafile
provides mechanisms for hyperlinks to streams, for specifying
multiple pieces of source content together with the protocol
rollover rules the client uses to process them, and for media
playlists.

The Microsoft Windows Media Station metafile serves to describe
a particular channel to an ASF client wishing to access that
channel. The model for access to a channel is similar to a
television accessing a broadcast channel. This metafile is used for
multicasting support.

Windows Media Features

Windows CE provides WMT client DirectShow filters that allow
playback of ASF streams sent using UDP, TCP, and HTTP protocols (as
described in the next section, Windows Media Protocols).

Windows CE WMT supports smart streaming using a multi-data rate
encoded ASF file, where multiple streams with different bit rates
are created in one ASF file and the client negotiates with the
server for the appropriate stream. The server then automatically
adjusts the stream depending on playback conditions and can select
from multiple video streams based on available network
bandwidth.

With smart streaming, the Windows CE WMT client can dynamically
thin the stream based on the available bandwidth using an algorithm
that adjusts delivery smoothly from full frames down to key-frame
only. If necessary, the WMT client can ask the server to send only
audio and no video packets. As bandwidth is reduced, audio is
always given the highest priority, since it is usually critical to
the user experience. As network bandwidth conditions improve, WMT
can progressively step the video bit-rate back up to restore the
viewing to an optimal level. In addition, the WMT UDP resend
capability allows the client, if time is available, to request
missing packets from the server. Finally, WMT also provides ASX
event-driven stream switching where the client sends ASX control
commands to the server.

Windows CE WMT does not support older ASX v2.0 or v1.0 formats.
All of the functionality of these earlier versions has been
encapsulated into ASX v3.0. In addition, the PREVIEWDURATION,
BANNER, and LOGO ASX elements are not supported for Windows CE.
Preview mode can be implemented within an application using the WMP
control by providing access to playback control via scripting. The
functionality of the BANNER and LOGO elements can be implemented
using DHTML and scripting.

Windows Media Technologies for Windows CE provides support for
authentication. Authentication involves user validation before any
information exchange takes place. When a client initiates a request
to the server that has authentication enabled, the server
challenges the client to confirm its identity. Typically, this
amounts to inspecting the name and password of the user account
under various authentication protocols. For any given interaction,
both client and server must adhere to one agreed protocol. The WMT
supports two protocols: HTTP-Basic, for Internet applications, and
NTLM, which is suitable for intranet applications.

On the desktop, NTLM uses authentication information established
when the user logs on, and it requires the client and server to be
on the same or trusted domains. Since Windows CE does not allow a
user to log in, the WMT pops up a dialog box to obtain the
authentication information when NTLM authentication is required.

Windows Media Protocols

The following protocols are supported by the WMT: multicasting,
local file streaming, HTTP streaming, and MMS streaming.

Multicast enables the client to receive multicast streams. It
allows the administrator to send one copy of the content to many
users on the network, as long as that network is multicast-enabled.
IP Multicast streaming is done through ASF with the Microsoft
Windows Media Station Metafile. Networks that are not
multicast-enabled and ASF files not being streamed from a Windows
Media server are sent through unicast. Unicast means that one
stream is sent for every request.

WMT can provide local file streaming for systems with persistent
storage. Data is read from persistent storage into a buffer in main
memory and rendered. Local file streaming provides lower latency
and a significant physical memory savings over reading the entire
ASF file from the persistent store into main physical memory before
rendering the file.

MMS is Microsoft's proprietary protocol for streaming media. A
typical MMS session uses a TCP connection for sending and receiving
media control commands, and a UDP or TCP connection for streaming
the data. Specifying the MMS protocol with an mms:// URL invokes
the protocol rollover mechanism. The client first tries to receive
the stream through UDP. If UDP does not work, the stream
automatically rolls over to TCP transmission. Finally, if TCP does
not work, the client tries to receive the stream through HTTP.

MMSU enables the client to receive streams through UDP. It is well
suited to audio because it sends packets regardless of connection
quality, so users hear fewer delays or pauses. If time allows,
missed packets are requested and resent.

MMST enables the client to receive streams through TCP. TCP forms
a reliable stream: if packets are lost, the stream stops and the
lost packets are recovered. Users experience more delays and pauses
on a congested network when using MMST.

A regular HTTP server can be used to deliver ASF data streams,
but there are several reasons to use the Windows Media Server
instead. The packets within an ASF data stream must be delivered
sequentially, one per network packet, for the full benefit of data
streaming to be realized. Only an ASF-compatible server, such as
Windows Media Server, will avoid fragmentation by transmitting ASF
packets one at a time, encapsulated neatly within individual
Internet or other network protocol packets. The error correction,
streaming playback, and bit-rate optimization inherent to ASF
depend on the client and server not having to figure out where ASF
data packets begin and end on the fly. An HTTP server doesn't have
this ability because it doesn't recognize the significance of ASF
packets; it just shoves data to the client as quickly as possible
by filling each network packet with an arbitrary amount of data.
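
To make the fragmentation issue concrete, the following C++ sketch counts how many media packets are split across network payloads when a server cuts the byte stream into fixed-size payloads without regard to packet boundaries. It is a simplified model, not Windows Media code: the function name, the fixed packet size, and the payload sizes are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: a stream of `packet_count` media packets, each
// `packet_size` bytes long, is cut into consecutive network payloads of
// `payload_size` bytes with no regard for packet boundaries.
// Returns the number of media packets that span more than one payload.
std::size_t CountSplitPackets(std::size_t packet_count,
                              std::size_t packet_size,
                              std::size_t payload_size) {
    assert(packet_size > 0 && payload_size > 0);
    std::size_t split = 0;
    for (std::size_t i = 0; i < packet_count; ++i) {
        const std::size_t first_byte = i * packet_size;
        const std::size_t last_byte = first_byte + packet_size - 1;
        // A packet is split when its first and last bytes fall into
        // different payloads.
        if (first_byte / payload_size != last_byte / payload_size) {
            ++split;
        }
    }
    return split;
}
```

With ten 1,000-byte packets, `CountSplitPackets(10, 1000, 1000)` returns 0 because each payload carries exactly one packet, while `CountSplitPackets(10, 1000, 1460)` returns 6 because six of the ten packets straddle a payload boundary and the client has to reassemble them.
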
Additionally, several features of Windows Media, such as the
ability to fast-forward or rewind ASF data streams, are not
available on a regular Web server.

Windows Media Codecs

The following table lists the supported codecs that can be
contained within an ASF file. Windows CE WMT will only support
content that is created with the Windows Media Tools. The Windows
Media Encoder uses templates to encode live source or AVI, WAV, or
MP3 content into ASF formats with the codecs listed in the table
below. The templates also provide the option of using other codecs,
but DirectShow for Windows CE only supports WMT codecs. While other
codecs that are supported by DirectShow for Windows CE (such as
Cinepak or MPEG-1) can be created within an ASF file using other
authoring tools, there can be no guarantee as to their streaming
performance, and their use is not recommended.

The componentization of the WMT for Windows CE allows you to
build a fully customizable streaming media client that is tailored
to your specific streaming environment. The WMT for CE has been
divided so that you can decide which components to include in
your application. Each of the following components can be selected
as appropriate:

DirectShow

DirectShow provides the underlying services for playback of
multimedia streams from either local files or over a network from a
server. Specifically, DirectShow enables playback of video and
audio content compressed in various file and streaming formats,
including Windows Media, MPEG, Audio-Video Interleaved (AVI), and
WAV. Applications control filter graph activities by communicating
with the filter graph manager. You can do this either indirectly by
using the Microsoft Windows Media Player control, or directly by
calling COM interface methods.

At the heart of the DirectShow services are modular sets of
pluggable components called filters that can be arranged depending
upon media type into a connected configuration called a filter
graph. Filters operate on data streams to read, parse, decode,
format, or render them.

Filter Graph Manager

Filters are arranged in a configuration called a filter graph,
controlled by the Filter Graph Manager (FGM). A DirectShow filter
graph (see Figure 2) consists of a directed sequence of filters
from source to final renderers, all connected by input and output
filter pins. Filter pins negotiate which media types they will
support. The FGM controls the multimedia data flow between the
graph filters. Because DirectShow has a flexible, re-configurable
filter graph architecture, DirectShow can support playback and
streaming of many media types using the same software components.
Developers can also extend DirectShow multimedia support by writing
their own filters.
Figure 2. DirectShow Filter Graph
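
The media type negotiation between pins can be pictured with a small toy model. The C++ sketch below is not the DirectShow API: `ToyPin`, `NegotiateMediaType`, and the media type labels are hypothetical stand-ins, and real pins exchange far richer type descriptions. It shows only the basic idea that a connection succeeds when the output pin offers a type the input pin accepts.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a filter pin (hypothetical, not a DirectShow interface):
// only the media types the pin can handle, in order of preference.
struct ToyPin {
    std::vector<std::string> media_types;
};

// Returns the first media type offered by the upstream output pin that the
// downstream input pin also accepts, or an empty string when the two pins
// have no type in common and cannot be connected directly.
std::string NegotiateMediaType(const ToyPin& output, const ToyPin& input) {
    for (const std::string& type : output.media_types) {
        const bool accepted =
            std::find(input.media_types.begin(), input.media_types.end(),
                      type) != input.media_types.end();
        if (accepted) {
            return type;
        }
    }
    return std::string();
}
```

In this model, a decoder output pin offering "video/yuv" and "video/rgb" connects to a renderer input pin that accepts only "video/rgb" with the type "video/rgb", and fails to connect to an input pin that accepts only "audio/pcm". A failed direct connection is the situation in which a graph builder would look for an intermediate filter.
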
An application uses the Filter Graph Manager (FGM) interfaces to
create, connect, and control filter graphs. Filters use the FGM
interfaces to post event notifications and to force reconnection of
the filter pins as needed. In particular, the IGraphBuilder
interface allows applications to call the filter graph manager to
attempt to build a complete filter graph, or a partial filter graph
if given only partial information such as the name of a file or the
interfaces of two separate pins. The filter mapper looks up the
available filters in the registry to configure the filter graph in
a meaningful way. The IGraphBuilder interface creates a filter
graph, adds filters to or removes filters from a filter graph,
enumerates all the filters in a filter graph, and forces
connections when adding a filter. To cause the appropriate filter
graph to be constructed, an application just needs to create a
filter graph object, obtain its IGraphBuilder interface, and then
call the RenderFile method.

In addition, the FGM exposes media control and media positioning
interfaces to the application. The media control interface,
IMediaControl, allows the application to issue commands to run,
pause, and stop the stream. Playback starts when the Run method is
invoked. The positioning interface, IMediaSeeking, lets the
application specify which section of the stream to play.

Internally, the FGM uses each filter's well-known IBaseFilter
interface to locate and enumerate the filter's input and output
pins.

Filters

Filters are registered DirectShow classes and perform most media
processing tasks. Filter tasks include:
Filters use several types of interfaces, such as pins,
enumerators, transports, and clock interfaces, to perform their
tasks. Filters implement and expose numerous interfaces. The FGM
uses these interfaces to create, connect, and control the graph. A
filter always implements the IBaseFilter interface, which contains
methods to:

Individual filters expose an IBaseFilter interface so that the
Filter Graph Manager can issue the run, pause, and stop commands.
The Filter Graph Manager is responsible for calling these methods
in the correct order on all the filters in the filter graph. Your
application should not do this directly. However, unlike the
IBaseFilter interface, the IMediaSeeking interface is exposed only
by the renderer filter. Therefore, the Filter Graph Manager calls
only the renderer filter with positioning information. The renderer
then passes this position control information upstream through
IMediaSeeking interfaces exposed on the pins, which simply pass it
on. The positioning of the media stream is actually handled by the
output pin on the filter that is able to seek to a particular
position, usually a parser filter such as the AVI splitter.

Supported Filters

Windows CE DXPAK 1.2 provides the following DX 6.1 DirectShow
filters:

To support streaming of Windows Media Formats, special ASF/ASX
streamer source and WMA codec transform filters are provided. In
addition, the Fraunhofer MP3 audio and Sipro ACELP.net speech codecs
use the Audio Compression Manager (ACM) wrapper Audio Decompressor
transform filter.

DirectShow broadcast technology and DV filters are not included
in DXPAK, but are available as part of the Windows CE WebTV
Microsoft TV (MSTV) Kit.

Example

DirectShow makes it easy to play or stream multimedia files.
Here is a sample code fragment showing how to write a trivial
multimedia file player application (note that we have, among other
simplifications, suppressed checking the QueryInterface return
status).

Several comments are in order. CoCreateInstance instantiates a
filter graph object, but no filters, as it does not yet know what
media types it needs for playback. It returns the IGraphBuilder
interface needed to build the filter graph once the media type is
known. QueryInterface is then called to get IMediaControl for
running, pausing, and stopping the streaming of media through its
filters. Since Windows CE currently supports only in-process COM
servers, CLSCTX_INPROC_SERVER is the only valid server context for
CoCreateInstance. Trying anything else will return E_NOTIMPL.

IGraphBuilder is used to create a filter graph, add filters to or
remove filters from a filter graph, enumerate all the filters in a
filter graph, and force connections when adding a filter. We are
using its RenderFile method to build the graph. The final graph
construction depends upon the video and audio formats contained in
the source file. Finally, we can play back the file using
IMediaControl::Run. Since we want the application to wait until the
rendering is finished, we have added IMediaEvent::WaitForCompletion.

For More Information

You can find additional information about Windows CE DXPAK at
http://www.microsoft.com/presspass/press/2000/Feb00/DxpackPR.asp.
Technical Fundamentals
Windows Media Player Control
Windows Media Technologies
Windows Media Formats
Windows Media Features
Windows Media Protocols
Windows Media Codecs
DirectShow
Filter Graph Manager
Filters
For More Information
Placing the WMP Control in a Web Page
```html
<OBJECT ID="MediaPlayer"
        CLASSID="CLSID:22d6f312-b0f6-11d0-94ab-0080c74c7e95"
        TYPE="application/x-oleobject"
        WIDTH="320" HEIGHT="240"
        STYLE="position:absolute; left:50px; top:50px;">
  <PARAM NAME="FileName"
         VALUE="http://example.microsoft.com/media/sample.asf">
  <PARAM NAME="ShowControls" VALUE="1">
  <PARAM NAME="ShowStatusBar" VALUE="1">
</OBJECT>
```
Windows Media Codecs
| Codec name | Description |
|---|---|
| MPEG-4 v3, v2 | MS MPEG-4 video codec; up to 30 fps QCIF (176x144) - CIF (352x288) resolution video at 28.8 kbps – 300 kbps |
| WMAudio v2 | New Windows Media audio codec based on non-uniform modulated lapped biorthogonal transforms (NMLBT) in place of DCT for perceptual coding of both voice and high-fidelity; 8 – 48 kHz stereo at 56 – 128 kbps; near-FM quality at 28.8 kbps and near-CD quality at 64 kbps |
| ACELP.net | Sipro ACELP voice codec; speech-quality 8 – 16 kHz mono at 5 – 16 kbps |
| MPEG-1 Layer 3 | Fraunhofer MP3 perceptual audio codec; near-CD quality at 128 kbps |
Supported Filters
Example
```cpp
#include <windows.h>
#include <dshow.h>

// Assumes COM has already been initialized on the calling thread.
// Return values are not checked, to keep the example short.
HRESULT PlayMovie(LPTSTR lpszMovie)
{
    // We will use several DirectShow interfaces.
    IMediaControl *pMC = NULL;
    IGraphBuilder *pGB = NULL;
    IMediaEventEx *pME = NULL;
    long evCode;   // something to hold a returned event code
    HRESULT hr;

    // Instantiate a filter graph as an in-process server. IGraphBuilder
    // is the interface we'll use to build the graph.
    hr = CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                          IID_IGraphBuilder, (void **) &pGB);

    // Get the media control interface for running and stopping the graph.
    hr = pGB->QueryInterface(IID_IMediaControl, (void **) &pMC);

    // We'll want to wait for completion of the rendering, so we need a
    // media event interface.
    hr = pMC->QueryInterface(IID_IMediaEventEx, (void **) &pME);

    // Now we're ready to build the filter graph based on the source file
    // data types.
    hr = pGB->RenderFile(lpszMovie, NULL);

    // Play the source file.
    hr = pMC->Run();

    // Block the application until video rendering operations finish.
    hr = pME->WaitForCompletion(INFINITE, &evCode);

    // Release interfaces.
    pME->Release();
    pMC->Release();
    pGB->Release();

    return hr;
}
```