\documentclass[11pt]{article} \usepackage{fullpage} \usepackage{palatino} \usepackage{psfig} \title{The MPEG Library\\ Version 1.2\footnote{Also covers version 1.3}} \author{Greg Ward\\({\tt greg@bic.mni.mcgill.ca})} \date{February, 1996} %\renewcommand{\familydefault}{cmss} %\renewcommand{\bfdefault}{b} % make ``bold'' come out medium-width % \code - for typesetting a little bit of computer code (in typewriter font) \newcommand{\code}[1]{\texttt{#1}} % ttdescription - a description-like environment where item descriptions are % typeset using "\tt", followed by a colon \newcommand{\ttlabel}[1]{\texttt{#1:}\quad\hfil} \newenvironment{ttdescription}[1] {\newbox\holder \setbox\holder=\hbox{\ttlabel#1} \dimen0=\wd\holder \begin{list}{} {\labelsep=-0.25in \rightmargin=0.25in \leftmargin=\dimen0 \addtolength{\leftmargin}{0.25in} \labelwidth=\leftmargin \let\makelabel\ttlabel}}% this comment is needed to hide the newline {\end{list}} % \prototype - for typesetting a function prototype \newcommand{\prototype}[1]{% \textbf{Function prototype:}\par \smallskip \code{#1}\par\medskip } \newenvironment{Arguments}[1]{% \noindent\textbf{Arguments:}% \begin{ttdescription}{#1}} {\end{ttdescription}\medskip} \newenvironment{Notes}{% \noindent\textbf{Notes:}\par\smallskip} {\medskip} \begin{document} \maketitle \tableofcontents \section{Introduction and Background} The MPEG Library is based on an effort from the University of California at Berkeley to create a portable, software-based MPEG decoder \cite{Patel93}. This resulted in the widely distributed (and widely modified) \code{mpeg\_play}, a highly-optimized MPEG decoder that was specifically geared towards displaying under X Windows. The value of having a portable, software MPEG decoder is amply demonstrated by the number of programs that have been adapted from this original Berkeley source (including ports to the Linux SVGA library, Silicon Graphics hardware, and a non-display MPEG information utility). However, the utility of the decoder was limited by the difficulty of extracting the useful, MPEG-related source code from the X11-specific, display-related source. Essentially what was needed was a simple interface that would allow a programmer to extract frames from an MPEG stream (either before or after converting to RGB colour space), and then to do with the image data as he or she saw fit. The MPEG Library is intended to fill this need. It was developed at the Montreal Neurological Institute in the summer of 1994 in order to facilitate the development of a high-performance, feature-heavy MPEG player for Silicon Graphics workstations. Since then, the Library has found a use in numerous applications, notably as one of several optional libraries used for extending the well-known ImageMagick suite of graphics applications. \section{Programming with the MPEG Library} Using the Library is quite straightforward, and is analogous to the way in which files have been traditionally handled: you open an MPEG stream to initialize internal data structures, and then read frames until the stream is exhausted. At any point, you can rewind the stream to start over; however, random access is not allowed. (This is not due to a fundamental weakness with MPEG; however, due to the nature of the decoding engine at the heart of the MPEG Library, don't expect to see it implemented here any time soon.) When you are finished with the stream, you close it to clean up. Here is a simple example program to open an MPEG stream (named by the first command-line argument) and read all frames from it. Since displaying images is as non-portable as it is desirable, I have included calls to dummy routines \code{InitializeDisplay()} and \code{ShowFrame()}; actually defining these is up to you. \begin{verbatim} #include #include "mpeg.h" int main (int argc, char *argv[]) { FILE *mpeg; ImageDesc img; Boolean moreframes = TRUE; char *pixels; mpeg = fopen (argv[1], "r"); SetMPEGOption (MPEG_DITHER, FULL_COLOR_DITHER); OpenMPEG (mpeg, &img); InitializeDisplay (img.Width, img.Height); pixels = (char *) malloc (img.Size); while (moreframes) { moreframes = GetMPEGFrame (pixels); DisplayFrame (img.Width, img.Height, pixels) } CloseMPEG (); fclose (mpeg); } \end{verbatim} For a concrete example, you might wish to consult \code{easympeg.c}, a very simple SGI-specific MPEG player included with the Library. Also, I have omitted any error-checking or handling here; again, consult \code{easympeg.c} for a more realistic example.\footnote{For an even more realistic (but of course considerably larger) example, take a look at \code{glmpeg\_play}. This is the full-featured MPEG player that was the impetus for creating the MPEG Library; it is available by from the same location as the library itself: \code{ftp://ftp.bic.mni.mcgill.ca/pub/mpeg}.} Note in particular the following points about the above code: \begin{itemize} \item The caller must take care of opening and closing the file containing the MPEG stream; the Library assumes that it is passed a file ready for reading. \item The \code{ImageDesc} structure contains all the information that should be needed to display frames from the MPEG stream (although not necessarily all the information you could possibly want to know about an MPEG stream). %In particular, the fields \code{Height} and %\code{Width} are the height and width of each frame in pixels; %\code{Depth} is the depth (in bits) of each pixel; \code{PixelSize} is %the actual number of bits stored per pixel; \code{Size} is the size in %bytes of each entire decoded and uncompressed frame; and %\code{BitmapPad} gives the ``quantum'' of a scan line. (Each scan %line starts on an even multiple of this many bits.) \item \code{SetMPEGOption()} can be used to control somewhat the decoding of frames. In addition to selecting a dithering mode, you can also select the luminance and chrominance ranges used for dithering. Also, note that \code{SetMPEGOption()} should be called {\em before} \code{OpenMPEG()} when setting the dithering method. \item The MPEG data can be decoded using a variety of dithering methods. (Note that in this context, {\em dithering\/} refers to converting from the luminance-chromaticity, or YCrCb, colour space in which MPEG data is encoded, to the more conventional RGB scheme.) \item You don't need to pass any parameters to \code{GetMPEGFrame()} or \code{CloseMPEG()} to tell it which MPEG stream you mean; this is because the Berkeley decoding engine (and hence the MPEG Library itself) depends heavily on global variables, and unfortunately cannot decode more than one MPEG at a time. \end{itemize} %Most of these %involve both pixel values (which are filled in by calls to %\code{GetMPEGFrame()} and a colour map which maps pixel values to RGB %values. The scheme used here, which is the default, uses %\code{FULL\_COLOR\_DITHER} to return pixels as RGB triples; no colour %map is involved. The other useful dithering modes are %\code{ORDERED\_DITHER} and \code{GRAY\_DITHER}, both of which offer %better performance and consume less memory at the expense of image %quality. More detailed information is provided in sections below. \section{Concepts and Data Formats} This section deals with the main concepts needed to control the MPEG Library and to display the data it returns. It does {\em not\/} deal with the details of how MPEG streams are encoded, stored, or decoded. \subsection{Dithering modes} \label{sec:dithering} A large number of dithering modes (in fact, all the modes provided by the original \code{mpeg\_play}) are available. A few produce nonsensical results, but all have been fully tested in the context of the MPEG Library and found to agree with the results given by \code{mpeg\_play}. ``Dithering'' in this context is the conversion from luminance-chrominance colour space (aka YCrCb, YIQ, or YUV, which is how MPEG streams are encoded and is the same space used by NTSC television signals) to some form of RGB space. The implementors of \code{mpeg\_play} found that outright conversion to red/green/blue values takes both more time and memory than any other method they experimented with, so most modes are colour mapped. This means that \code{OpenMPEG()} will create a colour map which can be accessed by the user via the \code{ColormapEntry} pointer in \code{ImageDesc}, and that the pixel values returned by \code{GetMPEGFrame()} are indeces into this colour map. The dithering mode affects the quality of the decoded images, the number of bits used per pixel, and the colour depth of the image. The dithering mode is selected with \code{SetMPEGOption()}, using the \code{MPEG\_DITHER} option and one of the following values: \begin{ttdescription}{FULL\_COLOR\_DITHER} \item[ORDERED\_DITHER] 8-bit colour-mapped; reasonable quality; decoding is almost as fast as \code{GRAY\_DITHER} \item[ORDERED2\_DITHER] 8-bit colour-mapped; reasonable quality \item[MBORDERED\_DITHER] 8-bit colour-mapped; reasonable quality \item[FS4\_DITHER] 8-bit colour-mapped; colours are all wrong \item[FS2\_DITHER] 8-bit colour-mapped; colours are all wrong \item[FS2FAST\_DITHER] 8-bit colour mapped using Floyd-Steinberg error diffusion; reasonable quality \item[HYBRID\_DITHER] 8-bit colour-mapped; passable colour \item[HYBRID2\_DITHER] 8-bit colour-mapped; slightly worse than \code{HYBRID\_DITHER} \item[Twox2\_DITHER] 8-bit colour-mapped with pixels doubled; poor quality \item[GRAY\_DITHER] a 256-shade grayscale rendering; nice quality and fastest decoding \item[FULL\_COLOR\_DITHER] a high-quality 24-bit colour rendering; results in slowest decoding \item[MONO\_DITHER] 1-bit monochrome dithering; use as last resort for 1-bit displays \item[THRESHOLD\_DITHER] ?? \end{ttdescription} The descriptions here are my entirely subjective judgments of the image quality with each dithering mode. ``Reasonable'' quality is better than ``passable.'' Your mileage may vary. ``8-bit'' or ``24-bit'' here refers to the colour depth in the final images, i.e. the minimum number of bits allocated to each pixel. Authoritative information about the actual pixel size can be found in the \code{ImageDesc} structure filled in by \code{OpenMPEG()}; for instance, if you select \code{FULL\_COLOR\_DITHER}, the colour depth is 24 bits, but 32 bits are allocated per pixel. Thus, the \code{PixelSize} field in \code{ImageDesc} will be 32, and the \code{Depth} field will be 24. Note that the dithering mode must be set {\em before\/} \code{OpenMPEG()} is called. For example, to select gray-scale dithering and then open the file \code{example.mpg} as an MPEG stream: \begin{verbatim} char filename[] = "example.mpg"; FILE *mpeg; ImageDesc image; mpeg = fopen (filename, "r"); SetMPEGOption (MPEG_DITHER, (int) GRAY_DITHER); OpenMPEG (&image); \end{verbatim} \subsection{Colour maps} \label{sec:colour_maps} Most dithering modes result in images whose pixel values are indeces to an 8-bit colour map. This colour map is accessed via the \code{ImageDesc} structure, and it is created by \code{OpenMPEG()} based on the dithering type selected by \code{SetMPEGOption()} (this is why the dithering type must be set before calling \code{OpenMPEG()}). Note that ``8-bit'' refers to the size of the colour map: each pixel in the colour-mapped images is 8 bits long, so the colour map itself has 256 entries. The colour map is accessed via the \code{Colormap} field of \code{ImageDesc}, which points to an array of \code{ColormapSize} colour map entries. Each colour map entry is a structure of the form \begin{verbatim} typedef struct { short red, green, blue; } ColormapEntry; \end{verbatim} and the colour map is created when \code{OpenMPEG()} is called. If no colour map is created (i.e., the dithering mode is \code{FULL\_COLOR\_DITHER}), then \code{ColormapSize} will be -1 and \code{Colormap} will be \code{NULL}. For example: \begin{verbatim} char *filename; FILE *MPEG; ImageDesc MPEGInfo; filename = argv[1]; /* Prepare to read and decode an MPEG stream */ MPEG = fopen (filename, "rb"); if (!OpenMPEG (MPEG, &MPEGInfo)) exit; /* Do we have a colour-mapped mode? */ if (MPEGInfo.Colormap != NULL) { for (i = 0; i < MPEGInfo.ColormapSize; i++) { mapcolor (i, MPEGInfo.Colormap[i].red, MPEGInfo.Colormap[i].green, MPEGInfo.Colormap[i].blue); } } /* ... */ \end{verbatim} Here, we assume that the function \code{mapcolor()} is available to set the system colour map. % NOTE to self: check how mpeg_play generates colour maps again; % after all, its pixel values are 8 bits too, so it can't be using % a huge colour map to keep from clobbering X's. Problem basically % is that grayscale dithering uses a 256-elt. colour map, which clobberss % the system map on the 4D35's -- on Indy's too? portia? %Actually, the situation is slightly more complicated: if it were literally %like this, then programs that use the MPEG Library would be stuck with %using entries 0-255 (or 0-127, depending on the dithering mode) of the %system colour map. On windowed systems such as X Windows, this is often %undesirable, as it might clobber colour map entries used by the system or %other programs. Thus, the MPEG Library provides a mechanism for mapping %``ideal'' colour map indeces (which are in the range 0..127 or 255) to %real-world colour map indeces. This \subsection{Image data format} \label{sec:image_data} The image data, as returned by \code{GetMPEGFrame()}, is formatted in a straightforward way. Pixels are stored in row-major order, starting at the upper left-hand corner of the image. The number of bits allocated per pixel is given by the \code{PixelSize} field of \code{ImageDesc}. This is illustrated in Figure~\ref{fig:image_format}, which shows a sample $8 \times 10$-pixel image, with the offset into the image data for each pixel. If the pixels are 8 bits each, then this will be a simple byte offset. \begin{figure}[htbp] \centerline{\psfig{figure=image_format.eps}} \caption{Illustration of image data layout for a sample $8 \times 10$ image. The number at each pixel is just the offset into the image data array.} \label{fig:image_format} \end{figure} \section{Programming Reference} \subsection{The \code{ImageDesc} structure} Relevant declarations: \begin{verbatim} typedef struct { unsigned char red, green, blue; } ColormapEntry; typedef struct { int Height; /* in pixels */ int Width; int Depth; /* image depth (bits) */ int PixelSize; /* bits actually stored per pixel */ int Size; /* bytes for whole image */ int BitmapPad; /* "quantum" of a scanline -- */ /* each scanline starts on an even */ /* interval of this many bits */ int ColormapSize; ColormapEntry *Colormap; /* an array of ColormapSize entries */ } ImageDesc; \end{verbatim} This structure provides (hopefully) all the information needed to display an MPEG stream, although it doesn't necessarily provide all the information you could possibly want to know about such a stream. However, that's not the intent of the MPEG Library; if you really need to know, for instance, just how many intra-frames are in a particular MPEG, you might want to take a look at the \code{mpegstat} program, which was also based on the Berkeley X11 player.% \footnote{\code{mpegstat} should also be available at \code{ftp://ftp.bic.mni.mcgill.ca/pub/mpeg}.} Here is the list of fields in the structure: \begin{ttdescription}{ColormapSize} \item[Height] the height, in pixels, of the movie. \item[Width] the width, in pixels, of the movie. Note that due to the block nature of MPEG encoding, the height and width will always be multiples of 16. \item[Depth] the number of bits per pixel that are actually relevant to the display. For most dithering methods, this will be 8 (i.e., we usually use an 8-bit colour map); for full-colour dithering, it will be 24. \item[PixelSize] the number of bits (not bytes!) of storage allocated per pixel. \item[Size] the size, in bytes, of one entire unencoded frame. This is simply equal to \code{Height*Width*PixelSize/8}. (Note: currently, \code{BitmapPad} is ignored in the calculation of \code{Size}.) \item[BitmapPad] the ``quantum'' of a scan line; i.e., each scan line starts on an even interval of this many bits. \item[ColormapSize] the number of entries in the colour map. This is usually 128, but for most dithering methods it can be indirectly modified by the user of the Library. It is zero in non-colourmapped modes. \item[Colormap] the table used to map pixel values to red/\-green/\-blue values (which are themselves stored as bytes in the \code{ColormapEntry} structure. It is \code{NULL} in non-colourmapped modes. \end{ttdescription} \subsection{\code{SetMPEGOption()}} \prototype{void SetMPEGOption (MPEGOptionEnum Option, int Value)} \noindent\code{Option} should be one of: \begin{ttdescription}{MPEG\_CMAP\_INDEX} \item[MPEG\_DITHER] Sets the dithering mode, which controls how YCrCb values are converted to RGB space. \code{Value} should be a \code{DitherEnum} value, cast to \code{int}. Dithering modes are explained above, in section~\ref{sec:dithering}. \item[MPEG\_LUM\_RANGE] \item[MPEG\_CR\_RANGE] \item[MPEG\_CB\_RANGE] These set the ranges of luminance and chromaticity values. The defaults are 8, 4, and 4. (I do not understand the effects of changing these; my experiments indicate that doing so garbles perfectly good colour maps.) % \item[MPEG\_CMAP\_INDEX] This allows you to set the mapping of % ``ideal'' pixel values (i.e., indeces into the colour map returned % by \code{OpenMPEG()} in the \code{Color\-map} field of the % \code{ImageDesc} structure) to ``real-world'' pixel values, or % indeces into the system colour map that you were able to allocate. % See section~\ref{sec:colour_maps} for more information on % colour-mapped modes. The \code{Value} passed to % \code{SetMPEGOption()} should be a pointer to an array of % \code{unsigned char} with \code{ColormapSize} entries (cast to % \code{int}, of course). The contents of this array will be copied % into a private Library array, so you don't need to worry about % keeping it around after calling \code{SetMPEGOption()}. \end{ttdescription} \begin{Notes} \code{SetMPEGOption()} allows you to set a variety of options related to MPEG decoding. The possible values for \code{Option} are described above; the possible values for \code{Value} value are dependent on what \code{Option} you are setting. Whatever \code{Value} is, it should of course be cast to an \code{int}. \end{Notes} \subsection{\code{OpenMPEG()}} \prototype{Boolean OpenMPEG (FILE *MPEGfile, ImageDesc *Image)} \begin{Arguments}{MPEGfile} \item[MPEGfile] A file that is already open for reading, positioned at the beginning of an MPEG stream. \item[Image] Pointer to a user-declared \code{ImageDesc} structure. You shouldn't change any of the fields in \code{*Image} yourself, either before or after calling \code{OpenMPEG()}; use \code{SetMPEGOption()} instead. \end{Arguments} \begin{Notes} \code{OpenMPEG()} prepares an MPEG stream for decoding. It initializes internal data structures for decoding and dithering and---if applicable---creates a colour map. After calling \code{OpenMPEG()}, the following fields in \code{*Image} will be set: \code{Height}, \code{Width}, \code{Depth}, \code{PixelSize}, \code{Size}, \code{BitmapPad}, \code{ColormapSize}, and \code{Colormap}. \end{Notes} \subsection{\code{GetMPEGFrame()}} \prototype{Boolean GetMPEGFrame (char *Frame)} \begin{Arguments}{Frame} \item[Frame] Pointer to a user-allocated chunk of memory. Must have enough room for the decoded image, which can be determined from the \code{Size} field of \code{ImageDesc}. \end{Arguments} \begin{Notes} Decodes the next frame from the movie. Returns \code{TRUE} if there are any frames left to decode in the movie, or \code{FALSE} if the decoded frame is the last frame in the movie. That is, for a movie with $N$ frames \code{GetMPEGFrame()} will return \code{TRUE} $N-1$ times, and then the call to decode the last frame will return \code{FALSE}. After that, the behaviour of \code{GetMPEGFrame()} is undefined (unless you call \code{RewindMPEG()}.) \end{Notes} \subsection{\code{RewindMPEG()}} \prototype{void RewindMPEG (FILE *MPEGfile, ImageDesc *Image)} \begin{Arguments}{MPEGfile} \item[MPEGfile] The open, readable stream pointer that was also passed to \code{OpenMPEG()}. \item[Image] The image descriptor that was passed to and filled in by \code{OpenMPEG()}. \end{Arguments} \begin{Notes} Repositions \code{MPEGfile}'s file-offset pointer to point to the beginning of the stream, and reinitializes internal MPEG Library structures to prepare reading the MPEG again. The first call to \code{GetMPEGFrame()} after calling \code{RewindMPEG()} will return the first frame of the movie, as though \code{OpenMPEG()} had just been called. \end{Notes} \section*{Acknowledgements} Most of the credit for this package should go to the authors of \code{mpeg\_play}; all I can really take credit for is shuffling code around and coming up with a reasonably intelligent interface to the decoding engine that does the real work. Thanks to Magnus Heldestat for contributing a sped-up version of 24bit.c, resulting in faster decoding/dithering of full-colour images. \begin{thebibliography}{0} \bibitem{LeGall91} Didier LeGall, ``MPEG--A Video Compression Standard for Multimedia Applications,'' {\em Communications of the ACM\/}, April 1991, Vol 34 Num 4, pp. 46--58. \bibitem{Patel93} Ketan Patel, Brian C. Smith, and Lawrence A. Rowe, ``Performace of a Software MPEG Video Decoder'', {\em ACM Multimedia '93 Conference}. \end{thebibliography} \end{document}