How To Become An Expert In DSM-CC

Many people in the DTV industry have only a basic understanding of DSM-CC, and sometimes not even that. For MHP and OCAP developers, it's a technology that you need to know something about. It's not necessary to know all of the details, but a general idea of what goes on and why is essential to building applications that work well and load quickly.

The detailed parts of this tutorial are contained in separate sections, and are really only for people who are implementing MHP or OCAP receivers, or head-end systems that need to build object carousels for those receivers: other readers should not need to stray from the main page of this tutorial into the more technical discussions of specific parts of DSM-CC. While you may be curious about how DSM-CC actually works, I strongly suggest that you avoid finding out more unless you really have to or you're really curious.

There are three specifications that you need before the DSM-CC specification even begins to make sense. You don't need to read all of them, but they refer to one another and generally it's easier to have them all to hand when you're trying to work out what's going on. The important specs are:

The most useful of these is probably TR 101 202. While the title claims that it's only a set of implementation guidelines, it's actually the closest thing to a clear description of DSM-CC that you'll find anywhere. At 60 pages long, it's well worth the download.

Do not try to read the DSM-CC specification first - you'll only confuse yourself and make it harder to understand what the hell is going on. Reading TR 101 202 while referring to EN 301 192 is the approach that seems to work best. At first, only read the DSM-CC specification to check details that are mentioned in either of the two other documents. It's far better to learn what you need to know first, and then extend that knowledge into other areas, than to try and learn it all at once.

Given that I'm a digital TV geek rather than a DSM-CC geek, this tutorial concentrates on DSM-CC from the point of view of its use in DVB specifications. This may seem a little unfair to all those OCAP developers out there until you realise that with a small number of exceptions, OCAP follows the DVB standards for using DSM-CC. The ATSC data broadcasting standard is not used by OCAP, because it's missing a number of features that are necessary for efficiently carrying Java class files and data files associated with applications.

So what is DSM-CC anyway?

DSM-CC is many things to many different parts of the industry. DSM-CC stands for Digital Storage Media - Command and Control, and was originally designed for use with networked VTR machines and suchlike. It has been extended considerably since then, and now includes control of MPEG video servers over a network (playing, stopping and pausing video or audio), support for data transmission using MPEG-2, timecodes for MPEG-2 video, simple data broadcasting and broadcast file systems. There are several different parts to the DSM-CC specification, and many of these are not directly related to any of the MHP-derived standards.

For the purpose of this tutorial, we'll concentrate on those parts of the standard that are related to MHP, OCAP and JavaTV. What this means is that we will ignore large parts of the standard, simplify others and consider only those cases that are needed to understand how DSM-CC works in broadcasting MHP and OCAP compatible filesystems. Don't assume that what is described here is all there is to know about DSM-CC.

In the rest of this section, we'll take a high-level look at the various elements of DSM-CC that are of interest to us as digital TV developers. More details about these are given in other sections of this tutorial.

Data carousels and object carousels

One of the main differences between DSM-CC in a digital TV scenario and DSM-CC as it was originally intended to be used is the nature of the network. Originally, DSM-CC was intended to be implemented using some kind of RPC mechanism. In this model, objects would reside on one node in the network and any other node that needed to manipulate them would make an RPC call to the node containing the object.

Broadcast systems are by their very nature one-way, however. Data is sent from a transmitter (the digital TV head-end) to a receiver (the set-top box or IDTV) and the receiver can't request specific data from the transmitter. This means that a receiver can't request a specific file from the server, like a PC can request a file from the network or from its hard disk. Given that a receiver can't access data in the normal way, another solution has to be found.

The solution is actually quite a simple one - the broadcaster periodically transmits every file in the filesystem, and the receiver waits for the file it wants. File operations on the receiver tell it which files it should look for in the broadcast. The best example of this kind of solution is a teletext system: each page has a unique number, and each page is transmitted in turn. When the user enters a page number, the TV must wait for that page to be broadcast before it decodes and displays it. This type of solution is known as a carousel - every page goes round in turn, and the receiver must wait for that page to come round again in the broadcast before it can use it. Of course, this solution is not terribly efficient, and later in this tutorial we'll examine ways of improving performance.

In DSM-CC, data is transmitted in blocks called modules rather than in pages, but the principle is the same. The data to be transmitted is split up into modules, some description of that module is added and then each module is transmitted in turn.

DSM-CC supports two kinds of carousel. The simpler of these is a data carousel. This provides a way for a broadcaster to transmit blocks of data to the receiver. It gives no indication of what that data is, though, and it's completely up to the receiver to parse this data into some form that makes sense to it. The ATSC data broadcasting specification (and the Japanese ARIB specifications) make use of DSM-CC data carousels.

For more complex situations, this isn't very useful, and in these cases the object carousel provides a better solution. An object carousel is built on top of a data carousel and provides functionality close to (and in some cases better than) a standard filesystem. DVB uses the object carousel format, and this has also been adopted by OCAP and ACAP.

Each object carousel consists of a directory tree that is split into a series of modules, which may contain one or more files or directories. Each module may contain several files with a total size smaller than 64 KBytes - storing several files in a module larger than 64K is not allowed. Splitting files across more than one module is not allowed, so files larger than 64K must go in their own module, which will contain only that file. Files in a module can come from any part of the directory tree to be broadcast, and need not come from the same directory.

These modules are broadcast one after the other until they have all been broadcast, at which point the process starts from the beginning and the first module is broadcast again. In order to access a file, the receiver must wait until it receives the module containing that file, at which point it can parse the module and access the file. This may not be efficient when the total amount of data being broadcast is quite large, but most receivers will cache some data. This caching may either be done at the module level, or at the individual file level.

Both data carousels and object carousels are much more complex than we have discussed here, and more detailed guides to both of them are available in other parts of this site:

The discussion of object carousels assumes that you have already got a working knowledge of data carousels, and so you should read the data carousel tutorial first unless you're already familiar with what data carousels are and how they work.

An example of an object carousel

Suppose that we wish to broadcast the following directory structure:

           index.html                1256 bytes
           image1.jpg                4040 bytes
           image2.jpg                120346 bytes
           audio                     <directory>
           audio/clip1.aiff          26430 bytes
           classes                   <directory>
           classes/Main.class        23020 bytes
           classes/Big.class         59982 bytes
           classes/Other.class       26947 bytes 

To create the modules for broadcast, we start adding files to the first module. Adding the first two files (index.html and image1.jpg) is no problem. However, adding image2.jpg would take the module above 64 KBytes in size, so we can't do that. Adding the next file (clip1.aiff in the audio directory) is fine, though. Similarly, we can add the entry for the audio directory itself with no problems. We could also add some of the contents of the classes directory, but we won't. The reason for this is to optimize loading times, as we shall see in a moment. So, we end this module and start a new one.

The file image2.jpg is larger than 64K, but we can't split the file across more than one module and so it goes in its own module. That module is larger than 64K, but there's nothing we can do about that.

That leaves us with the contents of the classes directory to be added to the carousel. These files will not all fit in the same module, but we can organise them in a way that makes loading them quicker. The first thing that we add to the new module is the directory entry for the classes directory. Since we are also likely to need the directory entry to access the files within that directory, we add as many of those classes as possible to this module. In this case, that means the files Main.class and Other.class get added. As before, the final file (Big.class) will go in a module on its own.

This may not be the most efficient way of splitting the files across modules - that depends on when the files are needed and the relationships between them, so doing this for real would take some careful thought to get the most efficient carousel layout.
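The packing steps in this example can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not how any real carousel generator works: the function name pack_group is made up, the 64K limit is assumed to mean 65536 bytes, and the sizes of directory entries and of the per-object headers that a real object carousel adds are ignored. Files are grouped by hand (the root and audio files in one group, the classes directory in another), as the text does.

```python
MODULE_LIMIT = 64 * 1024  # assumption: the tutorial's "64K" read as 65536 bytes


def pack_group(files, limit=MODULE_LIMIT):
    """Pack (name, size) pairs into modules without splitting any file.

    Each pass fills one module in order. A file that does not fit is
    deferred to a later module. A file bigger than the limit ends up in a
    module of its own, because nothing else can fit beside it.
    """
    modules = []
    pending = list(files)
    while pending:
        module, total, deferred = [], 0, []
        for name, size in pending:
            # The first file of a module is always accepted, even if oversized.
            if not module or total + size <= limit:
                module.append(name)
                total += size
            else:
                deferred.append((name, size))
        modules.append(module)
        pending = deferred
    return modules


# The directory tree from the example, grouped by hand as in the text.
root_group = [
    ("index.html", 1256),
    ("image1.jpg", 4040),
    ("image2.jpg", 120346),
    ("audio/clip1.aiff", 26430),
]
classes_group = [
    ("classes/Main.class", 23020),
    ("classes/Big.class", 59982),
    ("classes/Other.class", 26947),
]

layout = pack_group(root_group) + pack_group(classes_group)
for number, module in enumerate(layout, start=1):
    print(number, module)
```

Run on the example tree, this gives the same four modules as the walk-through. A real tool would also have to decide the grouping itself, which is the part that needs knowledge of the application.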

Putting files into modules in a DSM-CC object carousel.

As the above diagram shows, modules may be broadcast more than once when broadcasting a DSM-CC carousel. By broadcasting some modules more often than others, the access time for commonly used files can be reduced. Of course, by doing this, the total size of the carousel is increased and so the access time for less commonly used files may actually increase. This tradeoff has to be considered carefully when designing a carousel, to optimize the download speed. It is possible for a file to be repeated in several modules, and so this provides a finer-grained approach to optimizing an object carousel.
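To get a feel for this tradeoff, here is a rough sketch that estimates the worst-case wait for a module in a given broadcast order. It is a simplified model under stated assumptions: the receiver is assumed to start collecting a module only at its first byte, the module names and the function names are invented for this example, and the module sizes are simply the file totals from the example above with no protocol overhead.

```python
def worst_case_wait_bytes(sequence, sizes, module):
    """Bytes that go past, in the worst case, before `module` is fully received.

    `sequence` is one cycle of the carousel (module names in broadcast order)
    and `sizes` maps each module name to its size in bytes.
    """
    starts, offset = [], 0
    for name in sequence:
        if name == module:
            starts.append(offset)  # byte offset where this copy starts
        offset += sizes[name]
    cycle = offset
    if not starts:
        raise ValueError(f"{module!r} is not in the carousel")
    # Gaps between consecutive starts, wrapping round into the next cycle.
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    gaps.append(cycle - starts[-1] + starts[0])
    # Worst case: we just missed a start, so we wait one gap, then read the module.
    return max(gaps) + sizes[module]


def to_seconds(n_bytes, bitrate_bps):
    """Convert a byte count to seconds at the given carousel bitrate."""
    return n_bytes * 8 / bitrate_bps


sizes = {"root": 31726, "image2": 120346, "classes": 49967, "big": 59982}
plain = ["root", "image2", "classes", "big"]
boosted = ["root", "image2", "root", "classes", "big"]  # "root" sent twice per cycle

for label, order in (("plain", plain), ("boosted", boosted)):
    for module in ("root", "big"):
        wait = worst_case_wait_bytes(order, sizes, module)
        print(label, module, wait, "bytes")
```

In this model, sending the "root" module twice per cycle cuts its worst-case wait, while the wait for the "big" module grows because the cycle as a whole has become longer.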

Carousel layout is still very much an art rather than a science, and a lot depends on what the data in the carousel is used for, when it is needed and how large the files are. There is no easy way to tell what the most efficient carousel layout is in every case - you have to actually know something about the design and structure of the application that will use the data in the carousel.


Transporting IP data over MPEG-2

It's also possible to use DSM-CC to carry data in a different way. Many types of content are available using an IP connection, and DSM-CC provides a way to carry this IP data over an MPEG-2 transport stream using a technique called Multi-Protocol Encapsulation.

Obviously, there are a lot of differences between the broadcast world where MPEG-2 is used and the internet world where IP is most common. For starters, not much IP content is pure broadcast (or multicast as it's called in the IP world - as most readers will already know, 'broadcast' has a specific meaning when you're talking about IP networking). This makes any discussion of multi-protocol encapsulation pretty long, even though the core concepts aren't that hard. For that reason, there's a separate tutorial section on multi-protocol encapsulation that goes into more detail about how it actually works.

Normal Play Time

As we mentioned at the start of this tutorial, DSM-CC is not just used for data broadcasting. It can also carry a timecode that lets the broadcaster tell the receiver what point in the current broadcast is currently being presented. This may not seem very useful - after all, MPEG already carries a PCR (Program Clock Reference) that acts as a timecode for the MPEG stream. Now, PCR is useful if you're an MPEG decoder, but if you're not then it's not much help to you. The DSM-CC timecode (called Normal Play Time, or NPT for short) is a more general time indication for a stream. It allows the receiver to do things that the PCR value can't do, and we'll see exactly what kinds of things those are in a moment.

Before we see that, though, we need to take a closer look at what DSM-CC NPT actually is. NPT is a timecode that is embedded in a special descriptor in an MPEG-2 private section. An NPT timecode may start at any value, and simply provides a known time reference for that piece of media. Although they typically increase throughout a single piece of media (if they are present), NPT values may have discontinuities either forwards or backwards. This means that even if a stream containing NPT is edited (either to be made shorter, or to have adverts inserted) then NPT values will not need updating and will remain the same for that piece of media.

A good way to think of NPT is to imagine the stream as a piece of movie film. Now imagine that every frame has a unique number (its NPT value), and the number of each frame is one higher than the previous frame. Obviously, you can refer to each frame by this number, but if you edit the film, the frame numbers can suddenly jump to a higher value where you cut some frames out of the film. If you stick those frames in somewhere else, then you will have another jump in the frame numbers. However, it means that if you refer to frame number 379, you're referring to the same frame wherever it happens to be after you finish editing.

This is illustrated better in the diagram below:

NPT continuity through the editing process.

In this case, the media is edited for content in two places, making it shorter (the blue sections of the original stream) and has adverts inserted, making it longer (the red section of the resulting stream). Despite this, the NPT values at each point in the final media are the same as they were in the original, unedited, version. This is a powerful advantage to using NPT.

One other advantage of NPT is that NPT values can be nested. In our film example, if you cut the film and insert another piece of film, from a different reel, into the gap then there is no way of telling that the frame numbers are from a different film. If they happen to overlap with frame numbers in the original piece, strange things may happen. In the case above, the numbers in red indicate NPT timecode values that are taken from the inserted adverts, not from the original material. In this case, the values 1 and 2 are used twice in the same clip. How can we know they are different?

To support this, NPT descriptors contain two main parts - the NPT value itself, and a content ID. This content ID is used to separate blocks of content which share an NPT reference. To see how this works, let's consider the example above. We'll assume for now that the content ID used for the NPT descriptors in the original media is 1. NPT values with this content ID are shown by the black digits in the edited media clip we show in the example above.

Let's suppose that when we want to insert adverts into this media clip, we also want the adverts to have an NPT timecode, but we want to distinguish this from the timecodes that are used for the main section of the media. What we can do is set the content ID for the NPT descriptors in the adverts to be a different value (we'll assume the value we choose is 2). As we've already mentioned, this is shown by the red digits in the edited media clip.

If we look at this from the point of view of the receiver, a change in the content ID lets us tell whether an NPT discontinuity is simply an edit in the original stream (because the content ID will be the same on both sides of the discontinuity) or whether it is actually a 'new' NPT timecode that is nested within another NPT timecode, or a boundary between two different pieces of media (e.g. different shows).
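The receiver-side reasoning can be sketched as a small classification function. This is a toy model built on the film analogy: NPT values are treated as frame numbers that normally go up by one, and the names NptSample and classify are invented for this sketch rather than taken from any specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NptSample:
    content_id: int  # which piece of content this timecode belongs to
    npt: int         # toy 'frame number' standing in for an NPT value


def classify(previous, current, expected_step=1):
    """Say what the step from `previous` to `current` tells the receiver."""
    if current.content_id != previous.content_id:
        # A different timecode: nested content, or a different piece of media.
        return "new timecode"
    if current.npt != previous.npt + expected_step:
        # Same content, but the numbers jump: the stream was edited here.
        return "edit"
    return "continuous"


# Content 1 with a cut after frame 3, then two advert frames (content 2),
# then back to content 1.
samples = [
    NptSample(1, 1), NptSample(1, 2), NptSample(1, 3),
    NptSample(1, 6),
    NptSample(2, 1), NptSample(2, 2),
    NptSample(1, 7),
]
transitions = [classify(a, b) for a, b in zip(samples, samples[1:])]
print(transitions)
```

Note that the advert values 1 and 2 never get confused with the values 1 and 2 of the main content, because the content ID is checked before the NPT value.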

Although NPT is related to the MPEG-2 system time clock (STC), it is not the same thing. We've already seen that NPT values may have discontinuities in different places to any discontinuities in the STC, but this is not the only difference. Each NPT descriptor also includes a rate value, which tells the receiver how fast the NPT advances relative to the STC (how many NPT 'ticks' correspond to each STC 'tick'). This rate does not have to stay the same across an entire show, and may be fractional or even negative. In other words, the NPT value may stay the same or even decrease as the STC value increases. This gives us a great deal more flexibility in how we use NPT.
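As a rough sketch of how a receiver might derive the current NPT from the STC, assume that a descriptor gives a reference pair (an STC value and the NPT value that goes with it) plus the rate, and that the rate is read as NPT ticks per STC tick, which is what makes a rate of zero hold the NPT still. The function and parameter names below are hypothetical, both clocks are assumed to use the same units, and STC wrap-around is ignored.

```python
from fractions import Fraction


def npt_at(stc, stc_reference, npt_reference, rate):
    """Extrapolate NPT from the STC, given a reference point and a rate.

    `rate` is assumed to mean NPT ticks per STC tick, so 1 is normal play,
    0 holds NPT still, and a negative value makes NPT run backwards.
    """
    return npt_reference + rate * (stc - stc_reference)


# Reference point: NPT 200 corresponds to STC 1000. Where are we at STC 1500?
normal = npt_at(1500, 1000, 200, Fraction(1))      # NPT advances with the STC
paused = npt_at(1500, 1000, 200, Fraction(0))      # NPT stays where it was
half = npt_at(1500, 1000, 200, Fraction(1, 2))     # NPT advances at half speed
reverse = npt_at(1500, 1000, 200, Fraction(-1))    # NPT decreases
print(normal, paused, half, reverse)
```

Using Fraction keeps fractional rates exact, which avoids drift from repeated floating-point rounding in a sketch like this.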

Stream Events

In addition to NPT timecodes, DSM-CC also allows other synchronization points in streamed media. These are called stream events (if you read the object carousel tutorial you will be familiar with these). These are descriptors that are embedded in a DSM-CC elementary stream contained within an MPEG-2 stream, and provide a way for the receiver to synchronize with specific points in the media. This allows receivers to identify specific points in the media without having to monitor the NPT. This is useful when the NPT value could change, or when the synchronization points may not occur at predictable NPT values.

For instance, stream events could be used to indicate the start of a new TV show - using stream events means that the broadcaster doesn't have to tell any applications that new shows start at NPT values X, Y and Z - instead, they can broadcast a 'new show' stream event and let the device receiving the stream take care of the rest.

Understanding stream events properly can take a little time; partly because of the way they are used, and partly because of the terrible documentation that exists for them. Hopefully, this section will help you with one of those two, if not the other.

From a DSM-CC perspective, stream events are split into two parts: stream event objects and stream event descriptors. The main effect this has is to confuse people who are reading about them, because it's not always clear what is being referred to in any documentation. In this case, we will stick to a consistent scheme. The table below shows the terminology that we are using when we discuss stream events.

Terms used when discussing DSM-CC stream events

Term                      Description
stream event              The event that is triggered within the receiver when a stream event descriptor is received.
stream event object       DSM-CC Stream Event objects as carried in a DSM-CC object carousel. These act as a high-level description of a stream event.
stream event descriptors  The descriptors carried in a DSM-CC stream that actually trigger stream events.

DSM-CC stream event objects are stored in an object carousel, and are just like any other DSM-CC objects. A stream event object provides a general description of a stream event, which includes an event ID (which must be unique within that carousel) and a human-readable name. This allows a receiver to know what events can get generated, and helps it to check that an application is listening for stream events that will actually get used. In effect, this is like a class in object-oriented programming: it's a general description of a type of object, rather than something that represents a specific object.

The stream event descriptor is the second part of the puzzle. This tells the receiver that an event has actually been generated, and can be compared to an instance of a class if we extend our object-oriented programming analogy. The stream event descriptor contains real, specific values describing a single stream event, and more than one stream event descriptor can refer to the same description in a Stream Event object just like a class can have more than one instance.

A stream event descriptor contains three main attributes: the ID of the event, an NPT value at which the event should be generated and some application-specific data. The ID allows the receiver to work out which stream event object is associated with this descriptor. Since a broadcaster can't be sure exactly where a descriptor is inserted into an MPEG stream, each descriptor carries an NPT value which says when the event should be triggered. This allows the receiver to know in advance that it should generate an event when a specific NPT value is reached, giving a little more predictability. It also adds a little more reliability into the system, since the broadcaster can send several stream event descriptors with the same values in order to make sure at least one of them is decoded properly. The MHP specification says that most stream events should be signaled at least once every second for a minimum of five seconds before the time they should trigger.

A stream event descriptor can also tell the system that the event should be triggered immediately - these are known as 'do it now' events. This allows stream events to be inserted into a live show much more easily. For instance, a sports application may use stream events to detect when a goal has been scored in a soccer match, and which team has scored. We'll see the exact method which is used for this later in this section. Stream event descriptors containing 'do it now' events are only broadcast once per event, and so some care has to be taken by the receiver to make sure that it receives them properly.

Whenever a stream event descriptor is received, the receiver takes the following steps:

  1. It checks to see that an event object with the same event ID is present in the default object carousel. If an event with that event ID is not present, then the descriptor is ignored.
  2. If the encoding of the descriptor shows that the event is a 'do it now' event, then the event is triggered immediately.
  3. If the event is not a 'do it now' event, the receiver checks the NPT value at which the event should be triggered. If an event with the same event ID is already scheduled to be triggered at the same NPT value, or if the NPT value has already passed, then the event descriptor is ignored.
  4. When the NPT value reaches the value specified for a scheduled event, the event is triggered.
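The four steps above can be sketched as a small scheduler. This is a simplified model, not receiver code: the class and field names are invented, a 'do it now' event is represented by an NPT of None rather than by the real descriptor encoding, and an NPT value equal to the current one is treated as already passed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StreamEventDescriptor:
    event_id: int
    npt: Optional[int]  # None stands in for a 'do it now' event
    data: bytes = b""   # application-specific data


class StreamEventScheduler:
    def __init__(self, known_event_ids, on_trigger):
        # IDs taken from the stream event objects in the object carousel.
        self.known_event_ids = set(known_event_ids)
        self.on_trigger = on_trigger
        self.current_npt = 0
        self.scheduled = {}  # (event_id, npt) -> descriptor

    def on_descriptor(self, descriptor):
        # Step 1: ignore descriptors with no matching stream event object.
        if descriptor.event_id not in self.known_event_ids:
            return
        # Step 2: 'do it now' events are triggered immediately.
        if descriptor.npt is None:
            self.on_trigger(descriptor)
            return
        # Step 3: ignore repeats and events whose time has already passed.
        key = (descriptor.event_id, descriptor.npt)
        if key in self.scheduled or descriptor.npt <= self.current_npt:
            return
        self.scheduled[key] = descriptor

    def on_npt(self, npt):
        # Step 4: trigger every scheduled event whose NPT has been reached.
        self.current_npt = npt
        due = sorted((k for k in self.scheduled if k[1] <= npt), key=lambda k: k[1])
        for key in due:
            self.on_trigger(self.scheduled.pop(key))


fired = []
scheduler = StreamEventScheduler({1, 2}, fired.append)
scheduler.on_descriptor(StreamEventDescriptor(9, None))           # unknown ID
scheduler.on_descriptor(StreamEventDescriptor(1, None, b"goal"))  # do it now
scheduler.on_descriptor(StreamEventDescriptor(2, 100))            # scheduled
scheduler.on_descriptor(StreamEventDescriptor(2, 100))            # repeat, ignored
scheduler.on_npt(99)
scheduler.on_npt(100)                                             # event 2 fires
scheduler.on_descriptor(StreamEventDescriptor(2, 100))            # too late
scheduler.on_npt(101)
print(fired)
```

Keying the schedule on the pair of event ID and NPT value is what lets the broadcaster repeat a descriptor several times without the application seeing the event more than once.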

One advantage that is offered by this separation of stream event objects and stream event descriptors is that events can be re-used. As we have already mentioned, several stream event descriptors can contain the same event ID, even if they are triggered at different times and contain different private data. This allows an application to use the event ID to define 'classes' of events. For instance, in our soccer example, event descriptors with one event ID may be used to indicate that a goal has been scored, while a different event ID may signify a penalty being awarded and a third event ID may signify the end of the match. Thus, an application can start processing an event just by knowing the event ID. In some cases, no other application-specific data is needed.
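From the application's side, this amounts to dispatching on the event ID. The sketch below shows the idea for the soccer example. The event ID values, the handler table and the use of the private data to carry a team name are all assumptions made for illustration, and this is not the MHP or OCAP stream event API.

```python
# Hypothetical event IDs, as they might be declared by the stream event objects.
GOAL, PENALTY, FULL_TIME = 1, 2, 3


def make_dispatcher(handlers):
    """Return a function that routes an event to the handler for its ID."""
    def dispatch(event_id, data=b""):
        handler = handlers.get(event_id)
        if handler is None:
            return None  # nobody is listening for this kind of event
        return handler(data)
    return dispatch


dispatch = make_dispatcher({
    # Assumption: the private data of a goal event names the scoring team.
    GOAL: lambda data: "goal: " + data.decode("ascii"),
    # These two need nothing beyond the event ID itself.
    PENALTY: lambda data: "penalty",
    FULL_TIME: lambda data: "full time",
})

messages = [
    dispatch(GOAL, b"home"),
    dispatch(PENALTY),
    dispatch(GOAL, b"away"),
    dispatch(FULL_TIME),
]
print(messages)
```

The same GOAL handler runs for both goals, with only the private data differing, which mirrors several descriptors sharing one event ID.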

Normal Play Time and stream events

While the NPT timecode is completely separate from DSM-CC stream events and is carried in its own descriptor, DSM-CC stream events can use this timecode to help decide when to trigger events.

As we have seen, the time reference in a DSM-CC stream event is a reference to an NPT timecode value. If that time is skipped in an NPT discontinuity (i.e. that part of the content has been edited out), then the event triggers at the first NPT reference past the trigger time. An NPT signal is only required to be present in those cases where a stream event needs it - if all the stream events in a piece of media are of the 'do it now' kind (where the event is triggered immediately rather than waiting for a specific NPT value), then an NPT timecode is not necessary in the stream.

Further Reading

DSM-CC is a complex animal, and so you may want to do some more reading about DSM-CC in general. Two good references for this are:

If you're actually looking at implementing DSM-CC for an MHP or OCAP receiver (or any other DVB-compatible system), the following specifications describe DSM-CC and the restrictions that DVB and MHP put on it: