Synchronizing With Media

If an application is associated with a specific TV show, it's often useful to be able to synchronize the behaviour of the application to the action on screen. In many multimedia authoring systems, this is based around the concept of a timeline, where these changes in behaviour take place at specific times.

The problem with digital TV services, as we saw in the tutorial on media control, is that there is no real concept of media time when talking about a broadcast MPEG-2 stream. Since we can join the stream at any point, we have no way of knowing how long it is since the stream started (or even what the concept of 'starting' means in a broadcast sense).

This means that we can't use media time as a mechanism for synchronizing applications to their associated media. Instead, we use some extra capabilities defined by DSM-CC to help achieve the synchronization that we need.

DSM-CC stream events are markers that are embedded in a transport stream via MPEG-2 private sections, with each marker consisting of an identifier and a time reference. The identifier allows each stream event to be uniquely identified, while the time reference indicates at what point in the stream the event should trigger.

As well as the stream events themselves, the transport stream will also contain an object carousel which contains a set of stream event objects. These identify each event by a textual name, and allow the mapping of this name to the numeric identifier contained in the event itself.

If you've been paying attention, you'll have noticed an inconsistency here. A couple of paragraphs ago we said that the concept of 'media time' is not well defined in a broadcast scenario, and there is actually no way of mapping any kind of media time to a broadcast media clip in MHP. So how can we get a time reference that allows us to determine when an event should trigger?

Normal play time

The thing that lets us do this is DSM-CC Normal Play Time (NPT). An MPEG stream may include an NPT reference, which is a timecode that is embedded in a special descriptor in an MPEG-2 private section. While this timecode is completely separate from DSM-CC stream events, and is carried in its own descriptor, DSM-CC stream events can use this timecode to help decide when to trigger events.

An NPT timecode may start at any value, and simply provides a known time reference for that piece of media. Although NPT values (when present) typically increase throughout a single piece of media, they may have discontinuities either forwards or backwards. This means that even if a stream containing NPT is edited (either to be made shorter, or to have adverts inserted), the NPT value for a given point in the stream need not be updated and can remain the same for that piece of media. This is illustrated in the diagram below:

NPT continuity through the editing process.

In this case, the media is edited for content in two places, making it shorter (the blue sections of the original stream) and has adverts inserted, making it longer (the red section of the resulting stream). Despite this, the NPT values at each point in the final media are the same as they were in the original, unedited, version. This is a powerful advantage to using NPT.

NPT is carried in descriptors in the MPEG stream. NPT reference descriptors tell the receiver about NPT values at a specific point in the stream, while NPT endpoint descriptors tell the receiver about the start and end times for a particular event. Each NPT reference descriptor consists of three main parts:

  • An NPT value, which is the exact value for the Normal Play Time at that point in the stream.
  • An NPT rate, stored as a numerator and a denominator because NPT rates are fractional values. This lets the receiver interpolate NPT values between one NPT reference descriptor and the next.
  • An NPT ID, which lets us identify nested pieces of content.
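The interpolation that the NPT rate makes possible can be sketched roughly as follows. This is only an illustration, not a receiver implementation: the NptReference class, its field names, and the assumption that NPT and the receiver's clock are counted in the same tick units are all hypothetical and are not taken from the DSM-CC or MHP specifications.

```java
/**
 * Hypothetical model of one NPT reference descriptor, for illustration only.
 * The field names, and the idea that NPT and the receiver clock are counted
 * in the same tick units, are assumptions made for this sketch.
 */
final class NptReference {
    final int contentId;       // the NPT ID
    final long nptTicks;       // NPT value at the reference point
    final long clockTicks;     // receiver clock reading at the reference point
    final int rateNumerator;   // NPT rate = numerator / denominator
    final int rateDenominator;

    NptReference(int contentId, long nptTicks, long clockTicks,
                 int rateNumerator, int rateDenominator) {
        if (rateDenominator <= 0) {
            throw new IllegalArgumentException("rate denominator must be positive");
        }
        this.contentId = contentId;
        this.nptTicks = nptTicks;
        this.clockTicks = clockTicks;
        this.rateNumerator = rateNumerator;
        this.rateDenominator = rateDenominator;
    }

    /**
     * Estimates the NPT value at a later clock reading by applying the
     * NPT rate to the clock ticks elapsed since this reference.
     */
    long nptAt(long currentClockTicks) {
        long elapsed = currentClockTicks - clockTicks;
        return nptTicks + (elapsed * rateNumerator) / rateDenominator;
    }
}
```

In this sketch, a reference of NPT 1000 taken at clock reading 5000 with a rate of 1/1 gives an NPT of 1900 at clock reading 5900, while a rate of 1/2 gives 1450 and a rate of 0/1 holds the NPT at 1000.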

As we can see in the diagram above, nesting NPT values lets us handle some cases that are specific to broadcast media, such as ad insertion. Assume that the NPT ID (also called the content ID) used in the original media is 1. NPT values with this content ID are shown by the black digits in the edited media clip. Let's suppose that when we want to insert adverts into this media clip, we also want the adverts to have an NPT timecode, but we want to distinguish this from the timecodes that are used for the main section of the media. What we can do is set the content ID for the NPT descriptors in the adverts to be a different value (we'll assume the value we choose is 2). This is shown by the red digits in the edited media clip.

If we look at this from the point of view of the receiver, a change in the content ID lets us tell whether an NPT discontinuity is simply an edit in the original stream (because the content ID will be the same on both sides of the discontinuity) or whether it is actually a 'new' NPT timecode that is nested within another NPT timecode, or a boundary between two different pieces of media (e.g. different shows). Combined with the NPT endpoint descriptors, the receiver can use this information to know whether a new piece of content is nested inside the original content, or whether it represents a new event.
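The decision described above can be reduced to a very small sketch. The NptDiscontinuity class and its previousContentEnded flag are hypothetical: the flag stands in for whatever a receiver would work out from the NPT endpoint descriptors, and a real receiver will need more state than this.

```java
/**
 * Illustrative classification of an NPT discontinuity. All names here are
 * hypothetical; this is a simplification of the reasoning in the text.
 */
final class NptDiscontinuity {

    enum Kind {
        EDIT_WITHIN_CONTENT,  // same content ID on both sides: an edit
        NESTED_CONTENT,       // new content ID while the old content is still running
        NEW_CONTENT           // new content ID after the old content has ended
    }

    /**
     * @param previousContentId    content ID seen before the discontinuity
     * @param newContentId         content ID seen after the discontinuity
     * @param previousContentEnded assumed to be derived from NPT endpoint
     *                             descriptors: true if the earlier content
     *                             has reached its end time
     */
    static Kind classify(int previousContentId, int newContentId,
                         boolean previousContentEnded) {
        if (previousContentId == newContentId) {
            return Kind.EDIT_WITHIN_CONTENT;
        }
        return previousContentEnded ? Kind.NEW_CONTENT : Kind.NESTED_CONTENT;
    }
}
```

For the advert case in the diagram, a change from content ID 1 to content ID 2 partway through the show would be classified as nested content.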

Normal play time and DSM-CC

The time reference in a DSM-CC stream event is a reference to an NPT time value. If that time is skipped in an NPT discontinuity (i.e. that part of the content has been edited out), then the event triggers at the first NPT reference past the trigger time. An NPT signal is only required to be present in those cases where a stream event needs it - if all the stream events in a piece of media are of the 'do it now' kind (where the event is triggered immediately rather than waiting for a specific NPT value - see the next section), then an NPT signal is not necessary.

DSM-CC stream events and the object carousel

Understanding stream events properly can take a little time, partly because of the way they are used, and partly because of the terrible documentation that exists for them. Hopefully, this section will help you with one of those two, if not the other.

From a DSM-CC perspective, stream events are split into two parts: stream event objects and stream event descriptors. DSM-CC stream event objects are stored in an object carousel, and are just like any other DSM-CC object. A stream event object provides a general description of a stream event, which includes an event ID (which must be unique within that carousel) and a human-readable name. This allows a receiver to know what events can get generated, and helps it to check that an application is registering itself as a listener for events that actually exist.

The stream event descriptor is the second part of the puzzle. This is embedded in the MPEG stream as a marker, and tells the receiver that an event has actually been generated. A stream event descriptor contains three main attributes: the ID of the event, an NPT value at which the event should be generated, and some application-specific data. The ID allows the receiver to work out which stream event object is associated with this descriptor. Since a broadcaster can't be sure exactly where a descriptor is inserted into an MPEG stream, each descriptor carries an NPT value which says when the event should be triggered. This allows the receiver to know in advance that it should generate an event when a specific NPT value is reached, giving a little more predictability. It also adds a little more reliability into the system, since the broadcaster can send several stream event descriptors with the same values in order to make sure at least one of them is decoded properly. The MHP specification says that most stream events should be signaled at least once every second for a minimum of five seconds before the time they should trigger.

A stream event descriptor can also tell the system that the event should be triggered immediately - these are known as 'do it now' events. This allows stream events to be inserted into a live show much more easily. For instance, a sports application may use stream events to detect when a goal has been scored in a soccer match, and which team has scored. We'll see the exact method which is used for this later in this section. Stream event descriptors containing 'do it now' events are only broadcast once per event, and so some care has to be taken by the receiver to make sure that it receives them properly.

Whenever a stream event descriptor is received, the receiver takes the following steps:

  1. It checks to see that an event object with the same event ID is present in the default object carousel. If an event with that event ID is not present, then the descriptor is ignored.
  2. If the encoding of the descriptor shows that the event is a 'do it now' event, then the event is triggered immediately.
  3. If the event is not a 'do it now' event, the receiver checks the NPT value at which the event should be triggered. If an event with the same event ID is already scheduled to be triggered at the same NPT value, or if the NPT value has already passed, then the event descriptor is ignored.
  4. When the NPT value reaches the value specified for a scheduled event, the event is triggered.
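The four steps above can be sketched as a small scheduler. This is a simplified model rather than real receiver code: the StreamEventScheduler and Descriptor classes are hypothetical, the set of event IDs stands in for the stream event objects in the carousel, and a descriptor whose NPT value equals the current NPT is treated as already passed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Illustrative model of the receiver's handling of stream event descriptors. */
final class StreamEventScheduler {

    /** Hypothetical stand-in for a decoded stream event descriptor. */
    static final class Descriptor {
        final int eventId;
        final boolean doItNow;
        final long npt;  // trigger time; not used for 'do it now' events

        Descriptor(int eventId, boolean doItNow, long npt) {
            this.eventId = eventId;
            this.doItNow = doItNow;
            this.npt = npt;
        }
    }

    private final Set<Integer> carouselEventIds;  // IDs of known stream event objects
    private final List<Descriptor> scheduled = new ArrayList<Descriptor>();
    private final List<Integer> triggered = new ArrayList<Integer>();
    private long currentNpt;

    StreamEventScheduler(Set<Integer> carouselEventIds, long currentNpt) {
        this.carouselEventIds = carouselEventIds;
        this.currentNpt = currentNpt;
    }

    /** Steps 1 to 3: decide whether to ignore, trigger or schedule a descriptor. */
    void onDescriptor(Descriptor d) {
        if (!carouselEventIds.contains(d.eventId)) {
            return;                       // step 1: no matching event object
        }
        if (d.doItNow) {
            triggered.add(d.eventId);     // step 2: trigger immediately
            return;
        }
        if (d.npt <= currentNpt) {
            return;                       // step 3: trigger time already passed
        }
        for (Descriptor s : scheduled) {
            if (s.eventId == d.eventId && s.npt == d.npt) {
                return;                   // step 3: repeat of a scheduled event
            }
        }
        scheduled.add(d);
    }

    /** Step 4: trigger every scheduled event whose NPT value has been reached. */
    void onNptUpdate(long npt) {
        currentNpt = npt;
        Iterator<Descriptor> it = scheduled.iterator();
        while (it.hasNext()) {
            Descriptor s = it.next();
            if (s.npt <= npt) {
                triggered.add(s.eventId);
                it.remove();
            }
        }
    }

    /** Event IDs in the order they were triggered. */
    List<Integer> triggeredEventIds() {
        return triggered;
    }
}
```

Because onNptUpdate() triggers anything whose NPT value is at or before the new NPT, an event whose trigger time was skipped by a forward discontinuity still triggers at the first NPT update past that time. Backward discontinuities are not given any special treatment in this sketch.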

One advantage that is offered by this separation of stream event objects and stream event descriptors is that events can be re-used. Several stream event descriptors can contain the same event ID, even if they are triggered at different times and contain different private data. This allows an application to use the event ID to define 'classes' of events. For instance, in our soccer example, event descriptors with one event ID may be used to indicate that a goal has been scored, while a different event ID may signify a foul being committed and a third event ID may signify the end of the match. Thus, an application can start processing an event just by knowing the event ID. In some cases, no other application-specific data is needed.

Using stream events

Now that we've seen how the receiver handles these, let's look at how an application can use them.  For an MHP implementation, DSM-CC stream event objects (the objects in the object carousel) are represented by the org.dvb.dsmcc.DSMCCStreamEvent class. This is a subclass of the org.dvb.dsmcc.DSMCCStream class.

A DSMCCStreamEvent object doesn't just represent one DSM-CC stream event object from the object carousel - instead, it acts as an interface to all of the DSM-CC stream event objects that are currently valid. While this doesn't have a direct relationship to DSM-CC stream event objects in the object carousel, it does fit with the underlying protocol used to transmit the object carousel. For more details of how stream events are described in DSM-CC, see section 4.7 of the DVB Implementation Guidelines for Data Broadcasting (ETSI document number TR 101 202). This document is probably the most readable description of the DSM-CC object carousel format that is available (but to be honest, that's not saying very much - it's still a hard read).

The interface for DSMCCStreamEvent looks like this:

public class DSMCCStreamEvent extends DSMCCStream {

  public java.lang.String[] getEventList();

  public int subscribe(
    java.lang.String eventName, StreamEventListener l);

  public void unsubscribe(
    int eventId, StreamEventListener l);
  public void unsubscribe(
    java.lang.String eventName, StreamEventListener l);
}

The getEventList() method enables the application to get a list of the names of events that are represented by that DSMCCStreamEvent object. This is effectively a list of the stream events which are valid at that time.

Once the application has a list, it can choose which events to subscribe to. The subscribe() method allows the application to subscribe to an event. To do this, the application must specify the name of the event it wishes to subscribe to, and the listener that will be notified when those events are received. The listener should implement the org.dvb.dsmcc.StreamEventListener interface.

An application can unsubscribe from an event using the unsubscribe() method. As you can see, there are two versions of this, one taking an event name and one taking the stream event ID. The stream event ID is returned by the subscribe() method, and so applications can use either version interchangeably.

When the receiver triggers a stream event that an application has subscribed to, it creates an org.dvb.dsmcc.StreamEvent object and passes this to the registered event listener.  The StreamEvent object provides the application with all the information it needs about the event that has just been triggered:

public class StreamEvent extends java.util.EventObject {

  public byte[] getEventData();

  public int getEventId();
  public java.lang.String getEventName();
  public long getEventNPT();

  public java.lang.Object getSource();
}

The getEventData() method returns the application-specific data that was included in the stream event descriptor. Most of the other methods are obvious, with the possible exception of the getSource() method. In this case, calling getSource() returns the org.dvb.dsmcc.DSMCCStreamEvent object that is associated with this particular event. This gives the application an easy way of finding the appropriate stream event object should the application wish to unsubscribe from this event, or subscribe to other events which may be related.
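As a rough illustration of how an application might use this information, the sketch below handles the soccer events described earlier. To keep it self-contained, the handler takes the values that a listener would read from getEventName() and getEventData() rather than a StreamEvent itself. The SoccerEventHandler class, the event names "goal" and "full_time", and the one-byte payload format are all invented for this example; real event names and payloads are whatever the broadcaster and application developer agree on.

```java
/**
 * Illustrative handler for the soccer example. A StreamEventListener could
 * call handle(e.getEventName(), e.getEventData()) for each StreamEvent e.
 * The event names and payload layout here are hypothetical.
 */
final class SoccerEventHandler {
    private int homeGoals;
    private int awayGoals;
    private boolean matchOver;

    /**
     * @param eventName value taken from StreamEvent.getEventName()
     * @param eventData value taken from StreamEvent.getEventData()
     */
    void handle(String eventName, byte[] eventData) {
        if ("goal".equals(eventName)) {
            // Assumed payload: first byte is 0 for the home team, 1 for the away team.
            if (eventData == null || eventData.length < 1) {
                return;  // malformed payload: ignore rather than guess
            }
            if (eventData[0] == 0) {
                homeGoals++;
            } else if (eventData[0] == 1) {
                awayGoals++;
            }
        } else if ("full_time".equals(eventName)) {
            // The event name alone carries the meaning; no payload is needed.
            matchOver = true;
        }
        // Any other event name is ignored by this handler.
    }

    String score() {
        return homeGoals + "-" + awayGoals;
    }

    boolean isMatchOver() {
        return matchOver;
    }
}
```

The "goal" branch depends on the application-specific data, while the "full_time" branch shows the case where knowing which event was triggered is enough.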

Although early versions of the MHP specification did allow applications to access DSM-CC NPT as a media time in JMF, this feature was removed because the complexity of implementing it was too high and because applications couldn't rely on a consistent interpretation of what NPT values should (or could) be used. Applications can still get the NPT value for a stream via the DSMCCStream class (which provides an interface to DSM-CC Stream objects in the object carousel). This class allows applications to get the NPT value and the current rate, as well as register a listener for changes in the NPT status such as the presence or absence of an NPT reference, or discontinuities in the NPT value.

This interface does make it possible to use NPT references for synchronizing applications and media, but it's not the best approach. DSM-CC stream events are slightly more flexible than using time-based events, although content developers used to timecode- or timeline-based approaches to synchronization may require a little adjustment to use them effectively. The main advantage of stream events is that even if the media is edited following the development of the application, synchronization is still maintained.