John Fremlin's blog: Redundant video bitstream

Posted 2014-10-22 15:47:00 GMT

Video has the most computationally heavy compression in widespread use. At the same time, layer upon layer of abstraction built for maximum flexibility, with each group dreaming of a container format that will carry any codec, leads to an incredible duplication of metadata.

For example, how many times is the video resolution encoded in an MP4 file with a typical MPEG-4 AVC (H.264) payload? The instinctive reaction of the architects at each point in the stack is to redefine the description of the video in some slightly inadequate schema that is not quite a superset of the others, so all of them are needed. For VOD or realtime video, the resolution is eagerly repeated in each extra transport layer. For a standard MP4 stream, the resolution must certainly be in the H.264 Sequence Parameter Set (SPS) used by the video codec. But it is then also in the avc1 sample entry that holds the avcC box (the configuration of that codec in the MP4 file format), and then also in the track header tkhd box. So I count at least three. Are there more?
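To make the first of those copies concrete: the SPS does not hold a plain width and height at all, but a coded size in 16x16 macroblocks plus optional cropping offsets. The sketch below recovers the displayed frame size from those fields, following the derivation in the H.264 specification. It assumes the syntax elements have already been exp-Golomb decoded out of the SPS (that parsing step is not shown), and the function name `sps_frame_size` is hypothetical.

```python
# (SubWidthC, SubHeightC) for each colour chroma_format_idc, per the H.264 spec:
# 1 = 4:2:0, 2 = 4:2:2, 3 = 4:4:4
SUBSAMPLING = {1: (2, 2), 2: (2, 1), 3: (1, 1)}


def sps_frame_size(pic_width_in_mbs_minus1, pic_height_in_map_units_minus1,
                   frame_mbs_only_flag, chroma_format_idc=1,
                   separate_colour_plane_flag=0, crop_left=0, crop_right=0,
                   crop_top=0, crop_bottom=0):
    """Return the cropped (width, height) in pixels from decoded SPS fields.

    Hypothetical helper: inputs are already-decoded SPS syntax elements;
    the crop_* arguments are the frame_crop_*_offset values (0 when
    frame_cropping_flag is 0).
    """
    # Coded size: macroblocks are 16x16; with field coding
    # (frame_mbs_only_flag == 0) each map unit spans two macroblock rows.
    width = (pic_width_in_mbs_minus1 + 1) * 16
    height = (2 - frame_mbs_only_flag) * (pic_height_in_map_units_minus1 + 1) * 16

    # Crop offsets are in units of CropUnitX / CropUnitY, which depend on
    # chroma subsampling and on field coding.
    if chroma_format_idc == 0 or separate_colour_plane_flag:
        crop_unit_x, crop_unit_y = 1, 2 - frame_mbs_only_flag
    else:
        sub_width_c, sub_height_c = SUBSAMPLING[chroma_format_idc]
        crop_unit_x = sub_width_c
        crop_unit_y = sub_height_c * (2 - frame_mbs_only_flag)

    width -= crop_unit_x * (crop_left + crop_right)
    height -= crop_unit_y * (crop_top + crop_bottom)
    return width, height
```

For a typical progressive 1080p stream, the encoder codes 68 macroblock rows (1088 lines) and crops 8 lines off the bottom, so `sps_frame_size(119, 67, 1, crop_bottom=4)` returns `(1920, 1080)`.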

Why is this bad? The information is redundant. In aggregate, over the billions of videos made in the world, it wastes not only storage but also mental bandwidth in the standards texts. Worse, it is now possible to create a video file whose copies of the resolution disagree, so every implementation needs error handling for that case.
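The error handling this forces on implementations might look something like the following minimal sketch, which walks the MP4 box tree and compares the width and height in the tkhd box against those in the avc1 sample entry, using the field layouts from the ISO base media file format. It is hypothetical code written for illustration: it assumes a file with a single video track, recognises only the avc1 sample entry type, drops the fractional part of tkhd's 16.16 fixed-point values, and does not decode the third copy inside the SPS.

```python
import struct

# Boxes we descend into on the way to tkhd and the sample description.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}


def iter_boxes(data, start=0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # 64-bit largesize follows the type field
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError("malformed box")
        yield box_type, pos + header, pos + size
        pos += size


def find_dimensions(data, start=0, end=None, found=None):
    """Collect (width, height) from tkhd and avc1; assumes a single track."""
    found = {} if found is None else found
    for box_type, payload, box_end in iter_boxes(data, start, end):
        if box_type in CONTAINERS:
            find_dimensions(data, payload, box_end, found)
        elif box_type == b"tkhd":
            # width and height are the last 8 bytes, as 16.16 fixed point
            w, h = struct.unpack_from(">II", data, box_end - 8)
            found["tkhd"] = (w >> 16, h >> 16)
        elif box_type == b"stsd":
            # skip version/flags (4 bytes) and entry_count (4 bytes)
            find_dimensions(data, payload + 8, box_end, found)
        elif box_type == b"avc1":
            # 6 reserved + 2 data_reference_index + 16 pre_defined/reserved
            w, h = struct.unpack_from(">HH", data, payload + 24)
            found["avc1"] = (w, h)
    return found


def check_consistent(data):
    """Raise ValueError if the container's copies of the resolution disagree."""
    dims = find_dimensions(data)
    if len(set(dims.values())) > 1:
        raise ValueError(f"inconsistent resolution: {dims}")
    return dims
```

Even this toy version has to decide what to do when the copies disagree; here it simply refuses the file, whereas a real player would have to pick one copy to trust.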

This is, I suppose, a sort of bikeshedding: the issue is so trivial (the resolution of a video is easily understood) that each abstraction is eager to take responsibility for it, whereas real practical issues, such as the relative placement of the indices inside the file (for example, whether the moov box comes before or after the media data), are left to fall through the cracks.
