Discover the new easier way to develop Kurento video applications

H.264 video codec

NAL Units (NALU)

Sources:

An H.264 video is organized into Network Abstraction Layer Units (“NAL units” or “NALU”) that help transporting it with optimal performance depending on whether the transport is stream-oriented or packet-oriented:

  • For stream-oriented transports: the Byte-Stream Format. The NAL units are delivered as a continuous stream of bytes, which means that some boundary mechanism will be needed for the receiver to detect when one unit ends and the next one starts. This is done with the typical method of adding a unique byte pattern before each unit: The receiver will scan the incoming stream, searching for this 3-byte start code prefix, and finding one will mark the beginning of a new unit.
  • For packet-oriented transports: the Packet-Transport Format. This is the preferred method for the Advanced Video Coding (AVC) standard, and it builds upon the fact that all data is carried in packets that are already framed by the system transport protocol (such as RTP/UDP), so start code prefix patterns are not needed for identification of the boundaries of NAL units.

The NAL units can contain either actual data video (VCL units) or codec metadata (non-VCL units). Some of the most important types of NAL units are:

  • Sequence Parameter Set (SPS): This non-VCL NALU contains information required to configure the decoder such as profile, level, resolution, frame rate.
  • Picture Parameter Set (PPS): Similar to the SPS, this non-VCL NALU contains information on entropy coding mode, slice groups, motion prediction, quantization parameters (QP), and deblocking filters.
  • Instantaneous Decoder Refresh (IDR): This VCL NALU is a self contained image slice. That is, an IDR can be decoded and displayed without referencing any other NALU save SPS and PPS.
  • Access Unit Delimiter (AUD): An AUD is an optional NALU that can be use to delimit frames in an elementary stream. It is not required (unless otherwise stated by the container/protocol, like TS), and is often not included in order to save space, but it can be useful to finds the start of a frame without having to fully parse each NALU.

SPS, PPS

A large number of NAL units are combined to form a single video frame; the metadata of such frame would be transmitted in a Picture Parameter Set (PPS). Likewise, a set of PPS would form an actual video sequence, and the metadata for it would be transmitted in a Sequence Parameter Set (SPS). Both PPS and SPS can be sent well ahead of the actual units that will refer to them; then, each individual unit will just contain an index pointing to the corresponding parameter set, so the receiver is able to successfully decode the video.

Details about the exact contents of PPS and SPS packets can be found in the specification document for ISO/IEC 14496-10 (also known as H.264 or MPEG-4 Part 10, Advanced Video Coding (MPEG-4 AVC)), sections “Sequence parameter set data syntax” and “Picture parameter set RBSP syntax”.

Parameter sets can be sent in-band with the actual video, or sent out-of-band via some channel which might be more reliable than the transport of the video itself. This second option makes sense for transports where it is possible that some corruption or information loss might happen; losing a PPS could prevent decoding of a whole frame, and losing an SPS would be worse as it could render a whole chunk of the video impossible to decode, so it is important to transmit these parameter sets via the most reliable channel.

GStreamer caps

Whenever using H.264 as the video codec in Kurento, we’ll see log messages such as this one:

caps: video/x-h264, stream-format=(string)avc, alignment=(string)au,
codec_data=(buffer)0142c01fffe1000e6742c01f8c8d40501e900f08846a01000468ce3c80,
level=(string)3.1, profile=(string)constrained-baseline, width=(int)640,
height=(int)480, framerate=(fraction)0/1, interlace-mode=(string)progressive,
chroma-format=(string)4:2:0, bit-depth-luma=(uint)8, bit-depth-chroma=(uint)8,
parsed=(boolean)true

This describes in detail all aspects of the encoded video; some of them are universal properties of any video (such as the width, height and framerate), while others are highly specific to the H.264 encoding.

  • stream-format: Indicates if the H.264 video is stream-oriented (stream-format = byte-stream) or packet-oriented (stream-format = avc). For byte-stream videos the required parameter sets will be sent in-band with the video, but for avc the video metadata is conveyed via an additional caps field named codec_data.
  • codec_data: Only present when the video is packet oriented (stream-format = avc), this value represents an AVCDecoderConfigurationRecord struct.
  • Other information such as level, profile, width, height, framerate, interlace-mode, and the various chroma and luma settings, are just duplicated values that were extracted from the codec_data by an H.264 parser (namely the h264parse GStreamer element). This is also indicated by means of setting the field parsed=true.

GStreamer codec_data

GStreamer passes a codec_data field in its caps when the H.264 video is using the avc stream format. This field is printed in debug logs as a long hexadecimal sequence, but in reality it is an instance of an AVCDecoderConfigurationRecord, defined in the standard ISO/IEC 14496-15 (aka. MPEG-4) as follows:

aligned(8) class AVCDecoderConfigurationRecord {
    unsigned int(8) configurationVersion = 1;
    unsigned int(8) AVCProfileIndication;
    unsigned int(8) profile_compatibility;
    unsigned int(8) AVCLevelIndication;
    bit(6) reserved = ‘111111’b;
    unsigned int(2) lengthSizeMinusOne;
    bit(3) reserved = ‘111’b;
    unsigned int(5) numOfSequenceParameterSets;
    for (i=0; i< numOfSequenceParameterSets; i++) {
        unsigned int(16) sequenceParameterSetLength ;
        bit(8*sequenceParameterSetLength) sequenceParameterSetNALUnit;
    }
    unsigned int(8) numOfPictureParameterSets;
    for (i=0; i< numOfPictureParameterSets; i++) {
        unsigned int(16) pictureParameterSetLength;
        bit(8*pictureParameterSetLength) pictureParameterSetNALUnit;
    }
    if( profile_idc  ==  100  ||  profile_idc  ==  110  ||
        profile_idc  ==  122  ||  profile_idc  ==  144 )
    {
        bit(6) reserved = ‘111111’b;
        unsigned int(2) chroma_format;
        bit(5) reserved = ‘11111’b;
        unsigned int(3) bit_depth_luma_minus8;
        bit(5) reserved = ‘11111’b;
        unsigned int(3) bit_depth_chroma_minus8;
        unsigned int(8) numOfSequenceParameterSetExt;
        for (i=0; i< numOfSequenceParameterSetExt; i++) {
            unsigned int(16) sequenceParameterSetExtLength;
            bit(8*sequenceParameterSetExtLength) sequenceParameterSetExtNALUnit;
        }
    }
}
  • AVCProfileIndication: profile code as defined in ISO/IEC 14496-10. (profile_idc)
  • profile_compatibility: byte which occurs between the profile_idc and level_idc in a sequence parameter set (SPS), as defined in ISO/IEC 14496-10. (constraint_setx_flag)
  • AVCLevelIndication: level code as defined in ISO/IEC 14496-10. (level_idc)
  • lengthSizeMinusOne: length in bytes of the NALUnitLength field in an AVC video sample or AVC parameter set sample of the associated stream minus one. For example, a size of one byte is indicated with a value of 0. The value of this field shall be one of 0, 1, or 3 corresponding to a length encoded with 1, 2, or 4 bytes, respectively.
  • numOfSequenceParameterSets: number of SPSs that are used as the initial set of SPSs for decoding the AVC elementary stream.
  • sequenceParameterSetLength: length in bytes of the SPS NAL unit as defined in ISO/IEC 14496-10.
  • sequenceParameterSetNALUnit: a SPS NAL unit, as specified in ISO/IEC 14496-10. SPSs shall occur in order of ascending parameter set identifier with gaps being allowed.
  • numOfPictureParameterSets: number of picture parameter sets (PPSs) that are used as the initial set of PPSs for decoding the AVC elementary stream.
  • pictureParameterSetLength: length in bytes of the PPS NAL unit as defined in ISO/IEC 14496-10.
  • pictureParameterSetNALUnit: a PPS NAL unit, as specified in ISO/IEC 14496-10. PPSs shall occur in order of ascending parameter set identifier with gaps being allowed.
  • chroma_format: chroma_format indicator as defined by the chroma_format_idc parameter in ISO/IEC 14496-10.
  • bit_depth_luma_minus8: bit depth of the samples in the Luma arrays. For example, a bit depth of 8 is indicated with a value of zero (bit depth = 8 + bit_depth_luma_minus8). The value of this field shall be in the range of 0 to 4, inclusive.
  • bit_depth_chroma_minus8: bit depth of the samples in the Chroma arrays. For example, a bit depth of 8 is indicated with a value of zero (bit depth = 8 + bit_depth_chroma_minus8). The value of this field shall be in the range of 0 to 4, inclusive.
  • numOfSequenceParameterSetExt: number of Sequence Parameter Set Extensions that are used for decoding the AVC elementary stream.
  • sequenceParameterSetExtLength: length in bytes of the SPS Extension NAL unit as defined in ISO/IEC 14496-10.
  • sequenceParameterSetExtNALUnit: a SPS Extension NAL unit, as specified in ISO/IEC 14496-10.

Example mapping

Let’s “translate” a sample codec_data into its components, to show the meaning of each field:

codec_data=(buffer)0142c01fffe1000e6742c01f8c8d40501e900f08846a01000468ce3c80

This would map to an AVCDecoderConfigurationRecord struct as follows:

0142c01fffe1000e6742c01f8c8d40501e900f08846a01000468ce3c80
01                                                         -> configurationVersion = 0x01 = 1
  42                                                       -> AVCProfileIndication = 0x42 = 66
    c0                                                     -> profile_compatibility = 0xC0
      1f                                                   -> AVCLevelIndication = 0x1F = 31
        ff                                                 -> lengthSizeMinusOne = 0b11 = 3
          e1                                               -> numOfSequenceParameterSets = 0b00001 = 1
            000e                                           -> sequenceParameterSetLength = 0x000E = 14
                6742c01f8c8d40501e900f08846a               -> 1x14 bytes sequenceParameterSetNALUnit
                                            01             -> numOfPictureParameterSets = 0x01 = 1
                                              0004         -> pictureParameterSetLength = 0x0004 = 4
                                                  68ce3c80 -> 1x4 bytes pictureParameterSetNALUnit

This is the mapping for the first bytes of the Sequence Parameter Set NAL Unit:

6742c01f8c8d40501e900f08846a
67                           -> Header = 0x67 = 0b0110_0111
                                forbidden_zero_bit = 0b0 = 0
                                nal_ref_idc = 0b11 = 3
                                nal_unit_type = 0b00111 = 7
  42                         -> profile_idc = 0x42 = 66
    c0                       -> constraint_setx_flag = 0xC0 = 0b1100_0000
                                constraint_set0_flag = 1
                                constraint_set1_flag = 1
      1f                     -> level_idc = 0x1F = 31
        ...                  -> Chroma, luma, scaling and more information

Note how the fields profile_idc, constraint_setx_flag, and level_idc get duplicated outside of this structure, in the codec_data’s AVCProfileIndication, profile_compatibility, and AVCLevelIndication, respectively.

In this example case, according to the definitions from ISO/IEC 14496-10 (Annex A.2.1), a profile_idc of 66 with constraint_set0_flag and constraint_set1_flag = 1 corresponds to the H.264 Constrained Baseline profile; and level_idc = 31 which means Level 3.1.