H.264 video codec
This page is all about H.264, also known as Advanced Video Coding (AVC), MPEG-4 Part 10, or MPEG-4 AVC.
Profiles and Levels
Profiles and Levels specify restrictions on bitstreams and hence limits on the capabilities needed to decode them. They may also be used to indicate interoperability points between individual decoder implementations. These parameters are defined in Annex A of the H.264 Specification:
Each Profile specifies a subset of features that shall be supported by all decoders conforming to that Profile.
Each Level specifies a set of limits on the values that may be taken by the parameters of the bitstream.
The same set of Level definitions is used with all Profiles, but individual implementations may support a different Level for each supported Profile. For any given Profile, Levels generally correspond to decoder processing load and memory capability.
A common assumption is that bandwidth usage decreases as H.264 moves from Baseline to Main to High Profile, but this does not always hold in practice. Users seeking to reduce bandwidth usage should test their sources (such as cameras) in place with multiple Profiles to determine which is best for their application, as manufacturer implementations and data savings vary widely. For example, switching to High Profile in some cameras may increase the bitrate, making network congestion issues worse.
The Profile and Level of an H.264 stream are usually given by three bytes, often written together as a 6-digit hexadecimal value, that are found at the start of the Sequence Parameter Set (SPS):
profile_idc
profile-iop
constraint_set0_flag: Full compliance with Baseline Profile
constraint_set1_flag: Full compliance with Main Profile
constraint_set2_flag: Full compliance with Extended Profile
constraint_set3_flag
constraint_set4_flag
constraint_set5_flag
reserved_zero_2bits (2 bits, always 0)
level_idc
The profile-iop byte is a set of binary flags that refine the meaning of the other two bytes. Their meanings are defined together with the different Profiles, and also in the H.264 Specification, section 7.4.2.1.1 (Sequence parameter set data semantics).
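The three-byte layout described above can be unpacked with a few bit operations. The following is a minimal sketch; the names `ProfileLevel` and `parse_profile_level` are hypothetical helpers made up for this illustration, and the input value `42c01f` is taken from the example used later on this page.

```python
from typing import NamedTuple, Tuple


class ProfileLevel(NamedTuple):
    profile_idc: int
    constraint_flags: Tuple[int, ...]  # constraint_set0_flag .. constraint_set5_flag
    level_idc: int


def parse_profile_level(hex_value: str) -> ProfileLevel:
    """Split a 3-byte hex string into profile_idc, profile-iop flags and level_idc."""
    raw = bytes.fromhex(hex_value)
    if len(raw) != 3:
        raise ValueError("expected exactly 3 bytes (6 hex digits)")
    profile_idc, profile_iop, level_idc = raw
    # constraint_set0_flag is the most significant bit of profile-iop;
    # the two least significant bits are reserved_zero_2bits and are skipped.
    flags = tuple((profile_iop >> (7 - i)) & 1 for i in range(6))
    return ProfileLevel(profile_idc, flags, level_idc)


info = parse_profile_level("42c01f")
print(info)
```

Here `constraint_flags[0]` and `constraint_flags[1]` are both 1, which matches the `0xC0` value decoded in the example mapping at the end of this page.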
Profiles
These are the first five, most commonly used Profiles:
| Profile name | Parameters |
|---|---|
| Baseline | profile_idc = 66 = 0x42, constraint_set1_flag = 0 |
| Constrained Baseline | profile_idc = 66 = 0x42, constraint_set1_flag = 1 |
| Main | profile_idc = 77 = 0x4D |
| Extended | profile_idc = 88 = 0x58 |
| High | profile_idc = 100 = 0x64 |
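As an illustration, the table can be turned into a small lookup. This is only a sketch that covers the five Profiles listed here; `profile_name` is a hypothetical helper, and any other profile_idc is reported as unknown instead of being guessed.

```python
# profile_idc values taken from the table above.
PROFILE_NAMES = {66: "Baseline", 77: "Main", 88: "Extended", 100: "High"}


def profile_name(profile_idc: int, constraint_set1_flag: int = 0) -> str:
    """Map profile_idc (plus constraint_set1_flag) to one of the five common names."""
    if profile_idc == 66 and constraint_set1_flag == 1:
        return "Constrained Baseline"
    return PROFILE_NAMES.get(profile_idc, f"unknown (profile_idc={profile_idc})")


print(profile_name(0x42, constraint_set1_flag=1))
print(profile_name(0x64))
```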
There are more Profiles which specify higher capabilities, such as High 10, High 4:2:2, High 4:4:4 or CAVLC 4:4:4. These are properly defined in the H.264 Specification.
Levels
The Baseline, Constrained Baseline, Main, and Extended Profiles share a set of Levels. These specify numeric parameters related to decoding bitrates, timings, motion vectors, and other algorithmic constraints that define the capabilities required by the decoder. The Specification contains a table where multiple Levels are defined: 1, 1b, 1.1, 1.2, 1.3, 2, 2.1, …, up to 6.2.
Profiles which are higher than the ones mentioned also have their own defined set of shared Levels.
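A sketch of how a level_idc value could be turned into a Level name for the Baseline, Constrained Baseline, Main, and Extended Profiles. It assumes that level_idc is ten times the Level number, and that Level 1b is signalled for these Profiles as level_idc = 11 together with constraint_set3_flag = 1; both assumptions should be checked against Annex A of the Specification. `level_name` is a hypothetical helper.

```python
def level_name(level_idc: int, constraint_set3_flag: int = 0) -> str:
    """Return a Level name such as "3.1" for a given level_idc.

    Assumption: valid for Baseline, Constrained Baseline, Main and Extended only.
    """
    # Assumption: Level 1b shares level_idc 11 with Level 1.1 and is
    # distinguished by constraint_set3_flag.
    if level_idc == 11 and constraint_set3_flag == 1:
        return "1b"
    major, minor = divmod(level_idc, 10)
    return str(major) if minor == 0 else f"{major}.{minor}"


print(level_name(31))
```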
NAL Units (NALU)
An H.264 video is organized into Network Abstraction Layer Units (“NAL units” or “NALU”) that help transport it with optimal performance depending on whether the transport is stream-oriented or packet-oriented:
For stream-oriented transports: the Byte-Stream Format. The NAL units are delivered as a continuous stream of bytes, which means that some boundary mechanism will be needed for the receiver to detect when one unit ends and the next one starts. This is done with the typical method of adding a unique byte pattern before each unit: The receiver will scan the incoming stream, searching for this 3-byte start code prefix, and finding one will mark the beginning of a new unit.
For packet-oriented transports: the Packet-Transport Format. This is the preferred method for the Advanced Video Coding (AVC) standard, and it builds upon the fact that all data is carried in packets that are already framed by the system transport protocol (such as RTP/UDP), so start code prefix patterns are not needed for identification of the boundaries of NAL units.
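The start code scanning described for the Byte-Stream Format can be sketched as follows. This is a simplified illustration, not a complete parser: it assumes the 3-byte start code prefix is `00 00 01`, it treats zero bytes right before a start code as padding, and the `sample` bytes are made up for the example (they are not a decodable stream). `split_byte_stream` is a hypothetical helper.

```python
START_CODE = b"\x00\x00\x01"  # 3-byte start code prefix


def split_byte_stream(stream: bytes) -> list:
    """Return the NAL units found between start code prefixes."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes right before the next start code are padding (for
        # example the extra 0x00 of a 4-byte "00 00 00 01" prefix).
        unit = stream[begin:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


# Made-up sample: three short units with mixed 4-byte and 3-byte prefixes.
sample = bytes.fromhex("00000001" "6742c01f" "000001" "68ce3c80" "00000001" "6588")
for nalu in split_byte_stream(sample):
    print(nalu.hex())
```

In the Packet-Transport Format none of this scanning is needed, because each packet already delimits its payload.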
The NAL units can contain either actual video data (VCL units) or codec metadata (non-VCL units). Some of the most important types of NAL units are:
Sequence Parameter Set (SPS): This non-VCL NALU contains information required to configure the decoder such as profile, level, resolution, frame rate.
Picture Parameter Set (PPS): Similar to the SPS, this non-VCL NALU contains information on entropy coding mode, slice groups, motion prediction, quantization parameters (QP), and deblocking filters.
Instantaneous Decoder Refresh (IDR): This VCL NALU is a self-contained image slice. That is, an IDR can be decoded and displayed without referencing any other NALU except the SPS and PPS.
Access Unit Delimiter (AUD): An AUD is an optional NALU that can be used to delimit frames in an elementary stream. It is not required (unless otherwise stated by the container/protocol, like TS), and is often not included in order to save space, but it can be useful to find the start of a frame without having to fully parse each NALU.
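The type of a NAL unit is stored in its first byte, so the four kinds listed above can be told apart without parsing the rest of the unit. The sketch below assumes the one-byte header layout shown in the example mapping at the end of this page (1 bit forbidden_zero_bit, 2 bits nal_ref_idc, 5 bits nal_unit_type), and the base Specification numbering where types 1 to 5 carry coded slice data. `parse_nal_header` and `describe` are hypothetical helpers.

```python
# Only the types discussed on this page; everything else is reported as "other".
NAL_TYPE_NAMES = {5: "IDR slice", 7: "SPS", 8: "PPS", 9: "AUD"}


def parse_nal_header(first_byte: int) -> dict:
    """Split the first byte of a NAL unit into its three header fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": first_byte & 0x1F,
    }


def describe(nalu: bytes) -> str:
    """Return a short label such as "SPS (non-VCL)" for a NAL unit."""
    nal_type = parse_nal_header(nalu[0])["nal_unit_type"]
    # Assumption: types 1..5 are the VCL (coded slice) units.
    kind = "VCL" if 1 <= nal_type <= 5 else "non-VCL"
    return f"{NAL_TYPE_NAMES.get(nal_type, 'other')} ({kind})"


print(describe(bytes.fromhex("6742c01f")))
print(describe(bytes.fromhex("68ce3c80")))
```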
SPS, PPS
One or more NAL units are combined to form a single video frame; the metadata of such a frame is transmitted in a Picture Parameter Set (PPS). Likewise, a set of frames forms an actual video sequence, and the metadata for it is transmitted in a Sequence Parameter Set (SPS). Both PPS and SPS can be sent well ahead of the actual units that will refer to them; then, each individual unit just contains an index pointing to the corresponding parameter set, so the receiver is able to successfully decode the video.
Details about the exact contents of PPS and SPS packets can be found in the H.264 Specification sections “Sequence parameter set data syntax” and “Picture parameter set RBSP syntax”.
Parameter sets can be sent in-band with the actual video, or sent out-of-band via some channel which might be more reliable than the transport of the video itself. This second option makes sense for transports where it is possible that some corruption or information loss might happen; losing a PPS could prevent decoding of a whole frame, and losing an SPS would be worse as it could render a whole chunk of the video impossible to decode, so it is important to transmit these parameter sets via the most reliable channel.
GStreamer caps
Whenever using H.264 as the video codec in Kurento, we’ll see log messages such as this one:
caps: video/x-h264, stream-format=(string)avc, alignment=(string)au,
codec_data=(buffer)0142c01fffe1000e6742c01f8c8d40501e900f08846a01000468ce3c80,
level=(string)3.1, profile=(string)constrained-baseline, width=(int)640,
height=(int)480, framerate=(fraction)0/1, interlace-mode=(string)progressive,
chroma-format=(string)4:2:0, bit-depth-luma=(uint)8, bit-depth-chroma=(uint)8,
parsed=(boolean)true
This describes in detail all aspects of the encoded video; some of them are universal properties of any video (such as the width, height and framerate), while others are highly specific to the H.264 encoding.
stream-format: Indicates if the H.264 video is stream-oriented (stream-format = byte-stream) or packet-oriented (stream-format = avc). For byte-stream videos the required parameter sets will be sent in-band with the video, but for avc the video metadata is conveyed via an additional caps field named codec_data.
codec_data: Only present when the video is packet-oriented (stream-format = avc), this value represents an AVCDecoderConfigurationRecord struct.
Other information such as level, profile, width, height, framerate, interlace-mode, and the various chroma and luma settings, is just a set of duplicated values that were extracted from the codec_data by an H.264 parser (namely the h264parse GStreamer element). This is also indicated by setting the field parsed=true.
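When inspecting logs, it can be handy to pull individual fields out of such a caps line. The following is a rough sketch that works on the printed text only; it is not GStreamer's own caps parser, it assumes simple `name=(type)value` fields without commas inside values, and `parse_caps` is a hypothetical helper. The sample text is a shortened copy of the log message above, without the leading `caps:` label.

```python
import re


def parse_caps(text: str):
    """Split a printed caps string into its media type and a dict of fields."""
    flat = " ".join(text.split())  # undo the line wrapping of the log
    media_type, _, rest = flat.partition(",")
    fields = {}
    for match in re.finditer(r"([\w-]+)=\((\w+)\)([^,]+)", rest):
        name, type_name, value = match.groups()
        fields[name] = (type_name, value.strip())
    return media_type.strip(), fields


log_caps = """video/x-h264, stream-format=(string)avc, alignment=(string)au,
level=(string)3.1, profile=(string)constrained-baseline, width=(int)640,
height=(int)480, framerate=(fraction)0/1, parsed=(boolean)true"""

media_type, fields = parse_caps(log_caps)
print(media_type, fields["stream-format"], fields["profile"])
```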
GStreamer “codec_data”
GStreamer passes a codec_data field in its caps when the H.264 video is using the avc stream format. This field is printed in debug logs as a long hexadecimal sequence, but in reality it is an instance of an AVCDecoderConfigurationRecord, defined in the standard ISO/IEC 14496-15 (MPEG-4 Part 15) as follows:
aligned(8) class AVCDecoderConfigurationRecord {
    unsigned int(8) configurationVersion = 1;
    unsigned int(8) AVCProfileIndication;
    unsigned int(8) profile_compatibility;
    unsigned int(8) AVCLevelIndication;
    bit(6) reserved = '111111'b;
    unsigned int(2) lengthSizeMinusOne;
    bit(3) reserved = '111'b;
    unsigned int(5) numOfSequenceParameterSets;
    for (i=0; i< numOfSequenceParameterSets; i++) {
        unsigned int(16) sequenceParameterSetLength ;
        bit(8*sequenceParameterSetLength) sequenceParameterSetNALUnit;
    }
    unsigned int(8) numOfPictureParameterSets;
    for (i=0; i< numOfPictureParameterSets; i++) {
        unsigned int(16) pictureParameterSetLength;
        bit(8*pictureParameterSetLength) pictureParameterSetNALUnit;
    }
    if( profile_idc == 100 || profile_idc == 110 ||
        profile_idc == 122 || profile_idc == 144 )
    {
        bit(6) reserved = '111111'b;
        unsigned int(2) chroma_format;
        bit(5) reserved = '11111'b;
        unsigned int(3) bit_depth_luma_minus8;
        bit(5) reserved = '11111'b;
        unsigned int(3) bit_depth_chroma_minus8;
        unsigned int(8) numOfSequenceParameterSetExt;
        for (i=0; i< numOfSequenceParameterSetExt; i++) {
            unsigned int(16) sequenceParameterSetExtLength;
            bit(8*sequenceParameterSetExtLength) sequenceParameterSetExtNALUnit;
        }
    }
}
AVCProfileIndication: profile code as defined in H.264 Specification (profile_idc).
profile_compatibility: byte which occurs between the profile_idc and level_idc in a sequence parameter set (SPS), as defined in H.264 Specification. (constraint_setx_flag)
AVCLevelIndication: level code as defined in H.264 Specification (level_idc).
lengthSizeMinusOne: length in bytes of the NALUnitLength field in an AVC video sample or AVC parameter set sample of the associated stream minus one. For example, a size of one byte is indicated with a value of 0. The value of this field shall be one of 0, 1, or 3 corresponding to a length encoded with 1, 2, or 4 bytes, respectively.
numOfSequenceParameterSets: number of SPSs that are used as the initial set of SPSs for decoding the AVC elementary stream.
sequenceParameterSetLength: length in bytes of the SPS NAL unit as defined in H.264 Specification.
sequenceParameterSetNALUnit: a SPS NAL unit, as specified in H.264 Specification. SPSs shall occur in order of ascending parameter set identifier with gaps being allowed.
numOfPictureParameterSets: number of picture parameter sets (PPSs) that are used as the initial set of PPSs for decoding the AVC elementary stream.
pictureParameterSetLength: length in bytes of the PPS NAL unit as defined in H.264 Specification.
pictureParameterSetNALUnit: a PPS NAL unit, as specified in H.264 Specification. PPSs shall occur in order of ascending parameter set identifier with gaps being allowed.
chroma_format: chroma_format indicator as defined by the chroma_format_idc parameter in H.264 Specification.
bit_depth_luma_minus8: bit depth of the samples in the Luma arrays. For example, a bit depth of 8 is indicated with a value of zero (bit depth = 8 + bit_depth_luma_minus8). The value of this field shall be in the range of 0 to 4, inclusive.
bit_depth_chroma_minus8: bit depth of the samples in the Chroma arrays. For example, a bit depth of 8 is indicated with a value of zero (bit depth = 8 + bit_depth_chroma_minus8). The value of this field shall be in the range of 0 to 4, inclusive.
numOfSequenceParameterSetExt: number of Sequence Parameter Set Extensions that are used for decoding the AVC elementary stream.
sequenceParameterSetExtLength: length in bytes of the SPS Extension NAL unit as defined in H.264 Specification.
sequenceParameterSetExtNALUnit: a SPS Extension NAL unit, as specified in H.264 Specification.
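To illustrate the role of lengthSizeMinusOne, the sketch below splits an avc-format sample into its NAL units by reading the length prefix in front of each one. It assumes the length field is a big-endian unsigned integer, and the `sample` bytes are made up for the example. `split_avc_sample` is a hypothetical helper.

```python
def split_avc_sample(sample: bytes, length_size_minus_one: int) -> list:
    """Split a length-prefixed (avc format) sample into NAL units."""
    length_size = length_size_minus_one + 1
    if length_size not in (1, 2, 4):
        raise ValueError("lengthSizeMinusOne must be 0, 1 or 3")
    units = []
    pos = 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated length field")
        # Assumption: the NALUnitLength field is stored big-endian.
        nal_length = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if pos + nal_length > len(sample):
            raise ValueError("truncated NAL unit")
        units.append(sample[pos:pos + nal_length])
        pos += nal_length
    return units


# Made-up sample: two units, each preceded by a 4-byte length.
sample = bytes.fromhex("00000004" "68ce3c80" "00000002" "6588")
for nalu in split_avc_sample(sample, length_size_minus_one=3):
    print(nalu.hex())
```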
Example mapping
Let’s “translate” a sample codec_data into its components, to show the meaning of each field:
codec_data=(buffer)0142c01fffe1000e6742c01f8c8d40501e900f08846a01000468ce3c80
This would map to an AVCDecoderConfigurationRecord struct as follows:
0142c01fffe1000e6742c01f8c8d40501e900f08846a01000468ce3c80
01 -> configurationVersion = 0x01 = 1
42 -> AVCProfileIndication = 0x42 = 66
c0 -> profile_compatibility = 0xC0
1f -> AVCLevelIndication = 0x1F = 31
ff -> lengthSizeMinusOne = 0b11 = 3
e1 -> numOfSequenceParameterSets = 0b00001 = 1
000e -> sequenceParameterSetLength = 0x000E = 14
6742c01f8c8d40501e900f08846a -> 1x14 bytes sequenceParameterSetNALUnit
01 -> numOfPictureParameterSets = 0x01 = 1
0004 -> pictureParameterSetLength = 0x0004 = 4
68ce3c80 -> 1x4 bytes pictureParameterSetNALUnit
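The same mapping can be reproduced programmatically. This is a minimal sketch that follows the struct definition above but stops after the PPS list, so the optional fields that follow for the higher Profiles are not parsed. `parse_avcc` and `_read_units` are hypothetical helpers, and no bounds checking is done.

```python
def _read_units(data: bytes, pos: int, count: int):
    """Read `count` NAL units, each preceded by a 16-bit big-endian length."""
    units = []
    for _ in range(count):
        length = int.from_bytes(data[pos:pos + 2], "big")
        pos += 2
        units.append(data[pos:pos + length])
        pos += length
    return units, pos


def parse_avcc(data: bytes) -> dict:
    """Parse the leading fields of an AVCDecoderConfigurationRecord."""
    record = {
        "configurationVersion": data[0],
        "AVCProfileIndication": data[1],
        "profile_compatibility": data[2],
        "AVCLevelIndication": data[3],
        "lengthSizeMinusOne": data[4] & 0x03,  # upper 6 bits are reserved
    }
    num_sps = data[5] & 0x1F  # upper 3 bits are reserved
    record["sps"], pos = _read_units(data, 6, num_sps)
    num_pps = data[pos]
    record["pps"], pos = _read_units(data, pos + 1, num_pps)
    return record


codec_data = bytes.fromhex(
    "0142c01fffe1000e6742c01f8c8d40501e900f08846a01000468ce3c80"
)
record = parse_avcc(codec_data)
for name, value in record.items():
    print(name, [u.hex() for u in value] if isinstance(value, list) else value)
```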
This is the mapping for the first bytes of the Sequence Parameter Set NAL Unit:
6742c01f8c8d40501e900f08846a
67 -> Header = 0x67 = 0b0110_0111
    forbidden_zero_bit = 0b0 = 0
    nal_ref_idc = 0b11 = 3
    nal_unit_type = 0b00111 = 7
42 -> profile_idc = 0x42 = 66
c0 -> constraint_setx_flag = 0xC0 = 0b1100_0000
    constraint_set0_flag = 1
    constraint_set1_flag = 1
1f -> level_idc = 0x1F = 31
... -> Chroma, luma, scaling and more information
Note how the fields profile_idc, constraint_setx_flag, and level_idc get duplicated outside of this structure, in the codec_data’s AVCProfileIndication, profile_compatibility, and AVCLevelIndication, respectively.
In this example, according to the definitions from the H.264 Specification (Annex A.2.1 Baseline profile), a profile_idc of 66 with constraint_set0_flag = 1 and constraint_set1_flag = 1 corresponds to the H.264 Constrained Baseline profile, and level_idc = 31 means Level 3.1.
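The duplication noted above is easy to verify on the sample codec_data. The short sketch below uses fixed offsets that hold for this particular record (a single SPS whose 16-bit length starts at byte 6); the variable names are made up for the example.

```python
codec_data = bytes.fromhex(
    "0142c01fffe1000e6742c01f8c8d40501e900f08846a01000468ce3c80"
)

# Bytes 1..3 of the record: AVCProfileIndication, profile_compatibility,
# AVCLevelIndication.
record_triplet = codec_data[1:4]

# The first SPS starts at byte 8, after its 16-bit length at bytes 6..7.
sps_length = int.from_bytes(codec_data[6:8], "big")
first_sps = codec_data[8:8 + sps_length]

# Bytes 1..3 of the SPS NAL unit (after its 1-byte header):
# profile_idc, constraint_setx_flag byte, level_idc.
sps_triplet = first_sps[1:4]

print(record_triplet.hex(), sps_triplet.hex(), record_triplet == sps_triplet)
```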