12.3. Video Telephony, Interleaved Packetization Using NAL Unit Aggregation

This scheme allows better error concealment and is used in H.263-based designs using RFC 4629 packetization [11]. It has been implemented, and good results were reported [13].

The Video Coding Layer (VCL) encoder codes the source picture so that all macroblocks (MBs) of one MB line are assigned to one slice. All slices with even MB row addresses are combined into one single-time aggregation packet (STAP), and all slices with odd MB row addresses are combined into another. Each of these STAPs is transmitted as one RTP packet. The establishment of the parameter sets is performed as discussed above.
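The interleaving step can be illustrated with a short Python sketch. It assumes that each MB row has already been encoded as one slice NAL unit, passed in row order as raw bytes beginning with the one-byte NAL unit header, and that the STAP-A form of aggregation is used: a NAL unit header with type 24, followed by each aggregated NAL unit preceded by its 16-bit size. The function names build_stap_a and interleave_rows are hypothetical, and the slice contents below are dummy bytes, not coded video.

```python
import struct

STAP_A_TYPE = 24  # NAL unit type value for a STAP-A aggregation packet


def build_stap_a(nal_units):
    """Aggregate NAL units (bytes, each starting with its NAL header) into one STAP-A payload."""
    if not nal_units:
        raise ValueError("a STAP-A must carry at least one NAL unit")
    # F is set if any aggregated unit has F set; NRI is the maximum NRI of the units.
    f_bit = max(nal[0] >> 7 for nal in nal_units)
    nri = max((nal[0] >> 5) & 0x03 for nal in nal_units)
    header = bytes([(f_bit << 7) | (nri << 5) | STAP_A_TYPE])
    # Each aggregated unit is preceded by its 16-bit size in network byte order.
    body = b"".join(struct.pack("!H", len(nal)) + nal for nal in nal_units)
    return header + body


def interleave_rows(row_slices):
    """Put slices of even MB rows into one STAP-A and slices of odd MB rows into another."""
    return build_stap_a(row_slices[0::2]), build_stap_a(row_slices[1::2])


# Hypothetical CIF picture: 18 MB rows, one dummy 32-byte slice per row.
# 0x41 = NAL header with F=0, NRI=2, type=1 (coded slice of a non-IDR picture).
slices = [bytes([0x41, row]) + bytes(30) for row in range(18)]
even_stap, odd_stap = interleave_rows(slices)
```

With 18 dummy slices of 32 bytes each, each STAP-A carries 9 slices and is 1 + 9 * (2 + 32) = 307 bytes long. If one of the two packets is lost, the receiver still holds every other MB row of the picture, which is what makes the concealment from neighbouring rows possible. A real sender would also set the RTP timestamp, sequence number, and marker bit, which this sketch leaves out.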

Note that the use of STAPs is essential here, as the high number of individual slices (18 for a Common Intermediate Format (CIF) picture, one per MB row) would lead to unacceptably high IP/UDP/RTP header overhead (unless the source coding tool flexible macroblock ordering (FMO) is used, which is not assumed in this scenario). Furthermore, some wireless video transmission systems, such as H.324M and the IP-based video telephony specified in 3GPP, are likely to use relatively small transport packet sizes. For example, a typical MTU size of an H.223 AL3 SDU is around 100 bytes [17]. Coding individual slices according to this packetization scheme provides a further advantage in communication between wired and wireless networks, as individual slices are likely to be smaller than the preferred maximum packet size of wireless systems. Consequently, a gateway can convert the STAPs used in a wired network into several RTP packets that each carry only one NAL unit, the form preferred in a wireless network, and vice versa.
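A gateway converting STAPs from the wired side into single NAL unit packets for the wireless side has to recover the individual NAL units from each STAP-A. The Python sketch below does that and also puts numbers on the header overhead argument. It assumes uncompressed IPv4 (20 bytes), UDP (8 bytes), and a 12-byte RTP fixed header without CSRCs or header extensions; the per-packet figure differs with IPv6 or header compression. The names split_stap_a and header_overhead are hypothetical, and rewriting of RTP header fields on the outgoing packets is not shown.

```python
import struct

STAP_A_TYPE = 24  # NAL unit type value for a STAP-A aggregation packet
# Assumed per-packet header cost: IPv4 (20) + UDP (8) + RTP fixed header (12),
# with no header compression, no CSRCs, and no RTP header extension.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def split_stap_a(payload):
    """Return the NAL units carried in a STAP-A payload, in transmission order."""
    if not payload or payload[0] & 0x1F != STAP_A_TYPE:
        raise ValueError("not a STAP-A payload")
    units, pos = [], 1  # skip the one-byte STAP-A NAL unit header
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        # A NAL unit needs at least its header byte and must fit in the payload.
        if size == 0 or pos + size > len(payload):
            raise ValueError("invalid NAL unit size")
        units.append(payload[pos:pos + size])  # each becomes one single NAL unit packet
        pos += size
    return units


def header_overhead(num_packets):
    """Total IP/UDP/RTP header bytes spent on num_packets RTP packets."""
    return num_packets * IP_UDP_RTP_HEADER_BYTES


per_slice = header_overhead(18)  # 720 bytes: one RTP packet per CIF slice
aggregated = header_overhead(2)  # 80 bytes: two interleaved STAP-As
```

Under these assumptions the two STAP-As add back only a 1-byte header each plus a 2-byte size field per aggregated NAL unit (38 bytes for 18 slices), so aggregation costs about 118 bytes of overhead per CIF picture instead of 720. The reverse direction, from single NAL unit packets to STAPs, amounts to re-aggregating the units.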
