RTP Sender - Design and Implementation of a Secure Real-time Transport Protocol Library for Hig

5. UVGRTP

5.5 RTP Sender

uvgRTP has no single entity called a sender that handles everything related to sending. Instead, the sender is composed of several actors that work together to provide uvgRTP's sending capabilities. An RTP frame is sent by calling push_frame(), which fragments the input into smaller RTP frames, gathers session statistics, and encrypts and authenticates the RTP frames. Table 11 lists the flags that can be passed to push_frame() to alter its default behavior.

Name        Explanation
RTP_SLICE   Media types with slice support (such as HEVC) should pass this flag to push_frame() when the input frame is a slice, because slices are handled differently from normal H26X NAL units.
RTP_COPY    Tells uvgRTP to make a copy of the input frame. Used, for example, when the input frame resides in read-only memory.

Table 11: Supported send flags

uvgRTP's sender architecture comprises three separate handlers (the media-specific handler, the RTCP handler, and the SRTP handler), a frame queue that is used to implement SCC, and a socket object that provides a platform-generic socket API for uvgRTP. The following sections take an in-depth look at this architecture.

5.5.1 Generic Send Process

Most of uvgRTP's send stack is generic across media formats: once the media-specific operations have been performed on the input, the process is the same regardless of what the RTP frame carries in its payload. Figure 43 shows the send process as a flowchart.

Figure 43: Flowchart of the send process

The send process starts when the user calls push_frame() with valid parameters. The top-level push_frame(), which belongs to the media stream object, validates the input parameters and checks whether the user passed RTP_COPY in the flags. If so, a copy of the input is created.
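The validate-then-copy step can be illustrated with a minimal, self-contained sketch. This is not uvgRTP's code: the names prepare_input, input_block, and the SKETCH_* flag values are hypothetical and were chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical flag values for this sketch; uvgRTP's real values may differ.
enum sketch_flags : int {
    SKETCH_RTP_NO_FLAGS = 0,
    SKETCH_RTP_SLICE    = 1 << 0,
    SKETCH_RTP_COPY     = 1 << 1
};

// Block that would be handed on to the media-specific handler.
struct input_block {
    const uint8_t *data = nullptr;      // points at the copy or at the caller's buffer
    size_t len = 0;
    std::unique_ptr<uint8_t[]> owned;   // non-null only when a copy was made
};

// Validate the input and copy it only when the caller asked for a copy.
// Returns false on invalid parameters.
bool prepare_input(const uint8_t *data, size_t len, int flags, input_block &out)
{
    if (data == nullptr || len == 0)
        return false;

    out.len = len;
    if (flags & SKETCH_RTP_COPY) {
        out.owned = std::make_unique<uint8_t[]>(len);
        std::memcpy(out.owned.get(), data, len);
        out.data = out.owned.get();
    } else {
        out.owned.reset();
        out.data = data;
    }
    return true;
}
```

In this sketch, the copy is owned by the returned block, so later stages can modify it without touching the caller's memory.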

After that, either the copy or the original block is passed on to the media-specific handler. How this handler works for HEVC, for example, is discussed in Section 5.5.2. The media-specific handler is responsible for fragmenting and packetizing the input frame into RTP frames whose sizes do not exceed the MTU size. The user can control the MTU size with RCC_MTU_SIZE.

As the media-specific handler packetizes the media, it passes the finished RTP frames to the frame queue, which is a platform-generic implementation of SCC. Before the packets are sent to the network, the RTCP handler gathers information from them to update the sender's own statistics, which are used to populate some of the fields in RTCP Sender Reports. After the RTCP handler has finished gathering the session statistics, and if SRTP has been enabled, the packets are given to the SRTP handler. This handler encrypts the payloads of all RTP frames and, if enabled, adds an authentication tag to each. Finally, the packets are sent to the remote participant by utilizing SCC.

Because the SRTP and RTCP handlers operate on SCC buffers, the media-specific handler does not need to concern itself with any of this and needs to implement only what the payload format specification requires.

This allows flexibility and makes supporting all kinds of RTP payload formats very easy as they only have to operate on a chunk of memory and split it into smaller chunks according to the rules defined in the specification.
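The ordering of the generic stages can be sketched as follows. This is a simplified model under stated assumptions, not uvgRTP's implementation: frame_queue_sketch and its member names are hypothetical, the stages are plain callbacks, and the batching that SCC performs is reduced to a loop over buffered packets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using packet = std::vector<uint8_t>;

// Hypothetical stand-in for the frame queue: it buffers packets produced by
// the media-specific handler and runs the generic stages when flushed.
class frame_queue_sketch {
public:
    using stage = std::function<void(packet &)>;
    using sink  = std::function<void(const packet &)>;

    void set_rtcp_stage(stage s) { rtcp_ = std::move(s); }
    void set_srtp_stage(stage s) { srtp_ = std::move(s); }  // left unset when SRTP is disabled

    // Called by the media-specific handler for every finished RTP frame.
    void enqueue(packet p) { pending_.push_back(std::move(p)); }

    // Run statistics gathering, optional protection, then transmission.
    // Returns the number of packets handed to the sink.
    size_t flush(const sink &send)
    {
        assert(static_cast<bool>(send));
        size_t sent = 0;
        for (packet &p : pending_) {
            if (rtcp_) rtcp_(p);  // update the sender's statistics
            if (srtp_) srtp_(p);  // encrypt and/or authenticate
            send(p);
            ++sent;
        }
        pending_.clear();
        return sent;
    }

private:
    std::vector<packet> pending_;
    stage rtcp_;
    stage srtp_;
};
```

A media-specific handler in this model only calls enqueue() and flush(); it never sees the RTCP or SRTP stages, which mirrors the separation of concerns described above.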

5.5.2 HEVC Send Process

Figure 44 shows the contents of the Media-specific processing box shown in Figure 43 for an HEVC stream.

Figure 44: Flowchart of the HEVC send process

When push_frame() is called, it first reaches an intermediate class called h26x, which implements the generic frame processing for AVC, HEVC, and VVC. This class is responsible for performing the optimized SCL on the input frame. It scans through the data using Algorithm 6 and, for each NAL unit it finds, calls the codec-specific push_nal_unit() function with the NAL data and its size. The codec-specific push_nal_unit() then implements the packetization specified in its RFC.
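To make the scanning step concrete, the sketch below locates NAL units with a naive byte-by-byte search for the 0x000001 start code. It is deliberately not the optimized Algorithm 6; the names nal_span and find_nal_units are hypothetical, and the result is a list of spans that a caller could pass one by one to a push_nal_unit()-style function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct nal_span {
    size_t offset;  // first byte of the NAL unit (just past the start code)
    size_t size;    // NAL unit length in bytes
};

// Naive search for 0x000001 start codes. The four-byte form 0x00000001 ends
// in the same three bytes, so it is found by the same comparison.
std::vector<nal_span> find_nal_units(const uint8_t *data, size_t len)
{
    assert(data != nullptr || len == 0);

    std::vector<size_t> starts;  // offsets just past each start code
    for (size_t i = 0; i + 3 <= len; ++i) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            starts.push_back(i + 3);
            i += 2;  // skip the rest of this start code
        }
    }

    std::vector<nal_span> nals;
    for (size_t k = 0; k < starts.size(); ++k) {
        size_t end = len;
        if (k + 1 < starts.size()) {
            end = starts[k + 1] - 3;  // where the next 0x000001 begins
            // Zero bytes directly before it belong to a four-byte start code
            // or are padding, not to this NAL unit.
            while (end > starts[k] && data[end - 1] == 0)
                --end;
        }
        nals.push_back({starts[k], end - starts[k]});
    }
    return nals;
}
```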

As introduced in Section 3.5, RFC 7798 specifies three packet types: a single NAL unit packet, an aggregation packet, and a fragmentation unit. Single NAL unit packets and aggregation packets are used to send small NAL units such as parameter sets. The choice between the two depends on how many small NAL units the HEVC frame contains: if there is only one, a single NAL unit packet is used; if there are more, an aggregation packet is used.
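The selection rule can be written down as a short sketch. Two assumptions are made that the text does not state: a NAL unit counts as "small" when it fits into one packet (max_payload), and the combined size of an aggregation packet is not checked against the MTU. The names packet_type and choose_packet_types are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class packet_type { single_nal_unit, aggregation, fragmentation_unit };

// Decide a packet type for every NAL unit of one frame.
// max_payload is the largest NAL unit that still fits into one packet
// (an assumption of this sketch, used as the "small" threshold).
std::vector<packet_type> choose_packet_types(const std::vector<size_t> &nal_sizes,
                                             size_t max_payload)
{
    assert(max_payload > 0);

    // Count the small NAL units of the frame.
    size_t small = 0;
    for (size_t s : nal_sizes)
        if (s <= max_payload) ++small;

    // One small NAL unit: single NAL unit packet. Several: aggregation packet.
    const packet_type small_type = (small > 1) ? packet_type::aggregation
                                               : packet_type::single_nal_unit;

    std::vector<packet_type> out;
    out.reserve(nal_sizes.size());
    for (size_t s : nal_sizes)
        out.push_back(s <= max_payload ? small_type
                                       : packet_type::fragmentation_unit);
    return out;
}
```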

The third type, the fragmentation unit, is used to split large NAL units into more manageable pieces that can then be transported over the network. As the sender works through the NAL unit, it creates the NAL and FU headers, appends them to a preallocated RTP frame, and gives each RTP frame an MTU-determined number of bytes as payload data. For RTP streams that use 1500-byte MTUs and do not use RTP authentication tags, the payload size is 1443 bytes.
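The header construction for fragmentation units can be sketched as below, following the FU layout in RFC 7798: a two-byte payload header whose type field is 49, followed by a one-byte FU header carrying the S and E bits and the original NAL unit type. The function name fragment_hevc_nal and the max_fragment parameter are hypothetical; the sketch builds each payload in its own vector rather than in a preallocated RTP frame, and it leaves the per-packet byte budget to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr uint8_t HEVC_FU_TYPE = 49;  // RFC 7798: type value of fragmentation units

// Split one HEVC NAL unit (two-byte NAL header included, no start code) into
// FU payloads: payload header (2 bytes) + FU header (1 byte) + fragment.
// max_fragment is the number of NAL payload bytes carried per packet.
std::vector<std::vector<uint8_t>> fragment_hevc_nal(const uint8_t *nal, size_t len,
                                                    size_t max_fragment)
{
    assert(nal != nullptr && len > 2 && max_fragment > 0);
    // A NAL unit must span at least two FUs; smaller ones use other packet types.
    assert(len - 2 > max_fragment);

    const uint8_t nal_type = (nal[0] >> 1) & 0x3F;
    // Keep the F bit and the top LayerId bit, replace the type field with 49.
    const uint8_t hdr0 = static_cast<uint8_t>((nal[0] & 0x81) | (HEVC_FU_TYPE << 1));
    const uint8_t hdr1 = nal[1];

    std::vector<std::vector<uint8_t>> packets;
    size_t pos = 2;  // the original NAL header is not repeated in the fragments
    while (pos < len) {
        const size_t chunk = std::min(max_fragment, len - pos);
        uint8_t fu = nal_type;
        if (pos == 2)           fu |= 0x80;  // S bit: first fragment
        if (pos + chunk == len) fu |= 0x40;  // E bit: last fragment

        std::vector<uint8_t> pkt;
        pkt.reserve(3 + chunk);
        pkt.push_back(hdr0);
        pkt.push_back(hdr1);
        pkt.push_back(fu);
        pkt.insert(pkt.end(), nal + pos, nal + pos + chunk);
        packets.push_back(std::move(pkt));
        pos += chunk;
    }
    return packets;
}
```

Each returned vector corresponds to the payload of one RTP frame; the RTP header itself is outside the scope of this sketch.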

When all the input has been processed and the buffer of RTP frames is ready, the sender signals the frame queue that it can flush the queue, i.e., send the packets over the network. This description shows that the HEVC handler does not need to concern itself with anything other than fragmenting the input into RTP frames. Everything else in the send process is handled by some other entity, which makes the system flexible and easy to extend.
