
4. Kvazaar HEVC intra encoder

Kvazaar [4] is an open-source HEVC encoder that is being developed from scratch in C by the Ultra Video Group in the Department of Pervasive Computing at Tampere University of Technology. Kvazaar has a modular and portable structure that attains high coding efficiency with optimized speed and resource usage. Kvazaar is currently the leading open-source intra encoder. Table 4.2 summarizes the basics of Kvazaar. The source code of Kvazaar [4] is hosted on GitHub, a web-based Git repository hosting service.

Table 4.1: Kvazaar HEVC coding parameters used in this work

Feature                            Kvazaar HEVC intra encoder
Profile                            Main
Internal bit depth, color format   8, 4:2:0
Coding modes                       Intra
Sizes of luma coding blocks        64x64, 32x32, 16x16, 8x8
Sizes of luma transform blocks     32x32, 16x16, 8x8, 4x4
Sizes of luma prediction blocks    64x64, 32x32, 16x16, 8x8, 4x4
Intra prediction modes             DC, planar, 33 angular
Mode decision metric               SAD
RDO                                1
RDOQ                               Disabled
Transform                          Integer DCT (integer DST for luma 4×4)
4x4 transform skip                 Enabled
Loop filtering                     DF, SAO

Table 4.2: Kvazaar digest

Main developer      Tampere University of Technology
Source codes        github.com/ultravideo
License             GNU LGPLv2.1
Contributors        7 at TUT + 6 external
Language            C with intrinsics/ASM
Operating systems
Processors          x86, x64, PowerPC, ARM
Presets             RD1 for high-speed encoding, RD2 for high-quality encoding

Table 4.1 shows the coding parameters used with Kvazaar in this thesis. The most important points in the table are that only intra coding is used, all block sizes and prediction modes are supported, Sum of Absolute Differences (SAD) is used as the mode decision metric, Rate-Distortion Optimization (RDO) is set to 1, Rate-Distortion Optimized Quantization (RDOQ) is disabled, and transform skip is enabled.

These settings were chosen to reduce the complexity of the encoder.
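The SAD metric named above can be illustrated with a minimal sketch. It assumes 8-bit samples stored in row-major order; the function name `sad_block` and its parameters are hypothetical and are not taken from the Kvazaar sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not taken from the Kvazaar sources:
 * sum of absolute differences between an original block and a
 * predicted block of size width x height. Both blocks hold 8-bit
 * samples in row-major order, each with its own row stride. */
uint32_t sad_block(const uint8_t *orig, int orig_stride,
                   const uint8_t *pred, int pred_stride,
                   int width, int height)
{
    assert(width > 0 && height > 0);
    uint32_t sad = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            /* Samples are promoted to int, so the difference can be negative. */
            int diff = orig[y * orig_stride + x] - pred[y * pred_stride + x];
            sad += (uint32_t)abs(diff);
        }
    }
    return sad;
}
```

A lower SAD means that the predicted block is closer to the original, so a mode decision based on this metric simply keeps the candidate with the smallest value.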

This thesis does not cover all aspects of the encoder. As the main purpose of this thesis was to accelerate only a part of Kvazaar, a limited knowledge of the whole HEVC encoding process is sufficient, provided that the functions to be accelerated are understood in depth. The first runs with Kvazaar were done on a PC, both to get familiar with the Kvazaar parameters and to step through the encoder in a debugger in order to learn the encoding flow of Kvazaar.

When the work on this thesis started, the Kvazaar intra encoder was still under heavy development. It was still missing some intra encoding tools, and parallelization tools were only just being added. The changes made to Kvazaar during the thesis work had minimal effect on the work done. Kvazaar was easy to get working on a soft processor synthesized to an FPGA chip on a DE2 board. The software development environment for NiosII was NiosII 12.1 Software Build Tools for Eclipse, which was able to compile the source code with minimal changes. The changes included removing or replacing unsupported code and writing a new method to read the video input from memory, as there is no trivial way to read data from external storage with NiosII. This applies to the earlier Kvazaar version, in which threads were not yet a major part of the encoder. Compiling the later versions of Kvazaar for NiosII would require major changes to the code.
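Reading the video input from memory instead of from a file can be sketched as follows. This is only an illustration under stated assumptions: the raw sequence is assumed to be planar 8-bit YUV 4:2:0 that is already present in memory, and the type `mem_video` and the functions `frame_bytes` and `mem_read_frame` are hypothetical names, not the method actually written for this work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory video source; all names are illustrative. */
typedef struct {
    const uint8_t *data;  /* start of the raw YUV 4:2:0 sequence in memory */
    size_t size;          /* total number of bytes available */
    size_t pos;           /* read offset of the next frame */
} mem_video;

/* Bytes in one 8-bit 4:2:0 frame: a full-size luma plane plus two
 * chroma planes subsampled by two in both directions. */
size_t frame_bytes(int width, int height)
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    size_t luma = (size_t)width * (size_t)height;
    return luma + luma / 2;
}

/* Copy the next frame into dst. Returns 1 on success and 0 when fewer
 * bytes than one full frame remain, mirroring an end-of-file condition. */
int mem_read_frame(mem_video *src, uint8_t *dst, int width, int height)
{
    size_t n = frame_bytes(width, height);
    if (src->size - src->pos < n) {
        return 0;
    }
    memcpy(dst, src->data + src->pos, n);
    src->pos += n;
    return 1;
}
```

With this layout one QCIF frame (176x144) occupies 38016 bytes, so even short test sequences need a memory region of several hundred kilobytes.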

To give an idea of how demanding HEVC and Kvazaar are, the encoding speed for a QCIF resolution (176x144) video (Carphone [29]) was only 0.065 fps with the NiosII processor running at 50 MHz.
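The figures above can be turned into a cycle budget: 50 MHz divided by 0.065 fps is roughly 769 million clock cycles per frame, or about 30 thousand cycles per luma sample of a QCIF frame. The small sketch below only reproduces this arithmetic; the function names are hypothetical.

```c
#include <assert.h>

/* Clock cycles spent per encoded frame for a given clock frequency (Hz)
 * and encoding speed (frames per second). */
double cycles_per_frame(double clock_hz, double fps)
{
    assert(clock_hz > 0.0 && fps > 0.0);
    return clock_hz / fps;
}

/* Clock cycles spent per luma sample of a width x height frame. */
double cycles_per_luma_sample(double clock_hz, double fps, int width, int height)
{
    assert(width > 0 && height > 0);
    return cycles_per_frame(clock_hz, fps) / ((double)width * (double)height);
}
```

Calling `cycles_per_frame(50e6, 0.065)` and `cycles_per_luma_sample(50e6, 0.065, 176, 144)` gives the two figures quoted above.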

4.1 Profiling Kvazaar

After the runs on NiosII, it was clear that Kvazaar would need a major speed boost to reach acceptable results. Such a boost could be achieved by creating a HW accelerator for Kvazaar. Choosing which parts or functions to accelerate can be hard, because the structure and functionality of a function determine whether it can be accelerated at all. Profiling the encoder and getting accurate time usage figures for all functions helps to narrow the search.

Gprof [30] is a GNU profiling tool that is used together with the GCC compiler. To prepare a program for profiling, the only thing needed is to pass the -pg flag to GCC when compiling and linking. When the instrumented application is run, it produces a "gmon.out" file that can then be processed with gprof, which in turn outputs tables of the processing time of each function and of the cumulative time that also includes the functions it calls. These tables are useful as-is, but a visual graph representation can also be generated from them with gprof2dot [31].

Figure 4.1: Kvazaar time usage diagram.
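The gprof workflow can be tried on a toy program before applying it to the encoder. The sketch below is unrelated to Kvazaar: the functions `hot_inner` and `cheap_outer` and the file name `demo.c` are hypothetical, and rendering the graph with the Graphviz `dot` tool is an assumption about the local setup.

```c
/* Assumed build and profiling steps for a file demo.c that calls
 * cheap_outer() from its main():
 *
 *   gcc -pg demo.c -o demo
 *   ./demo                                  (writes gmon.out)
 *   gprof ./demo gmon.out > profile.txt
 *   gprof ./demo gmon.out | gprof2dot | dot -Tpng -o profile.png
 */
#include <assert.h>
#include <stdint.h>

/* Deliberately expensive leaf function: most of the run time is spent here. */
uint32_t hot_inner(uint32_t x)
{
    uint32_t acc = x;
    for (int i = 0; i < 1000; ++i) {
        acc = acc * 1664525u + 1013904223u;  /* linear congruential step */
    }
    return acc;
}

/* Thin caller: little time of its own, but a large cumulative time
 * because it includes every call to hot_inner(). */
uint32_t cheap_outer(int calls)
{
    assert(calls >= 0);
    uint32_t sum = 0;
    for (int i = 0; i < calls; ++i) {
        sum += hot_inner((uint32_t)i);
    }
    return sum;
}
```

In the resulting profile, `hot_inner` should dominate the self time while `cheap_outer` shows a small self time and a large cumulative time, which is the same pattern that separates a search function from the prediction functions it calls.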

Table 4.3 and Figure 4.1 show the most time-consuming parts of Kvazaar intra encoding. The video sequence used to obtain these results (Kristen And Sara [32]) had a resolution of 1280x720. Table 4.3 shows that the most time-consuming function in Kvazaar with a quantization parameter value of 32 is intra_get_angular_pred. The function takes 39.23% of the overall encoding time when using full intra search and 22.50% when using rough search.

Rough search implements a coarser version of the full intra search. It first calculates the SAD for evenly spaced modes to select a starting point, and then performs a more refined search around that starting point.
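The coarse-then-refine idea can be sketched as follows. This is a minimal illustration and not Kvazaar's search_intra_rough: it covers only the angular modes (numbered 2 to 34 in HEVC), leaves out DC and planar, and takes the step size as a free parameter. All function names are hypothetical, and the cost callback stands in for the SAD of a predicted block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIRST_ANGULAR_MODE 2   /* HEVC angular modes are numbered 2..34 */
#define LAST_ANGULAR_MODE  34

/* Caller-supplied cost of one prediction mode, e.g. the SAD between the
 * original block and the block predicted with that mode. */
typedef uint32_t (*mode_cost_fn)(int mode, void *ctx);

/* Stand-in cost for demonstration only: distance from a target mode
 * passed through ctx. A real encoder would compute the SAD here. */
uint32_t distance_cost(int mode, void *ctx)
{
    int target = *(const int *)ctx;
    int d = mode - target;
    return (uint32_t)(d < 0 ? -d : d);
}

/* Returns the angular mode with the lowest cost among those evaluated. */
int rough_angular_search(mode_cost_fn cost, void *ctx, int step)
{
    assert(cost != NULL && step >= 1);

    /* Coarse pass: evaluate every step-th angular mode. */
    int best_mode = FIRST_ANGULAR_MODE;
    uint32_t best_cost = cost(best_mode, ctx);
    for (int mode = FIRST_ANGULAR_MODE + step; mode <= LAST_ANGULAR_MODE; mode += step) {
        uint32_t c = cost(mode, ctx);
        if (c < best_cost) {
            best_cost = c;
            best_mode = mode;
        }
    }

    /* Refinement pass: evaluate the skipped modes around the coarse winner. */
    int start = best_mode;
    for (int mode = start - step + 1; mode <= start + step - 1; ++mode) {
        if (mode == start || mode < FIRST_ANGULAR_MODE || mode > LAST_ANGULAR_MODE) {
            continue;
        }
        uint32_t c = cost(mode, ctx);
        if (c < best_cost) {
            best_cost = c;
            best_mode = mode;
        }
    }
    return best_mode;
}
```

With a step of 4, the coarse pass evaluates 9 of the 33 angular modes and the refinement adds at most 6 more, so far fewer predictions are generated than in a full search. The price is that the result is only guaranteed to be the best among the evaluated modes.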

Although the search_intra_rough function itself only takes 2.17% of the overall encoding time when using full intra search and 2.75% when using rough search, its cumulative time is much higher in both cases: 66.24% for full intra search and 41.83% for rough search. The cumulative time usage of search_intra_rough consists of all the functions marked in purple in Figure 4.1.

The rest of the thesis focuses on full intra search only, rather than rough search, as it produces better picture quality and gives better insight into accelerating algorithms with an HLS tool.


Table 4.3: Most time consuming functions of Kvazaar in percentages

Full intra search (CPU only) Rough intra search (CPU only)

% Functions % Functions
