
Considering that several encoding approaches and asymmetric stereoscopic schemes were introduced and/or evaluated subjectively in this thesis, a summary of the main conclusions is provided next.

[P6] presents a new MR scheme based on the amount of SI available in the left and right views of each stereoscopic video. In this scheme, one view is downsampled in the horizontal direction while the other view is downsampled in the vertical direction.

In this study, each view is evaluated separately and the amount of SI [114] along each direction (vertical and horizontal) is calculated. Comparing these values, a decision is made on the downsampling direction of each view so that the maximum accumulated amount of information is kept across the left and right views. Subjective test ratings compare the proposed cross-asymmetric MR scheme with the conventional MR scheme (where one view has FR and the other view is downsampled in both directions) and the symmetric FR scheme. The results confirm that the proposed method outperforms the other schemes and can therefore be considered a potential MR scheme, as it also decreases the number of pixels involved in the encoding and decoding process.

Figure 5.7: Correlation between subjective scores and objective estimates for (a) Undo Dancer, (b) Dog, (c) Newspaper, and (d) Pantomime
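To make the direction decision concrete, the following Python sketch computes directional SI with Sobel gradients and selects the cross-asymmetric assignment that keeps the larger accumulated SI. The Sobel-based SI definition, the single-frame evaluation, and all function names are illustrative assumptions rather than the exact procedure of [P6] or the metric of [114].

    import numpy as np
    from scipy import ndimage

    def directional_si(frame):
        # Spatial information along each axis: standard deviation of the
        # Sobel gradient (a common SI measure; the metric of [114] may differ).
        f = frame.astype(np.float64)
        si_h = np.std(ndimage.sobel(f, axis=1))  # horizontal detail
        si_v = np.std(ndimage.sobel(f, axis=0))  # vertical detail
        return si_h, si_v

    def assign_downsampling_directions(left, right):
        # Downsampling a view vertically preserves its horizontal detail and
        # vice versa; pick the assignment that keeps the larger accumulated SI.
        lh, lv = directional_si(left)
        rh, rv = directional_si(right)
        if lh + rv >= lv + rh:
            return {"left": "vertical", "right": "horizontal"}
        return {"left": "horizontal", "right": "vertical"}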

A new asymmetric stereoscopic video coding method is presented in [P9]. The algorithm combines two steps: 1) sample-domain quantization, which in this study is a linear quantization of luma values with rounding, and 2) spatial resolution reduction. The quality of the proposed technique was compared subjectively with two other coding techniques: symmetric FR and MR stereoscopic video coding. In most cases (six out of eight), the proposed method achieved a higher mean subjective rating than the other schemes.
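A minimal sketch of the two steps is given below, assuming 8-bit luma samples, a uniform quantization step, and simple 2x2 averaging for the resolution reduction; the actual quantizer parameters and downsampling filter used in [P9] are not reproduced here.

    import numpy as np

    def quantize_luma(luma, step):
        # Linear sample-domain quantization with rounding of 8-bit luma values;
        # `step` is an assumed uniform quantization step size.
        q = np.round(luma.astype(np.float64) / step) * step
        return np.clip(q, 0, 255).astype(np.uint8)

    def downsample_half(luma):
        # Spatial resolution reduction by a factor of two in both directions
        # using 2x2 block averaging (a placeholder for the actual filter).
        h, w = luma.shape[0] // 2 * 2, luma.shape[1] // 2 * 2
        blocks = luma[:h, :w].reshape(h // 2, 2, w // 2, 2).astype(np.float64)
        return blocks.mean(axis=(1, 3)).round().astype(np.uint8)

    # One view of the stereo pair would go through both steps, for example:
    # downsample_half(quantize_luma(view, step=8))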

Since it is generally desirable to estimate the subjective quality of videos with an objective metric, this thesis considered the results of two sets of experiments presented in [P3] and [P10] and modeled the subjective ratings with an objective estimate. In this analysis, three downsampling ratios, 1/2, 3/8, and 1/4, were used to create the lower-resolution view in the asymmetric stereoscopic content. PPD values were calculated and used in the estimation process, as they differ for different resolutions. A logarithmic relation was introduced in [P10] to estimate the subjective rating as a function of the PPD of the lower-resolution view for different sequences and different test setups. The estimated values and actual ratings resulted in high Pearson correlation coefficients, showing that this metric estimates the subjective ratings well under both test conditions and for all test sequences.
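The form of such an estimate can be illustrated with a small least-squares fit of MOS ≈ a·ln(PPD) + b followed by a Pearson correlation check; the model form, the variable names, and the example values below are assumptions for illustration only, not the coefficients or data reported in [P10].

    import numpy as np

    def fit_log_model(ppd, mos):
        # Least-squares fit of mos ~ a * ln(ppd) + b, the assumed form of the
        # logarithmic relation between subjective rating and PPD.
        a, b = np.polyfit(np.log(ppd), mos, deg=1)
        return a, b

    def pearson(x, y):
        # Pearson correlation coefficient between estimates and ratings.
        return np.corrcoef(x, y)[0, 1]

    # Hypothetical example values, for illustration only.
    ppd = np.array([15.0, 22.5, 30.0, 45.0, 60.0])  # pixels per degree
    mos = np.array([2.1, 2.9, 3.4, 4.0, 4.3])       # mean opinion scores
    a, b = fit_log_model(ppd, mos)
    estimate = a * np.log(ppd) + b
    print(pearson(estimate, mos))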

Chapter 6

Depth-Enhanced Multiview Video Compression

6.1 Introduction

The current state-of-the-art multiview video coding standard, MVC [29], is an extension of H.264/AVC [117]. The H.264/AVC and MVC reference software [5] was used in some of the simulations carried out in this thesis. However, conventional frame-compatible stereoscopic video coding techniques, such as MVC, enable less flexible 3DV displaying at the receiving or playback devices when compared to depth-enhanced MVC. While two texture views, as in the stereoscopic presentation of 3D content, provide a basic 3D perception, it has been found that disparity adjustment between the views is required for adapting the content to different viewing conditions and different display types. Moreover, based on personal preferences, different disparities on the display might be desired [138]. Furthermore, ASD technology, as discussed in Section 3.4, typically requires many high-quality views to be available at the decoder/display side prior to displaying. Due to the natural limitations of content production and broadcasting technologies, such a large number of views cannot be delivered to the user with the existing video compression standards. In the majority of cases these views must instead be rendered in the playback device from the received views. Such needs can be served by coding 3DV data in the MVD format [90, 141] and exploiting the decoded MVD data as input for DIBR [46, 86]. In the MVD format, each texture view is accompanied by a respective depth view providing the associated per-pixel depth, from which new views can be synthesized using any DIBR algorithm. The encoding and display process of depth-enhanced multiview video is presented in Figure 6.1. In the case of stereoscopic presentation, the views giving the desired disparity, and hence the desired depth perception, are chosen from the decoded and synthesized views at the display side. Moreover, for ASD presentation, a subset of the total decoded and synthesized views is utilized, based on the required number of views.
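As a rough illustration of how decoded MVD data feeds DIBR, the sketch below warps one texture view horizontally by a disparity derived from its depth view. The 1D parallel camera setup, the z-near/z-far depth mapping, and all parameter names are simplifying assumptions; a practical renderer would additionally handle occlusion ordering, hole filling, and blending of several reference views.

    import numpy as np

    def synthesize_view(texture, depth, focal, baseline, z_near, z_far):
        # Minimal DIBR warp for a 1D parallel camera setup: 8-bit depth values
        # are mapped to metric depth Z, disparity d = focal * baseline / Z, and
        # each texture pixel is shifted horizontally by d (no hole filling).
        h, w = depth.shape
        z = 1.0 / (depth / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)
        disparity = np.round(focal * baseline / z).astype(np.int64)
        out = np.zeros_like(texture)
        cols = np.arange(w)
        for row in range(h):
            target = cols + disparity[row]
            valid = (target >= 0) & (target < w)
            out[row, target[valid]] = texture[row, cols[valid]]
        return out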


Figure 6.1: Encoding and synthesis process for a depth-enhanced multiview video

Considering that MVC did not target the depth-enhanced multiview format, it is not optimized for encoding both texture and depth maps. As a result, there have been standardization efforts towards depth-enhanced video coding, and MPEG issued a Call for Proposals (CfP) for 3DV coding technology in March 2011 [3]. The target of this CfP was to satisfy two requirements: (1) enabling a variety of 3D applications and display types, including a varying baseline to adjust the depth perception, and (2) supporting multiview ASDs.

Two projects covered by the CfP are described in the following paragraphs.

The CfP invited submissions in two categories: the first compatible with H.264/AVC and the second compatible with the High Efficiency Video Coding (HEVC) [146] standard. A depth-enhanced extension of MVC, abbreviated MVC+D, specifies the encapsulation of MVC-coded texture and depth views into a single bitstream [30, 149]. The utilized coding technology is identical to MVC; hence, MVC+D is backward-compatible with MVC, and the texture views of MVC+D bitstreams can be decoded with an MVC decoder. The MVC+D specification was technically finalized in January 2013. The reference software [5] implementation of MVC+D has been used in several simulations in this thesis [P1], [P4], and [P5].

The Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) is the organization in charge of the ongoing development of a depth-enhanced extension of H.264/AVC, referred to here as 3D-AVC. This development exploits redundancies between texture and depth and includes several coding tools that provide a compression improvement over MVC+D.

The specification requires the base texture view to be compatible with H.264/AVC; compatibility of the dependent texture views with MVC may optionally be provided.

3D-AVC is planned to be technically finalized in November 2013. The reference software implementation of H.264/AVC has been used in a few publications in this thesis, [P7] and [P8].