
Spatial perceptual information

Different contents are subject to different spatial complexities. The ITU-T Recommendation P.910 [114] proposes the metric Spatial Information (SI) to measure the spatial perceptual detail of a picture (2.1). The value of this metric usually increases for more spatially complex scenes. Based on this recommendation, and by applying the Sobel filter (2.2) along the vertical or horizontal direction only, SI can be measured separately for each direction. SI captures both the quantity and the strength of the edges in different directions.


SI = \max_{\text{time}} \left\{ \operatorname{std}_{\text{space}} \left[ \mathrm{Sobel}(F_n) \right] \right\} \quad (2.1)

H_{\mathrm{Sobel}} = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \quad (2.2)

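As a sketch of how (2.1) and (2.2) combine, the following implements SI for a sequence of grayscale frames using only NumPy; the 3x3 Sobel convolution is written out explicitly. This is an illustrative sketch, not the reference P.910 implementation.

```python
import numpy as np

# Horizontal Sobel kernel, as in (2.2); its transpose gives the vertical kernel.
H_SOBEL = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def sobel_magnitude(frame):
    """Gradient magnitude of a grayscale frame via the 3x3 Sobel kernels."""
    h, w = frame.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            patch = frame[i:i + h - 2, j:j + w - 2]
            gx += H_SOBEL[i, j] * patch       # horizontal gradient
            gy += H_SOBEL.T[i, j] * patch     # vertical gradient
    return np.sqrt(gx**2 + gy**2)

def spatial_information(frames):
    """SI: maximum over time of the spatial std of the Sobel-filtered frame (2.1)."""
    return max(np.std(sobel_magnitude(f)) for f in frames)
```

A spatially flat sequence yields SI of zero, while frames containing strong edges raise the spatial standard deviation and hence the SI value.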
The functional model of binocular vision is shown in Figure 2.3. When the eye is relaxed and the interior lens is the least rounded, the lens has its maximum focal length for distant viewing. As the muscle tension around the ring of muscle is increased and the supporting fibers are thereby loosened, the interior lens rounds out to its minimum focal length. This enables the eye to focus on objects at various distances. This process is known as accommodation [158], and the refractive power is measured in diopters. Accommodation can be defined as the alteration of the lens to focus the area of interest on the fovea, a process that is primarily driven by blur [148,150]. Vergence deals with obtaining and maintaining single binocular vision by moving both eyes, mainly in opposite directions. Naturally, the accommodation and vergence systems are reflexively linked [21,108,123,127]. The amount of accommodation required to focus on an object changes proportionally with the amount of vergence needed to fixate that same object in the center of the eyes. The cornea provides two-thirds of the refractive power of the eye, and the rest is provided by the lens.

Figure 2.3: Functional model of binocular vision

However, the eye tends to change the curvature of the lens rather than that of the cornea. Normally, when the ciliary muscles are relaxed, parallel rays from distant objects converge onto the retina. If the eye is maintained in this state and a near object is placed before it, the light rays will converge behind the retina.
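The diopter as a unit can be made concrete with a short sketch: refractive power is the reciprocal of the focal distance in meters (D = 1/d), a standard optical relation. The function name below is illustrative only.

```python
def accommodation_demand(distance_m: float) -> float:
    """Accommodation demand in diopters for an object at `distance_m` meters.

    Refractive power is the reciprocal of the focal distance in meters:
    D = 1 / d. Distant objects demand ~0 D; near objects demand more.
    """
    return 1.0 / distance_m

# Reading distance (0.4 m) demands 2.5 D, while an object at 10 m
# demands only 0.1 D of accommodation.
```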

As the sharp image forms behind the retina, the brain can only detect a blurry image. To bring the image into focus, the eye performs accommodation. In cases where the optical system is unable to provide a sharp projected image, the blurring artifact is modeled as a low-pass filter characterized by a point spread function (PSF) [179].
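The PSF-based blur model can be sketched as convolution with a normalized low-pass kernel. A Gaussian PSF is used below purely as an illustrative assumption (physical defocus PSFs are closer to a disc):

```python
import numpy as np

def gaussian_psf(size=5, sigma=1.0):
    """Normalized 2D Gaussian point spread function."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return psf / psf.sum()   # normalize so total light is preserved

def defocus_blur(image, psf):
    """Model out-of-focus imaging as convolution with the PSF (low-pass filter)."""
    k = psf.shape[0]
    h, w = image.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(k):
        for j in range(k):
            out += psf[i, j] * image[i:i + h - k + 1, j:j + w - k + 1]
    return out
```

Because the kernel is normalized and non-negative, the blurred image is a local weighted average: sharp edges are smeared while uniform regions are unchanged, which is exactly the low-pass behavior the text describes.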

When focusing on a near object, the ciliary muscle contracts and the supporting fibers are thereby loosened. As a result, the surfaces of the lens become more curved and the eye focuses on the nearby object. When two different perspectives of the scene are available on the retinas of the two eyes, this is called binocular disparity [62]. The HVS utilizes binocular disparity to deduce information about the relative depth between different objects. The capability of the HVS to calculate the depth of the different objects in a scene is known as stereovision. For a certain amount of accommodation and vergence, there is a small range of distances at which an object is perfectly focused, and a deviation in either direction gradually introduces blur. The area defining an absolute limit for disparities that can be fused by the HVS is known as Panum's fusional area [32,112]. It describes an area within which points projected on the left and right retinas produce binocular fusion and a sensation of depth. Panum's fusional areas are basically elliptical, with their long axes in the horizontal direction [91]. This is depicted in Figure 2.4.
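The link between binocular disparity and depth can be sketched with the textbook pinhole stereo relation Z = f·B/d; the focal length and baseline values in the comment are illustrative assumptions only.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth of a point from its binocular disparity (pinhole stereo model):
    Z = f * B / d, with focal length f in pixels, interocular baseline B
    in meters, and disparity d in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Larger disparity means a nearer object: with f = 800 px and B = 0.065 m,
# a 20 px disparity corresponds to 2.6 m and a 40 px disparity to 1.3 m.
```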

Figure 2.4: Panum's fusional areas

The limits of Panum's fusional area are not constant over the retina but expand with increasing eccentricity from the fovea. The limit of fusion in the fovea corresponds to a maximum disparity of only one-tenth of a degree, whereas at an eccentricity of 6 degrees the maximum value is limited to one-third of a degree [61,173], and at 12 degrees of eccentricity, without eye movements, the maximum disparity is about two-thirds of a degree [104].
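The three disparity limits quoted above can be collected into a small lookup; the piecewise-linear interpolation between the measured points is purely an illustrative assumption.

```python
# Maximum fusable disparity (degrees) at given retinal eccentricities (degrees),
# using the values quoted in the text; interpolating linearly between the
# measured points is a simplifying assumption.
FUSION_LIMITS = [(0.0, 1 / 10), (6.0, 1 / 3), (12.0, 2 / 3)]

def max_fusable_disparity(eccentricity_deg: float) -> float:
    """Approximate Panum's fusional limit by piecewise-linear interpolation."""
    pts = FUSION_LIMITS
    if eccentricity_deg <= pts[0][0]:
        return pts[0][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if eccentricity_deg <= x1:
            t = (eccentricity_deg - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return pts[-1][1]   # clamp beyond the last measured eccentricity
```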

Considering the amount of light entering the eye and the sensitivity adaptation of the retina, the eye is able to operate over a wide range of intensities, between about 10⁻⁶ and 10⁸ cd/m². The fact that the eye is sensitive to a luminance change (i.e., contrast) rather than to the absolute luminance is known as light adaptation and is modeled by a local contrast normalization [171]. The light projected onto the fovea, which comes from the visual fixation point and has the highest spatial resolution, is called foveal vision. The resolution of the vision surrounding the fovea decreases rapidly with distance from it and is known as peripheral vision. Usually a non-regular grid is used to resample the image in a process known as foveation [73].

Because the HVS processes visual information with different mechanisms, it has a different sensitivity to patterns with different densities. The minimum contrast that can reveal a change in the intensity is called the threshold contrast and depends on the pattern density according to the contrast sensitivity function (CSF) [167,179]. The neurons in the visual cortex are sensitive to particular combinations of spatial and temporal frequencies, spatial orientations, and directions of motion. This sensitivity is well approximated by two-dimensional Gabor functions [167,179]. To perceptually optimize the compression of images, the spatially dependent CSF is used [2].
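The local contrast normalization model of light adaptation mentioned above can be sketched as follows: each pixel's deviation from its local mean is divided by the local standard deviation, so the response encodes contrast rather than absolute luminance. The window size is an illustrative assumption.

```python
import numpy as np

def local_contrast_normalize(image, half=2, eps=1e-6):
    """Normalize each pixel by the mean and std of its local neighborhood,
    so the response encodes local contrast rather than absolute luminance."""
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - half), min(h, y + half + 1)
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            win = image[y0:y1, x0:x1]
            out[y, x] = (image[y, x] - win.mean()) / (win.std() + eps)
    return out
```

A key property, matching the text, is approximate invariance to absolute luminance: scaling the input image by a constant leaves the normalized response essentially unchanged.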

The LGN receives information directly from the ascending retinal ganglion cells via the optic tract and from the reticular activating system. Both the LGN in the right hemisphere and the LGN in the left hemisphere receive input from each eye.

However, each LGN only receives information from one half of the visual field, as illustrated in Figure 2.3. This occurs due to axons of the ganglion cells from the inner halves of the retina (the nasal sides) decussating (crossing to the other side of the brain) through the optic chiasm. The axons of the ganglion cells from the outer half of the retina (the temporal sides) remain on the same side of the brain.

Therefore, the right hemisphere receives visual information from the left visual field, and the left hemisphere receives visual information from the right visual field. This information is further processed inside LGN.

The number of visual nerve fibers going out of the LGN is about 1% of the neurons entering the LGN. This suggests that a huge de-correlation of the visual information is performed in the LGN, including binocular masking and extraction of binocular depth cues. The LGN fuses the two input views into one output view, called the cyclopean image, representing the scene from a point between the eyes. This image is then carried by the LGN axons fanning out through the deep white matter of the brain as the optic radiations, which ultimately travel to the primary visual cortex (V1), located at the back of the brain. The binocular suppression theory, as well as anatomical evidence, suggests that a small part of the visual information received in each eye might be delivered to V1 without being processed in the LGN.
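In computational models of stereoscopic perception, the cyclopean image is usually approximated rather than derived anatomically. The crude sketch below simply blends the two views with a fixed weight, which is a strong simplifying assumption; practical models weight each view by local stimulus energy.

```python
import numpy as np

def cyclopean_image(left, right, w_left=0.5):
    """Crude cyclopean-image model: a per-pixel weighted combination of the
    left and right views. The fixed weighting is a simplifying assumption;
    perceptual models weight by the local strength of each view's stimulus."""
    return w_left * left + (1.0 - w_left) * right
```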