
5.3 Challenges of camera benchmarking


In the case of simple benchmarking of a processor or system, the result is based on the performance of the system in executing a certain piece of test software. In practice, the time and memory the software needs to execute a certain algorithm define the performance of the system. When a camera system is benchmarked, the situation is more complicated. Numerous different quality and speed metrics are available, and selecting and combining them can be problematic. In addition, different environments may change the ranking between cameras.

5.3.1 Which metrics to select

The first challenge is to select the metrics to be used in the benchmarking score.

The CPIQ group defines the fundamental objective measurements as spatial resolution, tone and color reproduction, sensitivity, noise, and geometric fidelity (CPIQ Phase 2 Introduction 2009).

In the case of color accuracy, the selection of the standard is quite straightforward, because the CIEDE metrics are widely acknowledged and used. However, several metrics are still available. Chrominance, hue, and luminance differences can be used, as well as the generic color difference ∆E. The older versions of the CIEDE metrics are also still used to some extent, to keep compatibility with old measurements and probably because of the complexity of the latest ∆E00 equations. Tone reproduction could be measured from the color difference values of the gray patches of the test chart, but it is not defined as a standardized metric. However, ∆E based measurements have some shortcomings. Some mobile phone vendors add an extra color tint to the images to give a better perceptual impression. Even though this color change is intentional, it increases the measured color error. The phenomenon was noticed when the color measurements of the attached articles were made.
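As a concrete illustration of the metric choice within color accuracy alone, the following minimal sketch computes the older CIE76 color difference ∆E*ab and splits it into lightness, chroma, and hue related components from CIELAB values. The patch values are invented for illustration; the full CIEDE2000 (∆E00) formula adds further weighting functions for lightness, chroma, and hue, which makes it considerably longer.

```python
import numpy as np

def delta_e_76(lab_ref, lab_test):
    """CIE76 color difference: Euclidean distance in CIELAB space."""
    return float(np.linalg.norm(np.asarray(lab_ref) - np.asarray(lab_test)))

def delta_components(lab_ref, lab_test):
    """Split the total difference into lightness, chroma and hue related parts."""
    L1, a1, b1 = lab_ref
    L2, a2, b2 = lab_test
    dL = L2 - L1                                   # lightness difference
    C1, C2 = np.hypot(a1, b1), np.hypot(a2, b2)    # chroma of each sample
    dC = C2 - C1                                   # chroma difference
    dE = delta_e_76(lab_ref, lab_test)
    dH2 = max(dE**2 - dL**2 - dC**2, 0.0)          # hue difference is the remainder
    return dL, dC, float(np.sqrt(dH2))

# Example: a gray patch reproduced with a slight warm tint
ref = (50.0, 0.0, 0.0)          # neutral gray target
measured = (51.0, 2.0, 4.0)     # reproduced patch with a color cast
print(delta_e_76(ref, measured))        # total ∆E*ab
print(delta_components(ref, measured))  # ∆L*, ∆C*, ∆H*
```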

Spatial resolution offers many more metrics to choose from. Firstly, the latest standard defines three different methods and test charts for calculating the spatial resolution. Even though the slanted edge and Siemens star methods are both based on modulation transfer function (MTF) curves, they do not produce identical results. Secondly, the standard no longer specifies a limiting resolution which could be referred to, but describes the whole MTF curve as the result of the spatial resolution measurement. This gives a questionable freedom to different tools when specifying a single-number result for resolution. Imatest defines MTF50 or MTF50P as a good reference value, Image Engineering defines the limiting threshold as 10% of the initial value, and DxO defines the limiting threshold as 5% of the initial value. Finally, the Skype video test documentation mentions MTF30 as a good metric.
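How much the single-number result depends on the chosen threshold can be illustrated with a short sketch. The MTF curve below is synthetic, and the thresholds follow the definitions quoted above (MTF50, MTF30, and 10% and 5% of the initial value); the linear interpolation is a simplifying assumption, not the exact method of any of the tools mentioned.

```python
import numpy as np

def freq_at_threshold(freq, mtf, threshold):
    """Return the lowest spatial frequency where the MTF drops to `threshold`."""
    below = np.where(mtf <= threshold)[0]
    if below.size == 0:
        return freq[-1]                      # never drops below the threshold
    i = below[0]
    if i == 0:
        return freq[0]
    # Linear interpolation between the surrounding samples
    f0, f1 = freq[i - 1], freq[i]
    m0, m1 = mtf[i - 1], mtf[i]
    return f0 + (threshold - m0) * (f1 - f0) / (m1 - m0)

# Synthetic MTF curve: Gaussian fall-off, frequencies in cycles/pixel
freq = np.linspace(0.0, 0.5, 101)
mtf = np.exp(-(freq / 0.25) ** 2)
initial = mtf[0]

print("MTF50:", freq_at_threshold(freq, mtf, 0.50 * initial))  # Imatest-style
print("MTF30:", freq_at_threshold(freq, mtf, 0.30 * initial))  # Skype-style
print("MTF10:", freq_at_threshold(freq, mtf, 0.10 * initial))  # Image Engineering-style
print("MTF5 :", freq_at_threshold(freq, mtf, 0.05 * initial))  # DxO-style
```

The same curve yields four clearly different "limiting resolutions", which is why the single-number spatial resolution result depends heavily on the chosen tool.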

Moreover, texture resolution can be defined as part of the spatial resolution, but there are no standardized metrics for those measurements. The latest de facto standard, which is used in several tools, is the so-called dead leaves method defined in section 4.2.4, but the noise compensation methods are still being investigated and they vary between tools. Also, resolution related artefacts like over-sharpening and aliasing do not yet have specific metrics.
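A common way to estimate a texture MTF from a dead leaves crop is to compare the power spectral density of the captured image with that of the ideal target, after subtracting a noise estimate measured from a uniform patch. The sketch below follows that general idea; the exact noise compensation step is an assumption, since, as noted above, it varies between tools.

```python
import numpy as np

def radial_psd(patch):
    """Radially averaged power spectral density of a 2-D image patch."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(patch - patch.mean()))) ** 2
    h, w = spectrum.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h // 2, x - w // 2).astype(int)
    n_bins = min(h, w) // 2
    sums = np.bincount(r.ravel(), weights=spectrum.ravel())
    counts = np.bincount(r.ravel())
    return sums[:n_bins] / counts[:n_bins]

def texture_mtf(captured, ideal_target, flat_patch):
    """Texture MTF estimate: noise-compensated PSD ratio (dead leaves style).

    `captured` is the photographed dead leaves crop, `ideal_target` the
    corresponding ideal target pattern, and `flat_patch` a uniform area of
    the same capture used as a noise estimate. All patches are assumed to
    have the same size.
    """
    psd_meas = radial_psd(captured)
    psd_ideal = radial_psd(ideal_target)
    psd_noise = radial_psd(flat_patch)
    ratio = np.clip(psd_meas - psd_noise, 0.0, None) / np.maximum(psd_ideal, 1e-12)
    return np.sqrt(ratio)
```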

The noise standard ISO 15739 has several metrics to select from: total noise separated into temporal and fixed pattern noise, signal-to-noise ratio, and, in the latest version, visual noise metrics. The sensitivity of a camera system can be derived from the dynamic range measurement of the ISO 15739 documentation or the ISO speed measurements of ISO 12232.
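The distinction between temporal and fixed pattern noise can be made concrete with a simple decomposition: from a stack of repeated captures of a uniform patch, the pixel-wise variation over time gives the temporal noise, while the spatial variation of the per-pixel means gives the fixed pattern noise. The sketch below is a simplified illustration of this decomposition, not the full procedure of ISO 15739.

```python
import numpy as np

def noise_components(frames):
    """Split noise of a uniform patch into temporal and fixed pattern parts.

    `frames` is an array of shape (n_frames, height, width) containing
    repeated captures of the same uniform gray patch.
    """
    per_pixel_mean = frames.mean(axis=0)

    # Temporal noise: pixel-wise standard deviation over the frame stack,
    # expressed as the root-mean-square over the patch.
    temporal = float(np.sqrt(frames.var(axis=0, ddof=1).mean()))

    # Fixed pattern noise: spatial standard deviation of the per-pixel means
    # (a small residual of temporal noise remains in this simple estimate).
    fixed_pattern = float(per_pixel_mean.std(ddof=1))

    # Total noise as the quadrature sum of the two components.
    total = float(np.sqrt(temporal**2 + fixed_pattern**2))

    # Simple signal-to-noise ratio of the patch.
    snr = float(per_pixel_mean.mean() / total)
    return temporal, fixed_pattern, total, snr
```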

To cover all the fundamental measurements of the CPIQ group, the geometric fidelity can be measured according to the SMIA metric, the height distortion of ISO 9039, or the CPIQ Phase 2 proposal for distortion.

Unfortunately, several metrics remain even after the fundamental objective measurements of CPIQ have been covered. Section 4.3 defines lens distortion metrics like chromatic aberration, vignetting, lens shading, and glare, which can be measured using ISO standards and CPIQ Phase 2 proposals. Camera speed metrics are discussed in section 4.7 and defined in the ISO 15781 standard; they could be a very valuable addition to the benchmarking. Finally, the video related metrics in section 4.6 may at least double the number of quality metrics.

Clearly, the number of metrics is so large and their characteristics so different that some selection or weighting has to be done. Roughly, the number of all metrics, including those for video, would be more than fifty, and it is difficult to imagine a single-number score which could include and combine such a number of values. The selection of the metrics is a tradeoff between the coverage and the complexity of the benchmarking.

5.3.2 Metrics of different environments

The selection of the metrics is only one dimension of the benchmarking challenge.

The imaging environment used, the photospace, significantly affects the quality and performance of camera systems. Keelan defines a photospace which has illumination and object distance parameters (Keelan 2002). In the case of mobile phone cameras, the most important environment parameter is the illumination. Due to the small pixel size and demanding lens requirements, mobile phone cameras are very vulnerable to low light environments. Moreover, different phone types react to light changes in various ways, as described in the author’s article (Peltoketo 2015).

To get a comprehensive benchmarking result, camera systems should be tested in several light environments, including both illumination and color temperature changes. The ANSI organization has defined low light measurements for video recorders (ANSI/CEA-639 2010), but a corresponding still image standard is not yet available, even though there has been a proposal to create similar metrics for still image cameras (Wueller 2013).

Flash usage creates another use case for low light imaging. The color of the flash light, the luminous power, the uniformity, and the synchronization of the flash with the image capture are elements which affect flash supported low light imaging. The flash may also generate its own artefacts, such as red eye.

Another dimension of the photospace, the distance to the object, affects the lens distortion artefacts, focus performance, and depth of focus features of the camera. In particular, near objects are challenging for mobile phone cameras.

Finally, the movement of the photographed object or the movement of the camera affects image quality. Camera parameters and features like exposure time, ISO speed, video frame rate, autofocus speed, and possible image stabilization will improve or worsen the final quality. Säämänen et al. have proposed defining a videospace, similar to the photospace but also including the movement of the object (Säämänen et al. 2010). Even though the videospace is intended for specifying the quality of a video recording system, it might also be usable for still image testing.

All in all, there are several environmental factors which affect the quality and performance of a camera system and they should be considered part of the benchmarking process.

5.3.3 Perceptual benchmarking

Like the individual image quality measurements, the benchmarking result should correlate with the perceptual judgement of a mobile phone camera. This means that every metric is either already perceptually adjusted or that there is a conversion function which transforms the objective metric into a perceptual one.

The final perceptual benchmarking score should then be calculated from the perceptual metrics, weighted so that the weighting factors between the metrics are also perceptually adjusted.

Even though some of the image quality metrics, for example color differences and visual noise, are already perceptually adjusted, this work concentrates mainly on objective benchmarking. True perceptual benchmarking requires a different approach to the measurements and to the benchmarking score equations. The coming CPIQ standard will publish the first perceptual benchmarking system, where every quality metric has been separately converted into a perceptual one. The future will reveal whether the conversion functions are accurate enough. However, the basic assumption behind every objective quality metric is that it correlates, at some level, with perceptual quality. Thus, an objective benchmarking score calculated from objective metrics should also correlate, at some level, with perceptual benchmarking.

5.3.4 Several metrics to a single score

To get a straightforward and comparable benchmarking score, several benchmarking systems use a single-number score to express the performance of a device. In the case of a camera system, several different metrics have to be combined into one value, which can be a problematic task.

When the benchmarking includes metrics which have clearly different effects on the final camera quality, the metrics have to be weighted. For example, color accuracy and chromatic aberration cannot be treated as equal contributors to a generic benchmarking score. Extreme care should be taken when the weights are selected and evaluated. Since the final results can be completely manipulated using inappropriate weights, all equations and the metrics used should be public.

With or without weighting factors, the individual metrics have to be combined to produce a single-number score. Several solutions to this problem can be found in the literature, and arithmetic, harmonic, geometric, and weighted means have been proposed. However, they assume a system where the metrics can be normalized or have the same unit of measure (Fleming and Wallace 1986; Smith 1988; Lilja 2005). Since all averaging methods are misleading at some level, there is no unambiguous solution to the problem. Whichever method is used, it is necessary to reveal all the equations and measurement values used in the calculations.
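The effect of the averaging method can be illustrated with a small sketch: even when the individual metrics are normalized to the same 0–1 scale, the choice between arithmetic, geometric, and harmonic means can change which camera ranks first. The metric values and weights below are invented purely for the illustration.

```python
import numpy as np

# Normalized scores (1.0 = best) for two hypothetical cameras.
# Camera A is balanced; camera B excels in resolution but is weak in noise.
camera_a = np.array([0.70, 0.70, 0.70])   # color, resolution, noise
camera_b = np.array([0.60, 0.95, 0.50])
weights = np.array([0.4, 0.4, 0.2])       # example weighting, must be public

def arithmetic(x, w): return float(np.average(x, weights=w))
def geometric(x, w):  return float(np.exp(np.average(np.log(x), weights=w)))
def harmonic(x, w):   return float(1.0 / np.average(1.0 / x, weights=w))

for name, mean in [("arithmetic", arithmetic),
                   ("geometric", geometric),
                   ("harmonic", harmonic)]:
    a, b = mean(camera_a, weights), mean(camera_b, weights)
    print(f"{name:>10}: A={a:.3f}  B={b:.3f}  winner={'A' if a > b else 'B'}")
```

With these values, the arithmetic mean ranks camera B first, while the geometric and harmonic means rank camera A first, which underlines why the equations and measurement values used must be made public.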

The CPIQ standardization group has planned to tackle this problem by transforming all individual metrics into perceptual, just noticeable difference (JND) values. The JND values represent the quality loss of each metric. The final benchmarking score, which represents the total quality loss of the system, is then calculated using a multivariate equation expressed by Keelan (P1858 2015; Keelan 2002).
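A minimal sketch of how such a JND-based combination can work is shown below. It uses a Minkowski-type sum whose exponent grows with the largest individual degradation, which is the general shape of Keelan's multivariate formalism; the exact exponent function and the constant 16.9 used here are assumptions taken from commonly cited descriptions of the formalism, not from the CPIQ draft itself.

```python
import numpy as np

def combined_quality_loss(jnd_losses):
    """Combine per-metric quality losses (in JNDs) into a total quality loss.

    Minkowski-type combination in the spirit of Keelan's multivariate
    formalism: when one degradation dominates, the exponent grows and the
    total approaches the worst individual loss; when the losses are similar,
    they accumulate more additively.
    """
    losses = np.asarray(jnd_losses, dtype=float)
    worst = losses.max()
    if worst == 0.0:
        return 0.0
    # Exponent form and the constant 16.9 are assumptions taken from
    # published descriptions of the formalism.
    n = 1.0 + 2.0 * np.tanh(worst / 16.9)
    return float((losses ** n).sum() ** (1.0 / n))

# Example: quality losses in JNDs for noise, texture loss and color error.
print(combined_quality_loss([2.0, 1.5, 0.5]))   # several moderate losses add up
print(combined_quality_loss([8.0, 0.5, 0.2]))   # one dominant loss dominates the total
```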

5.3.5 Practical issues of benchmarking

The measurements and benchmarking may face several practical difficulties which are not related to the standardized metrics as such. It is obvious that results will differ between the phone models of a vendor, but there can be clear differences even when the model name is the same. A good example of this issue was the Samsung Galaxy S4, where two different models were sold under the same name: the S4 GT-I9500 had Samsung’s own Exynos chipset, whereas the S4 GT-I9505 was powered by a Snapdragon chipset. Even though the camera modules were the same, the performance of the Exynos version was significantly better than that of the other.

Even if the hardware content is exactly the same, the software version may still affect the camera quality. During the measurements for the third article, two different software versions of the Lumia 1020 model gave clearly different results (Peltoketo 2014). Moreover, there can be differences between individual units. For example, slightly different mounting of the lens system relative to the sensor may cause clear problems in the resolution. Even if the worst cases are removed during factory validation, some variance will remain.

If measurements are made automatically, i.e. a test system both captures the required images or videos and calculates the results, the automation requires a software interface to the camera. Mobile phone vendors offer a public interface, but sometimes they also have their own proprietary, hidden interfaces which work better with their own applications. This leads to a situation where third party benchmarking will get different results than a method which can use the proprietary interfaces. Especially in speed and performance tests, an optimized, proprietary interface may improve results significantly.

5.3.6 Static benchmarking, compatibility requirement or trap?

When mobile phone cameras are compared with previous versions, the benchmarking score and the corresponding metrics should be compatible. The easiest way to guarantee this is to make a static benchmarking system, in which the metrics are constant and the final score is always calculated in the same way.

However, this can also be a trap. If the benchmarking score and the corresponding metrics are not updated, the score will sooner or later become out of date and no longer valid. Even if the main quality metrics like colors, noise, and sharpness do not change, the image processing algorithms are changing. The result may show that the quality fundamentals are in good shape even though other artefacts have appeared. Good examples of this phenomenon are denoising and over-sharpening artefacts. New camera models will also offer new features, which have to be validated and compared between models.

According to Recon Analytics, the average lifespan of mobile phones varies between countries. In the United States it is as short as 22 months, whereas in India it is over seven years (Recon Analytics 2011). This variance forces benchmarking systems to be flexible: new features have to be taken into account, but compatibility with old models also has to be maintained.
