
COSMETIC GRADING OF REFURBISHED SMARTPHONES

A machine learning approach

Master’s Thesis
Faculty of Engineering and Natural Sciences
Examiner: Prof. Moncef Gabbouj, D.Sc. Niko Siltala
May 2021


ABSTRACT

Juho Pastell: Cosmetic grading of refurbished smartphones
Master’s Thesis
Tampere University
Mechanical Engineering
May 2021

An essential aspect affecting the reselling price of a refurbished smartphone is its cosmetic condition. Sold devices are commonly categorized into different cosmetic grades that indicate the level of previous usage visible on the phone. The categorization is mainly done manually, which makes the process error-prone and hard to repeat.

OptoFidelity SCORE is a tester designed to perform automatic grading of refurbished smartphones. This thesis aims to solve how automatic grading could be performed based on image data captured with the SCORE tester. The solution is required to be trainable with examples, and no traditional logic should be used in the grading.

A solution to the problem is studied with a mixed approach. First, a literature review is conducted, in which existing techniques for detecting surface defects are examined. The most suitable approach is then selected for implementation. The implemented system utilizes the Canny edge detection algorithm and an improved LeNet-5 convolutional neural network to detect dust, grease, and scratch type defects on the device surface. This defect information is used to construct a representation of the device that a classifier model uses for predicting the grade.

The performance of the system is evaluated with an experimental approach. The implemented grading system is used for grading physical devices for which the true grade is assigned by human operators. Grading is done with two classifiers, random forest and support vector machine, and the results are compared. Based on the experimental results, using a support vector machine the system was able to correctly classify devices with over 95% accuracy, precision and recall.

All the devices used in the experiments were black iPhone 7 devices. The grading was done based on defects found on the front surface of the smartphone. Since only a limited number of devices was used in the evaluation, no generalizing conclusions about the performance could be made.

More research and studies with a broader set of devices are needed to implement a system that could surpass human-based grading.

Keywords: refurbishment, smartphones, grading, machine learning, computer vision, LeNet-5, Canny edge detection

The originality of this thesis has been checked using the Turnitin OriginalityCheck service.


TIIVISTELMÄ

Juho Pastell: Cosmetic grading of refurbished smartphones
Master’s Thesis
Tampere University
Mechanical Engineering
May 2021

The resale value of refurbished phones is essentially affected by their cosmetic condition. Sold devices are often categorized into different quality grades that indicate how visible the previous use is on the device. This classification is mainly done manually. As the process is monotonous, the risk of incorrect classifications is high.

SCORE, a test system developed by OptoFidelity, is designed for the automatic grading of used phones. This thesis aims to find out how automatic grading could be implemented based on image data captured with SCORE. The requirement is that the system must be trainable to perform the classification, and no traditional logic should be used in the grading.

A solution to the problem is studied with a mixed approach. The work first reviews existing image-based defect detection techniques. The suitability of the different techniques is assessed from the perspective of phone grading, and the solution best suited to the situation is implemented. The implemented system utilizes the Canny edge detection technique and an improved LeNet-5 neural network for defect detection. The detected defect types are scratch, dust and grease. A feature vector is constructed from the defect detections, and the final grade classification is made from it with a machine learning model.

The performance of the system is evaluated experimentally. In the experiment, the system is used to grade physical phones whose true grade has been assigned based on human classification. The experiment compares the classifications made by a support vector machine and a random forest. Based on the experimental results, using a support vector machine the implemented system is able to classify the devices with over 95% accuracy.

In total, 57 phones are used in the work, all of which are black iPhone 7 models. As the number of phones used in the study is small, the achieved results cannot be generalized. Further research with a larger number of phones is needed before an automatic grading system on par with human grading can be implemented.

Keywords: refurbishment, smartphone, grading, machine learning, computer vision, LeNet-5, Canny edge detection

The originality of this publication has been checked using the Turnitin OriginalityCheck service.


PREFACE

First of all, I want to thank OptoFidelity for providing me a chance to finish my studies with an interesting thesis topic. During the work I have learned a lot, and I am sure that the time spent on the thesis will pay itself back as the lessons and skills are put into practice for future challenges.

I especially want to thank Adeel Waris who has supported and guided me throughout the whole research process. I also want to thank Nikhil Pachhandara, Hans Kuosmanen and the whole refurbishment team for their contributions and help. From the university’s side, I want to thank Prof. Moncef Gabbouj and D.Sc. Niko Siltala for helping me reach the finish line.

Lastly, I want to thank my friends and family for their support during my studies. Special thanks to Ninni, for being there when most needed.

In Tampere, 5th May 2021 Juho Pastell


CONTENTS

1 INTRODUCTION
  1.1 Research methodologies, objectives and questions
  1.2 Restrictions of the study
  1.3 Thesis outline
2 THEORETICAL BACKGROUND
  2.1 Refurbished smartphones
    2.1.1 Cosmetic grading
    2.1.2 Problems with manual grading
  2.2 Machine Learning
    2.2.1 Supervised learning
    2.2.2 Features
    2.2.3 Loss functions
    2.2.4 Generalization
    2.2.5 Support vector machine
    2.2.6 Decision trees and random forest
    2.2.7 Model evaluation
    2.2.8 Evaluation metrics
  2.3 Deep Learning
    2.3.1 Artificial neural network
    2.3.2 Convolutional Neural Network
    2.3.3 Backpropagation
  2.4 Related studies
    2.4.1 Defect detection
    2.4.2 Grade classification
3 IMPLEMENTATION
  3.1 Comparison of possible approaches
  3.2 Development process and evaluation methods
  3.3 Automatic grading process
    3.3.1 Defect localization
    3.3.2 Defect classification
    3.3.3 Feature construction
    3.3.4 Grade classification
4 EXPERIMENTAL RESULTS
  4.1 Defect classification
    4.1.1 Dataset
    4.1.2 Results
  4.2 Device grading
    4.2.1 Dataset
    4.2.2 Feature sets
    4.2.3 Hyperparameter tuning
    4.2.4 Results
5 DISCUSSION
  5.1 Future work
6 CONCLUSION
References


LIST OF FIGURES

2.1 Relationship of the model complexity to the bias and variance [25].
2.2 SVM learning scenario.
2.3 Mapping of data from input space to feature space.
2.4 Effect of changing values of the γ and C parameters in SVM utilizing the RBF kernel function [29].
2.5 Decision tree model.
2.6 Five-fold cross-validation where the data is split into equally sized partitions that are used for training and testing.
2.7 Nested cross-validation process.
2.8 Confusion matrix of multi-class classification.
2.9 Feedforward network, and functionality of the neuron [35].
2.10 Convolution process for images.
2.11 Working principle of a convolutional layer.
2.12 Max-pooling operation.
2.13 Backpropagation process, where L is the loss function, y is the real response and a^l is the activation in layer l. z^l represents a node, and w^l, a^{l-1} and b^l are the inputs to the node.
3.1 Automatic grading machine SCORE.
3.2 Steps for automatically grading smartphones with the SCORE tester.
3.3 Image taken with SCORE.
3.4 Comparison of clean and defected device.
3.5 The process of extracting non-uniform regions from the image.
3.6 Detected defects.
3.7 Architecture of LeNet-5, developed by Yann LeCun in 1998 [65].
3.8 Defect detection data.
3.9 Feature construction process.
4.1 Distribution of ground truth grades of the devices used in grading evaluation.


LIST OF TABLES

3.1 Comparison of different approaches for the grading process.
3.2 CNN used for defect classification.
3.3 Statistical features calculated from the defect detection data.
3.4 Description of grade categories.
4.1 Number of defect samples before and after augmentation.
4.2 Comparison of LeNet-5 and improved LeNet-5 model in defect classification.
4.3 Confusion matrix generated from predictions made with improved LeNet-5 model.
4.4 Feature combinations used in grade classification evaluation.
4.5 Hyperparameters used for optimizing SVM.
4.6 Grading results achieved with SVM.
4.7 Grading results achieved with Random forest.


LIST OF SYMBOLS AND ABBREVIATIONS

D Dataset
E Expected value
IG Information gain
L Loss function
α Learning rate
ṗ Empirical probability
γ Spread of SVM kernel
σ Activation function
σ_ε Irreducible error
θ Internal parameter of a learning model
φ Mapping function
b Model bias
f Learning model
s Split
w Model weights
x Input
y Output

Bagging Bootstrap Aggregating
C Regularization parameter used for SVM
CNN Convolutional Neural Network
FN False Negative
FP False Positive
GLCM Gray level co-occurrence matrix
IDC International Data Corporation
LBP Local binary pattern
OOB Out-of-bag
RBF Radial Basis Function
SVM Support vector machine
TN True Negative
TP True Positive

1 INTRODUCTION

Refurbishment is the activity of collecting and restoring used devices in order to sell them to new customers. The popularity of refurbished smartphones has increased in recent years. New companies selling purely refurbished devices have entered the market, and companies formerly selling new smartphones have included refurbished alternatives in their catalog. The global market research company IDC has predicted that shipments of refurbished smartphones will continue to increase and reach 350 million units in 2024. This would correspond to a compound annual growth rate of 11.2% from 2019 to 2024 [1].

Reasons explaining the growth in the refurbished smartphone market relate to the shift towards greener economy practices and the lack of innovations in the field [2]. Buying a refurbished smartphone can be seen as an eco-friendly and socially responsible action, since smartphones are among the most resource-intensive products to manufacture in terms of energy use per weight [3, p. 15]. By choosing a refurbished smartphone, the used energy and labor are preserved and waste is reduced. Also, investing in a new smartphone might not be so appealing when a model is compared with its successors, given the lack of disruptive changes between them. On top of all this, the price of a refurbished device is usually considerably lower than that of a new device.

A major factor affecting the selling price of a refurbished device is its cosmetic condition. Usually companies selling refurbished smartphones assign each device into a distinct grade class that indicates the level of usage visible on the phone. The grading is usually done manually, which makes the process prone to errors and hard to do repeatably. This has led to a need for an automated grading solution that would decrease the errors and improve the repeatability of the process.

One solution to automate the grading process has been developed by OptoFidelity, a Finnish company specializing in robot-assisted testing and measurement solutions. OptoFidelity SCORE is a fully automated test system for secondhand smartphone grading [4]. This thesis has been made in co-operation with OptoFidelity, and the aim of the research is to support the future development of the SCORE product.

1.1 Research methodologies, objectives and questions

The problem this thesis is trying to solve is the automatic grading of refurbished smartphones. The objective is to develop a solution that can perform the device grading by analyzing images taken with the SCORE tester. The solution is required to be trainable and to perform the grading based on learned examples. In addition, the solution should not be completely opaque in terms of the final grading decision. This phenomenon is known as the black-box problem [5].

The research questions for the thesis are:

1. How could defect features be extracted from the images taken with SCORE?

2. What is the grading performance of the developed grading implementation?

The first research question is studied with a mixed approach. A literature review of the existing applications is first conducted. The found solutions are then examined in terms of their appropriateness to the task at hand by comparing different aspects of the techniques. The most suitable approach for defect feature extraction is then selected and used for implementing the grading pipeline.

For the second research question, and partly for the first, experimental research is conducted. Experimental research is essential when machine learning techniques are studied, because the learning models used in the systems are too complex to be analyzed formally [6]. In addition, the fundamental performance of a machine learning model is determined by how well its underlying assumptions match the structured world. The only way to evaluate this is to make use of data sets that are gathered from real-world applications. [7]

In experimental research, one or more independent variables are varied and the effect on a dependent variable is examined. In machine learning research the dependent variables are the performance measures. The broad classes of independent variables are aspects of the algorithm and aspects of the environment. In order to evaluate the performance of a machine learning model, it is crucial to divide the available data into training and test sets. The training set is first presented to the learner and the evaluation is done with the test set. The reason for this is that the goal of learning is to use the acquired information to aid behaviour in novel situations. The evaluation should then be done over multiple iterations with many sets of training and test samples randomly selected from the data. [6]

The experimental research in this study is conducted by using the implemented grading system to grade physical devices and then evaluating the performance of the solution. The evaluation is done by comparing the grading performance of two different classifier models. The information acquired from the literature review is used to guide the selection of these models. The classifiers are trained and tested with different combinations of available features during multiple iterations, and multiple metrics are used for the evaluation. With this process the effect of varying the independent variables on the dependent variable can be examined.

The secondhand smartphones used in this study are divided into two sets. One set is used for developing the defect detection pipeline, and the other is used for evaluating the grading performance achieved with the implementation. The practice here follows the methodology of dividing data into training and test sets, since using the same data for developing the pipeline and for the grading would not represent a novel situation. The feature extraction pipeline could be optimized for the available data, which would give an overly optimistic estimate of the actual defect detection performance. Ultimately this would also lead to an overly optimistic estimate of the actual grading performance [8, p. 228].

The approach for solving the research questions combines both qualitative and quantitative techniques. Analyzing existing techniques and their appropriateness to the grading task is done only based on observations. This method is qualitative. Analysis of the final grading results, on the other hand, leans towards quantitative research, because the conclusion of whether the implemented detection algorithm could be used for the grading is based on numerical results. [9]

1.2 Restrictions of the study

In total, 57 secondhand devices were used in this study. The study was limited to cover only one smartphone model in order to control the complexity of the grading scenario. All the devices were black iPhone 7 units. One third of the devices (20) were used for developing the defect detection pipeline, and the rest were used in the grading evaluation. The total number of phones used was limited, as more devices were not available during the experiments.

The grading performed in this study is limited to defects found on the front surface of the devices. The defect types detected with the developed detection algorithm are dust, grease and scratch. These defect categories were selected since they were the most common defects found on the used smartphone surfaces.

In this study the capability of the defect detection algorithm is not evaluated individually, meaning that the performance for detecting a particular defect is not measured. Instead, evaluation of the defect detection capability is incorporated into the grading evaluation.

The ground truth grades for the devices were assigned by human operators. The operators used in this study were not professionals in grading. Professional graders were not used, since consulting them would have increased the cost of the study and delayed the development process.

1.3 Thesis outline

This thesis starts with a theoretical background chapter. In the background chapter the cosmetic grading process is first presented. After that, the essential methodology used in machine learning and the related techniques of deep learning are presented. In the last part of the theoretical background, previous studies and research related to defect detection and grade classification are discussed.

The theoretical background is followed by a chapter presenting the implemented automatic grading process. The implementation chapter begins with an evaluation of the existing techniques and approaches to be used in the grading process. The reasons behind the final implementation design are presented, and it is argued why it was the most appropriate solution for the grading task at hand. Before the details of the actual implementation, the development process of the grading pipeline and the evaluation methods of the study are also presented.

The implementation choices are justified not only by evaluating existing techniques, but also by performing an evaluation with actual devices imaged with the SCORE tester. Results supporting the implementation choices and evaluating the whole grading process are presented in chapter four.

After the experimental results, a discussion of how the research questions were answered and what the results indicate is presented. The discussion also covers limitations of the study and recommendations for future work. This work ends with a conclusion chapter that summarizes the key aspects of the study.


2 THEORETICAL BACKGROUND

This chapter presents the essential background for the study. First, refurbished smartphones and the cosmetic grading process are introduced. After that, the related machine learning and deep learning concepts are presented. In the last part of this chapter, previous work related to automatic smartphone grading is illustrated.

2.1 Refurbished smartphones

Refurbishment is the process where second-hand smartphones are collected, repaired and restored in order to sell them to new customers [10, p. 9]. Companies selling refurbished smartphones usually execute a series of evaluation, maintenance and repair steps for the device before it is sold to the customer [3]. Comprehensive evaluation and testing steps are crucial in order to provide a warranty for the device and to meet the customer’s expectations of baseline performance [2]. Testing also guides the refurbishing process in terms of allocating the repair and maintenance actions that are carried out for the device.

2.1.1 Cosmetic grading

Companies selling refurbished phones usually assign the products a grade that reflects the cosmetic condition of the device. The grades usually range from pristine or like-new condition to heavily used and worn. Similar phones belonging to different grade classes are then sold at different prices, the worn ones cheaper and the ones in good cosmetic condition at a higher price.

There is no requirement for a company to follow a certain standard or a guide concerning the cosmetic grading of the devices. However, there is a standard for grading pre-owned wireless devices created by CTIA, which is a trade association representing the wireless communications industry in the United States [11]. In the CTIA standard, the devices are distributed into five different grade classes based on the defects found on the different surface areas. These classes are A, B, C, D and E, A being the best and E the worst. The descriptions for each grade class are explicit. The defects are classified into different severity levels based on their size. For each grade class, the allowed number of defects of each severity level is defined, and the final grade is determined based on these limits.

Companies doing the cosmetic grading usually have their own set of criteria that determines which types of defects result in a certain grade for the device. The number of used grades also differs between companies, but usually three to five grade classes are used [12][13][14]. The standard practice in the field is that the grading of the device is made by a human operator. The operator might use illumination and magnification tools to help spot the defects on the device. The defects might be measured in terms of area or depth if the used grading criteria have explicit definitions for the requirements of different grades. In a case where explicit requirements are not listed, the grading decision is based on the operator’s view of the most appropriate grade.

Accurate numbers on the time it takes for a human operator to inspect a device and classify it are not available. The time depends on the expertise of the operator and the extent of the inspections performed on the device. Statistics on the number of devices one operator grades in one day are not publicly available either.

2.1.2 Problems with manual grading

Grading smartphones manually is a laborious, repetitive, and error-prone process. To grade a smartphone correctly, the device needs to be carefully inspected. It is natural that after a while, focusing on a tiresome task becomes harder and harder. As the operator’s ability to focus on the task decreases, the number of unnoticed defects and incorrect grading decisions increases.

With proper ergonomics, grading the smartphones manually is feasible. However, a fundamental issue relating to manual grading is the repeatability of the process. In a case where grading follows explicit definitions of the defect characteristics, the repeatability of the process depends on the operator’s ability to spot the same defects time after time. In a case where looser definitions for the different grades are used, the given grade is based on the operator’s subjective view of the most appropriate grade. As the operators doing the manual grading come from a myriad of backgrounds, the grades given by different operators might differ a lot, making the whole grading process unrepeatable.

2.2 Machine Learning

Machine learning can be defined as computational methods that allow computers to learn how to perform a given task without being explicitly programmed to do so. Machine learning solutions are used in many different fields, and there are several different types of scenarios where machine learning can be applied. The learning problem related to this thesis is a classification task, and it falls under a supervised learning scenario. In this section the essential theory and concepts related to the aforementioned subjects are presented.


2.2.1 Supervised learning

Supervised learning is the most common form of machine learning [15]. The prerequisite for solving a supervised learning task is to have a dataset, denoted as

D = {(x_1, y_1), ..., (x_n, y_n)},   (2.1)

where x_i is the input value and y_i the corresponding output value. The input is usually a vector, and the corresponding output is called a label.

The goal of a supervised learning scenario is to find a model f that can approximate the underlying relationship between the input and output values [8, p. 28]. The model is simply a function mapping input values into outputs, and this process can be denoted as

f(x_i) ≈ y_i.   (2.2)

The model is inferred during a training process, where inputs and the corresponding output values are shown to the learning algorithm. During the training the internal parameters of the model are updated so that the difference between the predictions made by the model and the true output values is minimized. The process of minimizing the difference between the prediction and the output is discussed more in depth in section 2.2.3. After the training process the inferred model can be used to predict labels for unseen input values. The ability to make correct predictions for new unseen data is called generalization, and it is discussed in section 2.2.4.

Classification is a type of supervised learning task where the goal is to assign each input into a discrete category [16, p. 3]. The task where the inputs are mapped into two different categories is called binary classification. A commonly used example of a binary classification task is predicting whether an e-mail is spam or not. The situation where the number of possible classes is greater than two is called multiclass classification. An example of multiclass classification would be a task of assigning input images of different vehicles into categories like train, car or ship.

2.2.2 Features

In machine learning, the attributes and values that are used to describe data objects are called features [17, p. 1]. The features are needed for distinguishing and characterizing different groups and objects. Informative features are the basis of machine learning, since the trained model can only be as good as its features [18, p. 13].

The main types of features are quantitative, ordinal, and categorical. Quantitative features are expressed on a numerical scale, for example a measurement of a person’s weight. Ordinal features have no scale but can be ordered, for example a grade achieved in an exam. Categorical features have neither of the attributes that the two other feature types have. An example of a categorical feature could be the sex of a person. [18, p. 304]

The process of transforming a given set of input features into new features is called feature construction. The main goals of feature construction are reducing the dimensionality and improving the prediction performance of the model. A common way to construct new features from given features is to apply adding, subtracting, or multiplying operations to the original features. Different statistical features such as counts, means, and variances can also be computed from the original features. [19]
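As an illustration, the snippet below is a minimal NumPy sketch of this kind of feature construction: statistical features (counts, sums, means and variances) are computed from hypothetical per-defect area measurements and concatenated into a fixed-length feature vector. The defect types and numbers are invented for the example and are not the feature set used later in this thesis.

```python
# Minimal sketch of feature construction from raw measurements (hypothetical data).
import numpy as np

# Hypothetical detected defect areas (in pixels) for one device.
scratch_areas = np.array([120.0, 45.0, 300.0])
dust_areas = np.array([10.0, 8.0, 12.0, 9.0])

def stats(areas):
    # Count, total area, mean area and variance of one defect type.
    if len(areas) == 0:
        return [0, 0.0, 0.0, 0.0]
    return [len(areas), areas.sum(), areas.mean(), areas.var()]

# Constructed fixed-length feature vector that a classifier could use as input.
features = np.array(stats(scratch_areas) + stats(dust_areas))
print(features)
```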

2.2.3 Loss functions

As stated in section 2.2.1, the goal of a supervised learning task is to find an optimal model that minimizes the error between the predictions and the true responses. The function that evaluates the aforementioned error is called the loss function, and the objective of finding the optimal model f can be denoted as

min_θ (1/N) Σ_{i=1}^{N} L(y_i, f(x_i, θ)),   (2.3)

where N is the number of training samples, θ is the internal parameters of the model f and L is the loss function. There are many types of loss functions, and the selection of which one is used depends on the model used and the task at hand.

The simplest loss function to use for classification would be the zero-one loss, where the number of correct classifications is compared to the number of misclassifications.

The zero-one loss function is straightforward to implement but it does not quantify the magnitude of the error for a given prediction. [20]

In order to take the magnitude of the prediction error into consideration, a logistic loss function is commonly used for classification tasks [21]. For a multiclass classification problem the logistic loss is denoted as

-Σ_{i=1}^{N} Σ_{k=1}^{K} y_ik log(ŷ_ik),   (2.4)

where N is the number of training samples, K is the number of classes, y_ik is either zero or one as the true values are encoded into a one-hot numeric array, and ŷ_ik is the predicted probability of the sample belonging to the given class [22, p. 209].

Minimizing the cross-entropy loss with respect to the model parameters is a convex optimization problem [20]. A common method to solve a convex optimization problem is gradient descent, which is an iterative algorithm where the model parameters are updated in the direction opposite to the gradient of the loss function [23]. The gradient in the general context tells the direction of the greatest increase in the function’s value. It can be calculated by taking partial derivatives with respect to the parameter values, denoted as

∂L/∂θ = [∂L/∂θ_1, ..., ∂L/∂θ_n]^T.   (2.5)

The update rule for gradient descent is denoted as

θ ← θ - α ∂L/∂θ,   (2.6)

where α is the learning rate that controls the speed of the learning process [20].
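The following is a minimal NumPy sketch of equations 2.4 and 2.6 put together: a softmax regression model is trained on synthetic data by repeatedly computing the cross-entropy loss and taking a gradient descent step. It is an illustrative example with invented data, not the training procedure used in this thesis.

```python
# Minimal sketch: cross-entropy loss (eq. 2.4) minimized with gradient descent (eq. 2.6).
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N samples, 4 features, 3 classes (all values here are synthetic).
N, n_features, n_classes = 150, 4, 3
X = rng.normal(size=(N, n_features))
y = rng.integers(0, n_classes, size=N)
Y = np.eye(n_classes)[y]                  # one-hot encoded true labels y_ik

W = np.zeros((n_features, n_classes))     # model parameters theta
b = np.zeros(n_classes)
alpha = 0.1                               # learning rate

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for epoch in range(200):
    P = softmax(X @ W + b)                        # predicted probabilities y_hat_ik
    loss = -np.sum(Y * np.log(P + 1e-12)) / N     # cross-entropy (eq. 2.4, averaged)
    grad_W = X.T @ (P - Y) / N                    # dL/dW for softmax regression
    grad_b = (P - Y).mean(axis=0)
    W -= alpha * grad_W                           # gradient descent update (eq. 2.6)
    b -= alpha * grad_b
    if epoch % 50 == 0:
        print(epoch, round(float(loss), 3))
```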

2.2.4 Generalization

Generalization is the most fundamental concept in all machine learning applications. Generalization is the term used to describe the model’s ability to make correct predictions for data it has not seen before [16, p. 7]. In a supervised learning scenario, there is a finite set of labelled examples that the model can learn from and base its future predictions on. If the model is able to perfectly learn the training data and predict correct labels for all samples, the case might be that the model has memorized the data and not generalized to it. This is important to keep in mind as the model’s performance is evaluated, since good prediction performance on the training data will not give a good estimate of the overall generalization [8, p. 228].

The situation where the model has high prediction performance on the training data but low prediction performance on unseen data is called overfitting. A central concept relating to and affecting overfitting is the model complexity, which relates to the model’s ability to adapt to the data [24].

The best way to illustrate the relationship between model complexity and generalization performance is to discuss the bias-variance trade-off. The generalization error is mathematically modelled as the expected prediction error. The expected prediction error for an unseen example from the training set D with a squared loss is denoted as

E[(y - f(x; D))^2] = (y - E[f(x; D)])^2 + E[(f(x; D) - E[f(x; D)])^2] + σ_ε^2,   (2.7)

where E indicates the expected value, y is the true response value of the prediction, f(x) is the prediction made by the model f and σ_ε is the irreducible error [18, p. 93], [8, p. 37]. The first two components on the right side of the equation are called

Bias = (y - E[f(x; D)])^2,   (2.8)

and

Variance = E[(f(x; D) - E[f(x; D)])^2].   (2.9)

The irreducible error σ_ε^2 represents the immeasurable variation that is affecting the data. From equation 2.7 we can see that in order to minimize the expected prediction error we need to minimize the bias and the variance. The bias term tells how far off the model’s average prediction is compared to the real response. The variance on the other hand tells about the variability of the model’s predictions if a different training set were used.

The phenomenon discussed above is illustrated in figure 2.1, where three models with different levels of complexity are fit to samples drawn from a cosine function [25]. The models are linear regression estimators, and the model complexity is related to the degree of the estimator. The plot on the left illustrates a model that has low complexity, and its predictions are affected by high bias. The plot on the right illustrates a model with a high level of complexity and thus predictions affected by high variance. The plot in the centre illustrates a model that is not perfectly fit to the training data but on average has a low prediction error, hence the optimal model.

Figure 2.1. Relationship of the model complexity to the bias and variance [25].

2.2.5 Support vector machine

Support vector machine (SVM) is a type of geometric model that constructs a decision boundary by using lines, planes and distances [18, p. 194]. The decision boundary is constructed by finding the data points, also called support vectors, that maximize the margin between the different classes [26]. Figure 2.2 illustrates a learning scenario where SVM is used to separate data with two-dimensional features that are linearly separable. The circled points are the closest data points that define the support vectors. The support vectors are illustrated with dashed lines. The orange line between the support vectors is the decision boundary, facing the direction that maximises the margins d1 and d2 between the boundary and the support vectors. In dimensions higher than two, the decision boundary is called a hyperplane.

Figure 2.2. SVM learning scenario.

For a case where the input data is linearly non-separable, SVM can perform the classification by mapping the data into a higher dimensional space and constructing the decision boundary there. Figure 2.3 illustrates a process where the input values are mapped to a feature space by applying a mapping function φ to them. As seen from figure 2.3, after the mapping the data becomes linearly separable. Though the benefit of applying the mapping is evident, the data might still not be completely separable. In that case the separation of the data is done by finding a decision boundary that minimizes the number of classification errors [26].

Figure 2.3. Mapping of data from input space to feature space.

Generally, the mapping is done with a kernel function. The kernel function allows performing the classification in a high dimensional feature space without having to construct the whole space. There are different types of kernel functions, and by defining different types of kernels the geometric models can be used for learning tasks with non-numerical data [18, p. 230]. Commonly used kernel functions are polynomial, radial basis function (RBF) and sigmoid [27].

When SVM is used for prediction, the performance is highly dependent on the chosen kernel function and the related parameters [28]. The effect of choosing different parameter values on the predictions of the model is nicely illustrated in the scikit-learn documentation [29]. In a case where the RBF kernel is used, the two parameters that need to be considered are γ and C. The γ parameter defines the spread of the kernel. From figure 2.4 we can see that low values for γ result in smoother hyperplanes, and larger γ values can make the hyperplane more jagged. The parameter C, on the other hand, is a regularization parameter, and it affects the size of the decision margin the model is trying to find. By changing the parameter values, SVM can construct different types of decision boundaries, and this will highly affect the prediction outcome.

Figure 2.4. Effect of changing values of the γ and C parameters in SVM utilizing the RBF kernel function [29].
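The snippet below is a minimal scikit-learn sketch of how the C and γ (gamma) parameters of an RBF-kernel SVM are set and how the model is trained and used for prediction. The data is synthetic and the parameter values are arbitrary; this is not the configuration used later in this thesis.

```python
# Minimal sketch of an RBF-kernel SVM with explicit C and gamma (synthetic data).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                         # two-dimensional features
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # linearly non-separable labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Small gamma -> smoother decision boundary; large gamma -> more jagged boundary.
# C trades off margin size against the number of classification errors.
clf = SVC(kernel="rbf", C=1.0, gamma=0.5)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```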

2.2.6 Decision trees and random forest

A decision tree is a logical model that partitions the feature space by using a set of logical rules that relate to the feature values [18, p. 37]. These rules can be organized into a tree type structure that consists of nodes and branches. A visualization of a decision tree model for binary classification with three features is illustrated in figure 2.5. The topmost node is called the root node, and it corresponds to the feature that divides the data into correct predictions the best [30]. The bottom nodes are called the leaf nodes, and they represent the output of the model. The depth of the tree depends on the number of splits before the leaf nodes.

Figure 2.5. Decision tree model.

The decision trees are built by evaluating the goodness of using different features for the splitting. This evaluation is done by measuring the impurity of the splits done with different features. The impurity measures the homogeneity of the classes found in the different nodes after the splitting. The splits used in the model are selected so that the difference between the parent node impurity and the weighted sum of the child node impurities is maximized. The impurity of the splits can be measured with many functions, but commonly entropy or the Gini index is used [30]. The formula for calculating the impurity with entropy in a multiclass scenario is denoted as

Entropy = Σ_{i=1}^{k} -ṗ_i log(ṗ_i),   (2.10)

where ṗ_i is the proportion, also called the empirical probability, of the instances from different classes that end up in the node [18, p. 134]. The formula for calculating ṗ_i is denoted as

ṗ_class = n_class / (n_class + n_{≠class}).   (2.11)

The difference between the parent node and child node impurities is called the information gain when the entropy impurity function is used [18, p. 136]. The information gain IG for a split s that partitions the dataset D of size N into datasets D1 and D2 of sizes N1 and N2 is calculated with the formula

IG(D, s) = Impurity(D) - (N1/N) Impurity(D1) - (N2/N) Impurity(D2).   (2.12)

Random forest is a machine learning model that consists of a collection of tree type models. The model is built by using a technique called bootstrap aggregation (bagging), where random samples are taken with replacement from the training data and used for training the trees. This technique results in a diverse set of trees that are each trained with a different subset of the training data [31]. The diversity of the trees is further amplified by using only a subset of the available features for building the trees. The final prediction of the model is a result of averaging the different predictions, or a majority vote in the case of a classification task [18, p. 332].

Bagging is especially useful when it is used with tree type models. If the tree models are allowed to grow deep, they tend to have a low bias and are able to capture complex relationships between the data and the target. By averaging the predictions made by a collection of trees, the variance of the final predictions can be decreased, resulting in improved accuracy of the model [31].
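As an illustration of the idea, the following minimal scikit-learn sketch builds a random forest on synthetic data: an ensemble of entropy-based decision trees grown on bootstrap samples with feature subsampling, whose majority vote gives the final class. The data and parameter values are invented and are not those used in this thesis.

```python
# Minimal sketch of a random forest classifier (synthetic data, illustrative parameters).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the ensemble
    criterion="entropy",   # impurity measure of equation 2.10
    max_features="sqrt",   # random subset of features considered at each split
    bootstrap=True,        # each tree sees a bootstrap sample of the training data
    random_state=0,
)
forest.fit(X, y)
print(forest.predict(X[:5]))   # majority vote over the trees
```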

2.2.7 Model evaluation

Model evaluation methods are needed to estimate the prediction performance of different models. In a situation where a lot of data is available, the simplest approach for model evaluation is to partition the data into training, validation and test sets [8, p. 222]. The idea is to use the training set and the validation set for the model training. The final estimate of the generalization is then made with the test set. If the model evaluation were done with a training and test set pair only, the generalization error could be highly underestimated [8, p. 228]. There is no general rule for how the partition should be done, but a usual approach is to use 50% of the data for training, 25% for validation and 25% for testing.

A more common situation in different learning scenarios is that the available data is scarce. For these situations a typical approach for estimating the generalization performance of the model is cross-validation [8, p. 241]. In cross-validation the available data is first split into equally sized folds. The testing and training are then carried out by using one of the folds as testing data and the rest as training data. The final result for performance is then an average over all the training and testing iterations. This gives a good way to estimate the generalization performance, since the variability in the training data and the model’s dependency on that data will be reflected in the cross-validation results [18, p. 349]. Figure 2.6 illustrates the data splitting procedure of five-fold cross-validation. In applications the usual choice for the number of folds is five or ten [16, p. 71], [18, p. 349], [8, p. 242]. The case where the number of folds equals the number of samples in the data set is called leave-one-out cross-validation.


Figure 2.6. Five-fold cross-validation where the data is split into equally sized partitions that are used for training and testing.

In cases where hyperparameter tuning needs to be incorporated into the evaluation process, the best practice is to use the nested cross-validation method, presented in figure 2.7. The hyperparameters are parameters that affect the learning process of the model but are not directly learned during the training process. In nested cross-validation the data is split into inner and outer folds, and the optimal set of hyperparameters is tuned at each trial [32].


Figure 2.7. Nested cross-validation process.
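The following is a minimal scikit-learn sketch of nested cross-validation on synthetic data: the inner loop tunes hyperparameters with a grid search, and the outer loop estimates the generalization performance. The model, parameter grid and fold counts are illustrative assumptions, not the setup used in chapter 4.

```python
# Minimal sketch of nested cross-validation (synthetic data, assumed parameter grid).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}   # hypothetical grid
inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)  # hyperparameter tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # performance estimation

tuned_svm = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=inner_cv)
scores = cross_val_score(tuned_svm, X, y, cv=outer_cv)
print("nested CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```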

2.2.8 Evaluation metrics

Evaluation metrics are needed for evaluating and comparing different machine learning models. These metrics are deduced by comparing the predictions made by the model and the actual responses. The different metrics are suitable for different situations, as they put emphasis on different characteristics in the results.

Many metrics used for evaluating machine learning models are based on the confusion matrix. A confusion matrix is a table that records the number of correct and incorrect predictions for samples from different class categories [33]. The values listed on the diagonal are called the true positive (TP) values. The true positive value is a count of how many times the model predicted the class correctly. The values outside the row and column of each class are called the true negative (TN) values. The true negative is a count of how many times a sample that was not from a given class was predicted to another class. The two other values extracted from the matrix are the false positive (FP) and false negative (FN) values. The false positive value counts how many times samples from other classes were falsely predicted to be of a given class. In contrast to the false positive, the false negative counts how many times a sample from a given class was falsely predicted to be from another class. Figure 2.8 aims to clarify the contents of a confusion matrix.


Figure 2.8. Confusion matrix of multi-class classification.

Common metrics computed from the confusion matrix are precision, recall and accuracy. Accuracy measures the probability of the model making a correct prediction. It is calculated by dividing the number of correct predictions by all the entries in the confusion matrix. The formula for accuracy is denoted as

Accuracy = (TP + TN) / (TP + TN + FP + FN).   (2.13)

Precision estimates the trustworthiness of the model in terms of correctly predicting a sample for a given class. Precision is calculated by dividing the number of true positive predictions by the total number of positive predictions, denoted as

Precision = TP / (TP + FP).   (2.14)

Recall measures the model’s ability to find all the positive units in the data set. It is calculated by dividing the number of true positive samples by the total number of units in the class. The formula for recall is denoted as

Recall = TP / (TP + FN).   (2.15)
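As a small illustration, the snippet below computes the confusion matrix and the accuracy, precision and recall of equations 2.13–2.15 with scikit-learn for a handful of hypothetical grade labels; it is not the evaluation code of this thesis.

```python
# Minimal sketch of confusion-matrix-based metrics (hypothetical grade labels).
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

y_true = ["A", "A", "B", "B", "C", "C", "C", "A"]
y_pred = ["A", "B", "B", "B", "C", "C", "A", "A"]

print(confusion_matrix(y_true, y_pred, labels=["A", "B", "C"]))
print("accuracy :", accuracy_score(y_true, y_pred))
# Macro averaging computes the metric per class and takes the unweighted mean.
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
```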

2.3 Deep Learning

Traditional machine learning techniques have limited capabilities to process data in a raw format. Instead, to achieve good performance, the traditional learning models, such as SVMs and decision trees, need features that are constructed manually to represent the characteristics of the data. Compared to the traditional machine learning models, deep learning models can learn these representations automatically during the training process [15].

Deep learning is essentially a representation learning method that breaks the complicated mapping function into a series of nested simple mapping functions [34, p. 6]. The mapping is performed by a series of nonlinear transformations that allow the model to construct highly abstract concepts out of simpler representations. A common architecture for deep learning models is the artificial neural network, discussed in the following section.

2.3.1 Artificial neural network

Artificial neural networks are computational models originally inspired by the operating principle of the biological brain [34, p. 165]. The basic building block of a neural network is the processing unit, sometimes called an artificial neuron or node, that computes a linear function followed by a nonlinear activation function. The neurons are typically organized in sequential layers, where the output of the neurons in the preceding layer works as an input for the neurons in the following layer. A layer where the neurons are fully pairwise connected to the preceding layer is called a fully connected layer. The first layer of the network is called the input layer, and the last one is called the output layer. The layers between the input and output layers are called hidden layers. The number of layers is known as the depth of the network.

A type of network where the information flows in only one direction is called a feedforward network [34, p. 164]. The information flows through all the layers, where each of the nodes computes an activation value denoted as

a = σ(Σ_j w_j x_j + b),   (2.16)

where x_j are the inputs to the unit, w_j are the weights, b is the bias, and σ is the nonlinear activation function. A common activation function used in neural networks is the rectified linear activation function [15]. The rectified linear activation function, or ReLU for short, is denoted as

f_ReLU(x) = max(0, x).   (2.17)

In a feedforward network the activation values of the preceding layers work as an input for the following layers. Finally, the output layer maps the activations of the second to last layer into a prediction. Figure 2.9 illustrates a feedforward artificial neural network and the inner workings of a neuron.


Figure 2.9. Feedforward network, and functionality of the neuron [35].
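The following minimal NumPy sketch shows the forward pass of such a network: each hidden unit computes the weighted sum of equation 2.16 followed by the ReLU of equation 2.17. The layer sizes and random weights are placeholders for illustration only.

```python
# Minimal sketch of a feedforward pass (random placeholder weights).
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)   # rectified linear activation (eq. 2.17)

# Layer sizes: 4 inputs -> 8 hidden units -> 3 outputs.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

x = rng.normal(size=4)           # one input vector
a1 = relu(W1 @ x + b1)           # hidden layer activations (eq. 2.16)
output = W2 @ a1 + b2            # output layer maps activations to a prediction
print(output)
```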

2.3.2 Convolutional Neural Network

Regarding image processing, convolution is the process of sliding a convolution matrix, also called a kernel, over the image to calculate new pixel values. The new pixel values are computed by adding together the local neighbours of the corresponding pixel, weighted by the kernel coefficients [36, p. 98]. By engineering the kernel coefficients, the convolution can be used for many different tasks, such as blurring and edge detection. An illustration of a convolution process with the Sobel operator is presented in figure 2.10.


Figure 2.10. Convolution process for images.
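For illustration, the snippet below applies a hand-engineered horizontal Sobel kernel to a small synthetic grayscale image with OpenCV; the image and kernel here are stand-ins for illustration and not part of the thesis pipeline.

```python
# Minimal sketch of image convolution with a Sobel kernel (synthetic image).
import cv2
import numpy as np

# Synthetic grayscale "surface" with a bright diagonal scratch-like line.
image = np.zeros((100, 100), dtype=np.uint8)
cv2.line(image, (20, 30), (80, 70), 255, 1)

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)   # Sobel kernel coefficients

# filter2D slides the kernel over the image and sums the weighted neighbours.
edges = cv2.filter2D(image.astype(np.float32), -1, sobel_x)
print(float(np.abs(edges).max()))   # strong response at the edges of the line
```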

Convolutional neural networks (CNNs) are the leading architecture for image recognition, classification and detection tasks [15]. The general architecture of CNNs consists of convolutional, pooling, and fully connected layers that are stacked together to form a deep network [37]. In convolutional neural networks, each convolutional layer uses a certain number of kernels to extract features from the input data. The outputs of these kernel convolutions are called feature maps. The convolutional layer is composed of several feature maps, as multiple features are extracted from the same locations [38]. Since the type of features the convolution extracts depends on the kernel weights, these weights are updated during the training process as the model learns what type of features are meaningful in order to make correct predictions [20]. The complexity of the detected features increases as the data flows through the network. The kernels on the first layers are sensitive to low-level features such as edges and curves. In the following layers the kernel convolution produces feature maps with activations that represent more abstract features such as shapes and textures [39]. Figure 2.11 illustrates the working principle of a convolutional layer.

Figure 2.11. Working principle of a convolutional layer.

Usually, in addition to convolutional layers, the network has pooling layers that reduce the spatial resolution of the feature maps. The pooling is performed by replacing values in a feature map by a summary statistic of the nearby outputs. Commonly used pooling operations are max-pooling and average-pooling [40]. By reducing the spatial resolution, the extracted features are made more invariant to translations of the input [34, p. 336]. The pooling operation also helps to regulate the complexity of the network and improves the generalization ability of the model [37]. A max-pooling operation is illustrated in figure 2.12.

Figure 2.12. Max-pooling operation.

In addition to the pooling operation, the dropout method can be used for improving the generalization ability of the model. Dropout is a technique where units and their connections are randomly dropped from the network during the training phase. This technique brings noise to the training process and forces the model to work with inputs from a randomly chosen sample of other units. This process should make each unit more robust and encourage it to learn more useful features on its own [41].
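To make the stacking of these layer types concrete, the following is a minimal Keras sketch of a small LeNet-5-style network with convolution, max-pooling, dropout and fully connected layers. The input size, layer parameters and three-class output are assumptions for illustration; this is not the improved LeNet-5 model used later in this thesis.

```python
# Minimal sketch of a LeNet-5-style CNN (illustrative architecture, assumed input size).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),                      # grayscale image patch
    layers.Conv2D(6, kernel_size=5, activation="relu"),   # feature maps from 5x5 kernels
    layers.MaxPooling2D(pool_size=2),                     # reduce spatial resolution
    layers.Conv2D(16, kernel_size=5, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),                 # fully connected layer
    layers.Dropout(0.5),                                  # randomly drop units during training
    layers.Dense(3, activation="softmax"),                # e.g. dust / grease / scratch
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```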

2.3.3 Backpropagation

With neural networks the most common way to update the weights in the nodes is to use a method called backpropagation together with the gradient descent algorithm [15]. Backpropagation is a practical application of the chain rule of derivatives. The derivatives used for training the model, as described in section 2.2.3, can be calculated for neural networks by working backwards, first calculating the gradient of the loss function with respect to the output layer. Since the parameters and the activation of the preceding layer affect the output activation, the final gradient of the loss function is calculated by applying the chain rule of derivatives throughout the whole computational graph, as illustrated in figure 2.13.

The direction of the computations is opposite to the feedforward direction and that is the reason for calling this process backpropagation.


Figure 2.13. Backpropagation process, where L is the loss function, y is the real response and a^l is the activation in layer l. z^l represents a node, and w^l, a^{l-1} and b^l are the inputs to the node.

The result of backpropagation is the gradient of the loss function, which describes how the network parameters should be changed to maximize the increase of the loss function. Since the goal of the learning is to minimize the loss function, the network weights are updated in the direction opposite to the gradient. This is the gradient descent update rule described in section 2.2.3.
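The following is a minimal NumPy sketch of backpropagation for a network with one hidden ReLU layer and a linear output trained with a squared loss: the chain rule is applied backwards from the output to obtain the gradients, which are then used in the gradient descent update of equation 2.6. The data, layer sizes and learning rate are invented for the example.

```python
# Minimal sketch of backpropagation for a one-hidden-layer network (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)   # toy regression target

W1, b1 = rng.normal(size=(8, 3)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=8) * 0.1, 0.0
alpha = 0.01

for epoch in range(500):
    # Forward pass.
    z1 = X @ W1.T + b1              # pre-activations of the hidden layer
    a1 = np.maximum(0.0, z1)        # ReLU activations
    y_hat = a1 @ W2 + b2            # network output
    # Backward pass (chain rule), starting from dL/dy_hat of the squared loss.
    d_yhat = 2.0 * (y_hat - y) / len(y)
    dW2 = a1.T @ d_yhat
    db2 = d_yhat.sum()
    d_a1 = np.outer(d_yhat, W2)
    d_z1 = d_a1 * (z1 > 0)          # derivative of ReLU
    dW1 = d_z1.T @ X
    db1 = d_z1.sum(axis=0)
    # Gradient descent update (eq. 2.6).
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2
```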

2.4 Related studies

The main topics in this thesis are defect detection and grade classification. In this section, research from industry and academia related to these topics is presented. For both topics a lot of research material is available, and similar types of applications can be found at least in the manufacturing and agricultural industries.

2.4.1 Defect detection

Vision-based defect detection applications are commonly used in the quality control of the manufacturing industry [42][43]. Typically, the task is to detect different types of defects on the surface of the material. Multiple different techniques are used for extracting and detecting defects. The used approach depends on the type of the material and the characteristics of the defect. Statistical methods like gray level co-occurrence matrices (GLCM) and local binary patterns (LBP), and structural methods like morphological operations and edge detection, are just some examples of the variety of different defect detection techniques [44][45].

The defect detection process at hand can be seen as an object detection task. The object detection task is not limited to classifying instances at the image level. Instead, the goal is to precisely localize and conceptualize the objects contained in each image. The traditional object detection pipeline can be divided into three stages: informative region selection, feature extraction and classification. [46]

Traditional methods for performing object detection involve a hand-crafted feature extractor and a classifier model. A well-known example of a traditional method is to use a histogram of oriented gradients together with an SVM to detect humans from an image [47].

Hybrid approaches for object detection that combine traditional computer vision and deep learning can employ the advantages of both systems. The classic computer vision algorithms are well-established and efficient, while deep learning models offer accuracy and versatility. In hybrid approaches the deep neural network does not need to analyse the whole image; instead, it can be applied to a small patch detected or extracted with traditional computer vision methods [48]. The combination of traditional computer vision and deep learning has been applied at least in detecting defects in metal screw surfaces and power transformers [49][50].

The state-of-the-art techniques for object detection use deep neural networks [46]. The existing techniques can be roughly divided into two kinds of detectors: one-stage and two-stage [51]. A popular two-stage detector, Faster R-CNN, performs the detection by first locating object proposals with a CNN network. From the proposed locations additional features are extracted, which are then used for correcting the proposal locations and for classification [52]. The one-stage detectors, like YOLO, use features extracted from the whole image to predict object locations and object classes simultaneously [53]. One-stage methods are known to be more time efficient, and they are popular in real-time applications [51]. Training of these object detection models requires a large amount of data [54].

A notable study concerning this thesis was conducted by Parkkinen in 2020, where convolutional neural networks (CNNs) were used to detect defective regions from a smartphone display [55]. In the study, the image was divided into equally sized tiles, and each tile was categorized as having defects or not. The accuracy for detecting scratches and cracks with the approach exceeded 80%.

2.4.2 Grade classification

Grading is a type of classification task, discussed in section 2.2.1, where the objective of the analysis is to assign a grade label to the data sample. Tasks that are essentially like the smartphone grading task can be found at least in the food and agricultural industry. Vision-based classification has been applied in the inspection and sorting of different fruits like apples and tomatoes [56][57]. In both of the studies, an SVM model achieved the highest classification performance.

The fruit grading process is similar to smartphone grading in terms of the processing pipeline. After the preprocessing steps of removing the background and non-essential parts of the objects, a feature vector is constructed from textural and geometrical features. These features provide a representation of the fruit to the classifier model. The classifier then classifies the fruits into a grade class. The smartphone grading pipeline presented in this thesis differs from the aforementioned examples in terms of feature extraction. The features extracted for the smartphone grading were defect area data instead of the textural features that were used for grading the fruits.

SVM was the best performing model in the studies discussed above. However, a systematic survey conducted in 2014, comparing different classifier models in solving real-world applications, suggests that the best classification performance is most likely achieved with random forest classifiers. In the study 179 different classifiers were compared on 121 datasets, and random forest achieved over 90% in accuracy in over 84% of the data sets. Based on these results, random forest is a valid option when the problem relates to classification. [58]


3 IMPLEMENTATION

This chapter describes the techniques used for implementing the automatic smartphone grading process. The chapter begins by comparing existing techniques that could be used for automatic grading. After presenting the arguments supporting the selected approach, the development and evaluation process of the grading system is described. The rest of the chapter gives a detailed description of the actual implementation.

All the algorithms and processing methods were implemented in the Python programming language. In addition to the built-in modules of the Python Standard Library, third-party packages such as NumPy [59], OpenCV [60], scikit-learn [61] and Keras [62] were used.

3.1 Comparison of possible approaches

A wide selection of approaches used for defect detection was introduced in section 2.4.1. By analysing the strengths and weaknesses of these approaches, the design of the automatic grading process was decided. The comparison of the different approaches is presented in table 3.1.

A requirement for the desired system was that it should be trainable with examples. A crucial aspect of the process is that the final solution should not rely on completely opaque reasoning. This phenomenon is commonly discussed as the black-box problem [5]. In the context of smartphone grading, a black-box solution would be one where hardly any information about the reasons behind the grading results is available. An example would be a solution where the grading is based on raw images: the grade classification could be done with a CNN model that analyses the image and outputs a grade class. By comparing only the input images and the output grades, it would be almost impossible to say which piece of information in the image caused the model to reach a certain conclusion. A better approach is one where the features used for grading are extracted at defect level. Then at least some reasoning behind the grading can be done, since the number and type of detections can be examined and compared with the grading results. The latter approach would also allow a rule-based grading solution to be defined, should one be needed.


Table 3.1. Comparison of different approaches for the grading process.

Approach: Raw image
Techniques: CNN
Evaluation:
  + Straightforward to implement
  + No need for additional grading classifier
  - Black box grading

Approach: Texture analysis
Techniques: LBP / GLCM + classifier
Evaluation:
  + Straightforward to implement
  - No defect level information
  - Black box grading

Approach: Tile grid approach [55]
Techniques: CNN + classifier
Evaluation:
  + Straightforward implementation
  + Manageable annotation work
  + Grading not a complete black box
  - Defect level information not available

Approach: Object detection
Techniques: YOLO / R-CNN / ... + classifier
Evaluation:
  + Defect level information available
  + High performance
  + No black box grading
  - High amount of annotated data required

Approach: Hybrid approach
Techniques: Traditional computer vision + CNN + classifier
Evaluation:
  + Defect level information available
  + No black box grading
  + Manageable annotation work
  - Defect localization algorithm needs to be developed

Based on the comparison presented in table 3.1, a hybrid approach was chosen for the grading. The hybrid approach was selected because annotated defect data was not available for training a more sophisticated object detection network, and producing such annotations was considered too time-consuming. In addition, the defects are searched for on a plain background, and their common characteristic is that they appear as non-uniform regions in the image. This made traditional computer vision techniques an attractive choice for localizing the defects.

Implementing the hybrid approach included training a CNN model. Training a CNN also requires annotation work, but it could be done at the level of single images, by assigning only a class label to each image crop. For an object detection network, the annotation needs to cover both the defect class and its coordinates in the image. This makes the annotation work more laborious than that required for the hybrid approach.

3.2 Development process and evaluation methods

For developing the defect detection algorithm, 20 second-hand black iPhone 7 smartphones were used. The devices were used to gather data for testing different defect localization techniques and classification models. The data was gathered by imaging the devices with the SCORE tester. After suitable strategies for defect localization and classification were found, they were combined into the final defect detection algorithm. The defect detection algorithm was then used to detect defects from unseen samples, and the grading performance based on those detections was evaluated.


The developed defect detection algorithm was evaluated by using it in the grading of 37 second-hand black iPhone 7 smartphones. These 37 devices were first assigned to grade classes by human operators; this manual grading process is described in sections 3.3.4 and 4.2.1. After the manual grading, the defect detection algorithm was used to extract defect information from the devices. This defect information was then used to construct feature sets, which were used together with the ground-truth grades to evaluate the grading performance.

The feature sets and the ground-truth values were used to train and evaluate two different classifier models. The selection of the models was based on the literature review: SVM and random forest were chosen, since they were found to be the best-performing models in the studies presented in section 2.4.2.
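The results of this comparison are reported in chapter 4. As an illustration only, a minimal training and evaluation loop of this kind could look like the sketch below; it uses scikit-learn, and the feature and label files, split ratio and hyperparameters are placeholder assumptions rather than the exact setup of this thesis.

# A minimal sketch (not the thesis's exact setup): comparing an SVM and a
# random forest on defect-based feature vectors with scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: one feature vector per device, y: manually assigned ground-truth grades.
X = np.load("features.npy")   # hypothetical file of feature vectors
y = np.load("grades.npy")     # hypothetical file of grade labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name,
          accuracy_score(y_test, y_pred),
          precision_score(y_test, y_pred, average="macro"),
          recall_score(y_test, y_pred, average="macro"))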

The rest of this chapter presents the implemented grading process. A detailed description of the developed defect detection algorithm is given in sections 3.3.1 and 3.3.2. The process of constructing a feature representation from the defect detection data is presented in section 3.3.3. The final grade is determined from this feature representation with a classifier model; the grade classification process is presented in section 3.3.4. Results of the model comparisons and evaluations are presented in sections 4.1 and 4.2.

3.3 Automatic grading process

OptoFidelity's SCORE, presented in figure 3.1, is a tester designed to perform smartphone grading automatically [4]. SCORE uses multiple cameras and multi-angle lighting to highlight and detect different types of defects on the smartphone surface. The design of the tester has been guided by the goal of building a system capable of accurately and repeatably examining and grading the most common smartphones on the market [63].


Figure 3.1. Automatic grading machine SCORE.

An illustration of the automatic grading process is given in figure 3.2. The process starts with a cleaning phase, where impurities are removed from the smartphone surface. After cleaning, the smartphone is inserted into the tester for the imaging sequence. The imaging sequence is followed by an analysis step, where defects are localized in the images and classified. Information about the defects and their types is passed to the grading step, where it is used to produce a grade verdict for the device. While the analysis and grading are being performed, the smartphone is ejected from the tester. After the grading verdict is given, the smartphone is sorted for further post-processing. The duration of the process from pre-processing to post-processing is approximately one minute. The steps related to this study are highlighted in red.


Figure 3.2. Steps for automatically grading smartphones with the SCORE tester.

SCORE images the smartphones with a high-resolution camera and multi-angle lighting. The images used for the analysis are grayscale images with a width of 3599 pixels and a height of 7347 pixels. The scope of this thesis is limited to analysing and grading the devices based on images taken of the front side of the device. An example image used for grading is presented in figure 3.3.

Figure 3.3. Image taken with SCORE.


3.3.1 Defect localization

Figure 3.4 presents images of two different devices: one with defects all over the screen and bezel area, and one clean device with no visible defects. Examining the two images, one can see that the common characteristic of defects and impurities is that they show up as non-uniform regions in the image. A non-uniform region is a region where the brightness values of the pixels change sharply. In the image of the clean device, the only non-uniform regions are the edges between different components of the device. Based on this observation, a defect localization algorithm was implemented that utilizes the edge detection algorithm proposed by John F. Canny, commonly known as the Canny edge detection algorithm [64].

Figure 3.4. Comparison of a clean and a defective device.

The areas that are known to be non-uniform but are not defects are filtered out. The filtering is done by multiplying the image with a predefined mask that suppresses false detections. Regions that are filtered out include, for example, the front camera, the loudspeaker, the home button and the display edge. After the false detections are suppressed, a contour search is applied to the image to obtain the coordinates of the defect proposals. The defect localization process is illustrated in figure 3.5. The output of the defect localization algorithm is the set of coordinates of the non-uniform regions found in the image.


Figure 3.5. The process of extracting non-uniform regions from the image.
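A minimal OpenCV sketch of the localization steps described above is given below. The file names and Canny thresholds are placeholder assumptions, not the parameters of the actual SCORE analysis, and the two-value return of findContours assumes OpenCV 4.x.

# A minimal sketch of the defect localization steps described above,
# assuming OpenCV 4.x. File names and thresholds are illustrative only.
import cv2

image = cv2.imread("front_side.png", cv2.IMREAD_GRAYSCALE)  # raw capture (placeholder path)
mask = cv2.imread("front_mask.png", cv2.IMREAD_GRAYSCALE)   # 255 = analyse, 0 = ignore

# 1. Highlight non-uniform regions with the Canny edge detector.
edges = cv2.Canny(image, threshold1=50, threshold2=150)

# 2. Suppress known non-defect regions (camera, loudspeaker, home button,
#    display edge) by multiplying with the predefined binary mask.
edges = edges * (mask // 255)

# 3. A contour search gives the coordinates of the defect proposals.
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
proposals = [cv2.boundingRect(c) for c in contours]          # (x, y, w, h) per region
print(f"{len(proposals)} defect proposals")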

3.3.2 Defect classification

After the defect localization algorithm has found all the non-uniform regions in the image, the resulting coordinates are used to crop these regions from the raw images. The cropped regions are first resized to a fixed size with the resize function of OpenCV; the size of a resized image crop is 96 x 96 pixels. The resized crops are then fed to a CNN model that classifies each region into one of three categories: scratch, dust, or grease.

Examples of the aforementioned classes are presented in figure 3.6.

Figure 3.6. Detected defects.
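A minimal sketch of this classification step is given below. It assumes the TensorFlow Keras API, a hypothetical saved model file and an assumed class ordering, so it should be read as an illustration rather than the exact SCORE implementation.

# A minimal sketch of the defect classification step, assuming the TensorFlow
# Keras API. The model file name and class order are illustrative assumptions.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

CLASS_NAMES = ["scratch", "dust", "grease"]   # assumed label order
model = load_model("defect_classifier.h5")    # hypothetical trained CNN

def classify_proposals(image, proposals):
    """Crop each proposed (x, y, w, h) region, resize it to 96 x 96 and classify it."""
    labels = []
    for x, y, w, h in proposals:
        crop = cv2.resize(image[y:y + h, x:x + w], (96, 96))
        batch = crop.astype("float32")[np.newaxis, :, :, np.newaxis] / 255.0
        probabilities = model.predict(batch, verbose=0)[0]
        labels.append(CLASS_NAMES[int(np.argmax(probabilities))])
    return labels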

For the classification of the different defects, a LeNet-5 based architecture was used. LeNet-5 was first introduced by Yann LeCun in 1998 for classifying handwritten digits [65]. A LeNet-5 based architecture was selected since it has proven to work in detecting defects on different types of surfaces. For example, LeNet-5 based models have been
