
SISHIR BHANDARI

AUTOMATIC WASTE SORTING IN INDUSTRIAL ENVIRONMENTS VIA MACHINE LEARNING APPROACHES

Information Technology and Communication Sciences TAU

Master's Thesis

October 2020


ABSTRACT

Sishir Bhandari: Automatic Waste Sorting in Industrial Environments Via Machine Learning Approaches
Master's Thesis
Tampere University
Master's Degree Program in Information Technology
October 2020

Speed, safety and efficiency are key to industrial progress. We as human beings are astounded by industrial achievements and the products manufactured, but we tend to forget about the residue and waste left behind. As the saying goes, "One man's trash is another man's treasure": we can use waste to generate energy through heat in an incineration plant, recycle it to save natural resources, and reduce pollution by effectively recycling in-house waste, thereby decreasing the amount of material that reaches landfills. To make the recycling process effective, we have to overcome challenges such as the slow pace of manual sorting, the mixing of different materials due to ineffective sorting, and workers being exposed to harmful materials, which is where automated waste sorting using image-based classification comes into play. The objective of this thesis is to determine and study how different machine learning algorithms, such as convolution neural networks (CNN) and support vector machines (SVM), can be used to effectively classify waste generated in the industrial environment into three categories: paper, plastic and metal.

We initiated this thesis work to evaluate whether MATLAB, with its extensive range of toolboxes, can make the image classification task easier, more user-friendly and practical. We applied the Image Processing Toolbox to preprocess the data, the Computer Vision Toolbox to implement image detection, and so on. Pictures of the waste types were acquired from the TrashNet dataset and the Internet. This thesis does not propose a new classification methodology. It rather aims at designing practical algorithms that work on large-scale data sets and achieve better image classification than the current approaches.

We performed simulations with both the CNN and SVM image classifiers using three different datasets with 200, 400 and 600 images in each category and image sizes of 32x32, 64x64 and 128x128, comparing different layer configurations and evaluating different optimizers and kernel functions. As a result, an efficient and accurate model was developed. A bag of features was used to extract robust features in the case of SVM. CNN performed better than SVM, reaching 82.2% accuracy, whereas 79.4% was the highest accuracy achieved by SVM. Even though we achieved good results, there is still room for improvement. Also, identifying the components in hybrid waste (e.g., combinations of paper, plastic, and metal) remains a topic for future research.

Keywords: Machine Learning, Convolution Neural Network (CNN), Support Vector Machines (SVM), efficiency, trash/waste, automated, image classification, Bag of Features (BoF)

The originality of this thesis has been checked using the Turnitin OriginalityCheck service.


PREFACE

This thesis work was done to meet the graduation criteria of the Tampere University Master's Degree Program in Information Technology. I studied and wrote this thesis from May to September 2020. I want to thank Prof. Elena Simona Lohan for supervising my thesis and Dr. Aleksandr Ometov for the writing inspection of this thesis. I would like to offer sincere gratitude to my supervisors for their undivided attention and guidance throughout my thesis writing process. Finally, I would like to thank MathWorks for its excellent documentation.

Sishir Bhandari


CONTENT

1 INTRODUCTION
1.1 Thesis goals
1.2 Author's contributions
1.3 Description of the Thesis Structure
1.3.1 Theoretical Part
1.3.2 Technical Implementation Part
2 AUTOMATIC WASTE MANAGEMENT IN INDUSTRIAL ENVIRONMENTS
2.1 Classification and types of waste
2.1.1 Municipal Solid Waste
2.1.2 Industrial waste
2.1.3 Electronic waste
2.1.4 Mining waste
2.1.5 Medical Waste
2.1.6 Agriculture waste
2.1.7 Radioactive waste
2.2 Techniques for waste management
2.2.1 Incineration
2.2.2 Landfills
2.2.3 Composting
2.2.4 Recycling
2.3 Challenges faced in waste management
2.4 Use of machine learning
3 MACHINE LEARNING ALGORITHMS FOR CLASSIFICATION PROBLEMS
3.1 Introduction
3.1.1 Supervised Learning
3.1.2 Unsupervised Machine Learning
3.1.3 Reinforcement learning
3.2 Convolution Neural Network (CNN)
3.2.1 Image processing layer
3.2.2 Convolution layer
3.2.3 ReLU Layer
3.2.4 Max pooling layer
3.2.5 Fully Connected layer
3.2.6 Training options
3.2.7 The architecture of the CNN classifier investigated for waste classification
3.3 Support Vector Machines (SVM)
3.3.1 Hyperplane
3.3.2 Types of SVM kernel
3.3.3 Multiclass SVM
3.3.4 Bag of features
3.3.5 The architecture of the proposed SVM
3.3.6 Training options
4 THESIS METHODOLOGY
4.1 Technologies and tools
4.1.1 MATLAB programming language
4.1.2 Image Processing Toolbox
4.1.3 Deep Learning Toolbox
4.1.4 Computer Vision Toolbox
4.1.5 Statistics and Machine Learning Toolbox
4.1.6 Simulation device
4.2 Image collection
4.3 Training strategies
4.3.1 How does the computer see an image?
4.3.2 Image processing
4.3.3 Image splitting
4.4 Performance matrix
4.4.1 Confusion matrix
5 CASE STUDY
5.1 CNN model
5.2 SVM model
6 DISCUSSION AND CONCLUSION
7 FUTURE WORK


LIST OF FIGURES

Figure 1. Classification of waste
Figure 2. Machine learning approach
Figure 3. Supervised machine learning
Figure 4. Unsupervised learning
Figure 5. Reinforcement learning
Figure 6. Artificial neural network and convolution neural network
Figure 7. Convolution process between 6x6 input matrix and 3x3 kernel
Figure 8. Examples of some common filters
Figure 9. 3x3 kernel with a stride of 2
Figure 10. Zero padding with different stride values
Figure 11. Example of ReLU function
Figure 12. Max pooling of a 4x4 input matrix with 2x2 filter leads to down-sampling
Figure 13. Flattening of output from max-pooling layer to multiple fully connected layers
Figure 14. The architecture of CNN based on this thesis
Figure 15. Hyperplane representation in 2nd and 3rd dimension
Figure 16. SVM hyperplane with linearly separable data
Figure 17. SVM hyperplane with generalization error
Figure 18. Histogram of visual word occurrence after extraction of features
Figure 19. The architecture of the proposed SVM
Figure 20. Example of training options implementation
Figure 21. Folder structure and images collected
Figure 22. Representation of a single-channel image in the matrix
Figure 23. Importing process of input image datasets and labelling
Figure 24. Example of code used to resize, color correct and write the output in the destination folder
Figure 25. Data augmentation technique for image processing
Figure 26. Block representation of partition and size of training and test datasets after splitting
Figure 27. Example of a confusion matrix of the CNN model
Figure 28. The training process and confusion matrix of the high-performing CNN model 2
Figure 29. Example code for input data augmentation
Figure 30. Evaluating the trained model with validation data
Figure 31. Confusion matrix of the highest performing SVM classifier


LIST OF SYMBOLS AND ABBREVIATIONS

BCE  Before the Common Era
AI  Artificial Intelligence
ML  Machine Learning
IoT  Internet of Things
MSW  Municipal Solid Waste
RFID  Radio-Frequency Identification
CNN  Convolution Neural Network
SVM  Support Vector Machines
EU  European Union
US  United States
IBM  International Business Machines
MRI  Magnetic Resonance Imaging
ConvNet  Convolution Network
ANN  Artificial Neural Network
2-d  Two-Dimensional
Tanh  Tangent Hyperbolic
ReLU  Rectified Linear Unit
IPCC  The Intergovernmental Panel on Climate Change
FC  Fully Connected
SGD  Stochastic Gradient Descent
SGDM  Stochastic Gradient Descent with Momentum
ADAM  Adaptive Moment Estimation
AT&T  American Telephone & Telegraph Company
ICLR  International Conference on Learning Representations
RMSProp  Root Mean Square Propagation
RBF  Radial Basis Function
BoF  Bag of Features
BoW  Bag of Words
MATLAB  Matrix Laboratory
SURF  Speeded Up Robust Features
SMO  Sequential Minimal Optimization
CPU  Central Processing Unit
GPU  Graphics Processing Unit
LSTM  Long Short-Term Memory Networks
ARM  Advanced RISC Machine
GB  Gigabyte
RGB  Red Green Blue
RAM  Random Access Memory


1 INTRODUCTION

Gathering and dumping waste in dumping sites was a common practice in every household in ancient Athens. Self-managed waste handling was a priority: people had to sweep the streets daily and take the garbage away from the town. Large dumping sites away from settlements were widespread as early as 8000 to 9000 BCE, and the Minoans (3000-1000 BCE) covered their waste regularly with soil layers in large pits [1]. Today, it has become common practice to automate many processes that used to be operated manually. Automation is being applied across almost all essential aspects of life. Automotive, electronics manufacturing, medical, welding, food service, law enforcement and transportation industries, to name a few, are investing in automation and making full use of what AI, machine learning and IoT have to offer [2].

In modern times, due to improper and inefficient waste management practices, the control of waste is one of the significant issues we encounter daily, and this problem needs to be tackled exceptionally well [3]. With proper planning and less dependence on single-use products, we can dramatically reduce the amount of waste generated daily. Implementing effective recycling techniques before waste reaches disposal sites is a welcome practice. With about 2.1 billion tons of municipal solid waste produced annually [4], the effective use of machine learning and AI to handle garbage and provide better waste management is a particular demand of the day.

Since 2015, there have been at least 5.25 trillion plastic parts in the oceans, which can be harmful to wildlife [5]. The most common solution for many developed countries is to send their waste to developing countries [6]. According to Professor Maiju Lehtiniemi from the Finnish Environment Institute, microplastics are present in all water systems as well as within fish, bivalves, and benthic fauna. In marine organisms, microplastics cause discomfort and may expose them to harmful chemicals [7]. Solid, gaseous and semi-solid materials originating from various industrial, mining, and household environments are identified as sources and causes of environmental pollution under the Resource Conservation and Recovery Act (RCRA).

To tackle the above-mentioned waste-related problems, there are acceptable and effective practices in place to collect and recycle material in the context of municipal solid waste (MSW). MSW is defined as everyday items discarded by the public after use. It is commonly referred to as garbage and consists of trash items such as leftover or scrap food, newspaper, garden trash, bulbs, utensils, bottles, clothing, furniture, etc. People are becoming better educated and more concerned about the wellbeing of the environment. For instance, Caverion [8] operates an underground pipe waste collection system in Finland that transports waste away from residential areas using subterranean vacuum tubes. Technologies ranging from radio-frequency identification (RFID) tags that measure the fill level of waste dumpsters to compacting garbage collection trucks focus on smart and effective ways of handling waste in urban areas [9],[10]. Waste transportation trucks are notified exactly when it is time to pick up a dumpster, which reduces the number of trips. The second step in MSW handling is the sorting or separation of waste for further processing. Techniques such as optical sorting, which classifies products using lasers and/or cameras, eddy current sorting, which uses magnets to separate non-ferrous metals, and multicompartment bins, which, as the name suggests, have multiple compartments for different trash types, are new methods for handling municipal waste [11]. These technologies are based on optical sensors [12],[13],[14].

Much of the waste generated by most industries is not compostable or reusable, but it is recyclable. The primary task during a waste management process is to classify the products that need to be recycled and to determine the correct locations of recycling centers and recycling bins. Glass, paper, and plastic recycling can be carried out in most recycling centers. Some can also carry out e-waste, metal, paper, cardboard, and various food waste recycling. Hazardous waste, compostable waste, and toxic solid waste must be segregated for disposal. According to the article [15] "In Practice: how automation is revolutionizing waste" in Environment Journal, almost 50% of the recycling plants in Europe are now automated. However, the efficiency benefits from automation do not come from a complete equipment redesign but rather from the intelligent integration of technology [16].

Manually tracking waste in an industrial environment typically involves complicated procedures that are burdensome and require a large amount of human energy, time and resources. The root cause of many human problems, e.g. pollutants, infections and adverse effects on living organisms' health, is irregular or poorly done waste management, encompassing home waste, industrial waste and environmental waste. To address these issues and manage waste effectively without human interference in a healthy environment, we study automatic, machine learning based approaches for waste classification.


1.1 Thesis goals

The primary purpose of this study has been to establish models for precise waste prediction and classification in the industrial environment by comparing two classification algorithms: CNN and SVM. We focused mainly on three types of waste: paper, plastic and metal, which are abundant in industrial manufacturing and production facilities, form a subset of MSW, and are commonly found in everyday households. This thesis does not suggest a new classification methodology. It aims at designing practical algorithms that work on large-scale data sets and compare favorably with current approaches.

1.2 Author’s contributions

The author's contributions in this thesis are:

• Literature survey on waste management practices, the challenges faced in industrial waste management, and the use of machine learning for waste sorting in industrial environments.

• Implementation of two image classification algorithms, namely CNN and SVM, starting from prior expertise in the unit where the thesis was performed.

• Analysis of the performance and efficiency of the ML algorithms in different scenarios.

1.3 Description of the Thesis Structure

A brief overview of the thesis chapters is as follows:

• Chapter 2 mainly focuses on types of waste and management techniques, the challenges faced in waste classification and management, and the use of machine learning.

• Chapter 3 explains in detail the machine learning algorithms, including the characteristics and architecture of CNN and SVM for image classification.

• Chapter 4 elaborates on the methodology used for data acquisition and processing, and the technologies and software used while training the algorithms.

• Chapter 5 is wholly dedicated to the experiments, testing our classification models by optimizing the hyperparameters and analyzing the results.

• Chapter 6 is a summary and conclusion of the thesis.

• Chapter 7 discusses plans and actions to be taken after the thesis.


1.3.1 Theoretical Part

The theoretical, literature review part of the thesis covers topics such as:

• The waste management industry, types of waste, challenges faced in waste management, etc.

• Various machine learning approaches in waste management.

• Types of machine learning.

• Convolution neural networks.

• Support vector machines.

• The architecture of the proposed algorithms.

1.3.2 Technical Implementation Part

The case study of convolution neural networks and support vector machines for image classification of trash objects found in industrial environments is the technical aspect of the thesis. For the experiment, we compared these two ML algorithms by varying parameters such as image size, the number of input images, image properties, layer configurations and training options. We also studied how they perform and the accuracy they provide when classifying trash images in different industrial scenarios. The subsequent chapters describe the challenges faced and the solutions to the industrial waste management problem.


2 AUTOMATIC WASTE MANAGEMENT IN INDUSTRIAL ENVIRONMENTS

We commonly know that most human activities generate waste. Waste materials are essentially discarded or unusable items. According to the European Union [17], waste is any substance that is scrapped, useless, defective, and/or worthless; it is a by-product or a product at the end of its manufacturing and use process. Solid waste management is one of the most significant problems of recent global development. Humans produce a lot of waste as a by-product of their lives, and they always have. Methane, a major greenhouse gas, is emitted from waste deposited in landfills; a study of such sites attributed 91% of all measured methane emissions to landfilled waste [18]. Every job, from the preparation of a meal to the construction of a skyscraper, is accompanied by an increase in waste material. Waste has been a problem that has raised difficulties for humans and all other living beings for thousands of years. With a projected 1.4 million tons of waste per day in China in recent years, East Asia is seeing the highest growth in waste generation; this peak is expected to move to South Asia by 2025 and to Africa by 2050 [19].

Nevertheless, waste problems have most recently grown exponentially with the industrial and petrochemical revolutions, a rapid rise in world population, and growing consumerism. Waste generated at such a fast pace mostly ends up in a disposal field or an incineration plant because it is not properly sorted or managed [20]. Advances in waste volume and hazard management have promoted a well-deserved dose of technological optimism, while the quantity and dangerous nature of waste continue to challenge the community. The threat of waste affects our public health and ecosystem integrity; it can compromise our aesthetic sensitivity, and it can leave countries economically crippled. Waste generation remains a significant problem, as it has been since ancient times [21].

The review of solid waste management in [22] measured global waste production at 1.3 billion tons per year. Waste production has increased in recent years to levels in line with the initial estimates of What a Waste (1999), and the monitoring and reporting of data have significantly improved with the improvement of the underlying technologies. Based on the latest available data, world waste production was estimated at 2.01 billion tons in 2016 [4]. According to Eurostat, in most EU countries, 1-2 tons of waste are produced per person per year, excluding major mineral waste. Waste generated per individual in most European countries decreased by almost half between 2010 and 2016, whereas waste production in other countries has increased dramatically. Estonia has high figures because of its oil shale-based energy production [111].

2.1 Classification and types of waste

Some of the specific characteristics used in the classification of waste include its physical state, technical factors, reuse potential, biodegradability, production source and the level of environmental effect. Considering these characteristics, waste, by material nature, can commonly be divided into three primary types: liquid, solid and gaseous waste [23],[24]. Different countries might, of course, introduce other classification methods. Table 1 classifies the trash types depending upon the state of the waste, its source and the effect it has on the environment.

Table 1. Types of waste

State of the waste: solid waste such as paper, plastic and metal; liquid waste like sludge and paint; gaseous waste such as CO2 and methane.

Source of the waste: commercial waste, electronic waste, domestic waste, industrial waste, agricultural waste, demolition and construction waste, mining waste, medical waste, sanitation waste.

Effects on the environment: harmful waste, non-harmful waste.

In this thesis, we concentrate on industrial waste and on how we can mitigate the complexity of waste management systems and make sorting more convenient. As shown in Fig. 1, both hazardous and non-hazardous waste can be found in industrial and household environments, and most of it is recyclable. To achieve our objective, we need to familiarize ourselves with waste forms and categories. We are concerned with three essential waste materials: metal, paper and plastic. The main reason for focusing on these three trash types is that we use and discard these products regularly in our daily lives. According to an article in Forbes, 91% of the million plastic bottles sold daily end up in the garbage. Plastics are harmful to nature and wildlife and take hundreds of years to decompose. These materials are abundant in MSW and industrial waste, as shown in Fig. 1, and in some other waste types, which are briefly described below.

Figure 1. Classification of waste (Source: National Audit Office of Estonia)

2.1.1 Municipal Solid Waste

Municipal Solid Waste (MSW) is primarily domestic waste, although it contains some associated commercial and industrial waste. Household waste and MSW are commonly used as synonyms. According to Eurostat, in 2016 the municipal waste produced per person in the European Union was recorded at 480 kg, a significant decrease compared to 527 kg per person in 2002 [25]. Municipal solid waste is non-hazardous disposable material generated by households, institutions, industry, agriculture and wastewater treatment. Under MSW, the Intergovernmental Panel on Climate Change (IPCC) includes food, outdoor scraps, wood, clothes, rubber, paper, cardboard, metal, plastics, glass, and others (e.g., electronic waste). Failure of MSW management can cause an increase in air, soil, and water pollution around the globe and can become the root cause of natural disasters [26].

Generation rates for MSW vary between cities and seasons and are positively correlated with economic growth and activity levels.


2.1.2 Industrial waste

Industrial waste is defined as waste created by industrial operations and processes of all sorts, including materials that are considered useless during the production process. Such materials can be paints, metals, dyes, sludge and packaging materials. Industrial waste is generated by manufacturing plants, factories, mines and agricultural processing plants, and has been a significant share of waste products since the beginning of the industrial revolution. Billions of tons of industrial solid waste are collected and treated on-site at industrial plants annually; the volume generated is much greater than the amount collected as MSW [1].

2.1.3 Electronic waste

In this day and age, we as human beings have a complicated relationship with, and dependence on, electronic appliances. Everything from a smart pen to a giant 8K television screen is made of various materials, most of them plastic and metal, which are reusable. Every device has an expiry date and ends up at a landfill or a recycling plant at the end of its use. Sadly, only 13.6% of discarded devices were recycled in 2012, compared to 34.5% of MSW [112]. This is where our model can come in handy to classify objects efficiently according to their material and push the materials back into the recycling cycle.

2.1.4 Mining waste

Extractive waste, i.e. waste from mineral extraction and treatment, is one of the major waste sources in the EU. Mining requires materials such as topsoil, overburden and waste rock to be removed to access mineral resources, and it leaves waste behind after the ore has been extracted. Product packaging and other debris are also typical of mining processes.

2.1.5 Medical Waste

Medical knowledge has grown enormously in this century, and so has the number of patients treated and the amount of medical equipment used. Although most used medical products, such as needles, medicine packaging and single-use medical items, are recyclable, they are hazardous and harmful for a human to process for recycling.

Manual sorting of this medical trash can be harmful to the people handling it; automated sorting eliminates the need for human involvement. Medical waste should be divided into two groups: hazardous and non-hazardous.


2.1.6 Agriculture waste

Most agricultural waste comprises dead crops, crop residues and animal manure, but that is not all. Large-scale agriculture requires massive infrastructure to maintain and keep it running. Products such as packaging, vehicles and pesticide containers also contribute to the agricultural waste category.

2.1.7 Radioactive waste

Radioactive waste, like other hazardous waste, comprises flammable, corrosive, poisonous and reactive substances. In short, radioactive waste is a huge and potentially harmful threat to our ecosystem. As discussed earlier, medical and industrial processes may also generate radioactive waste, which has to be carefully and effectively managed and processed.

We will discuss the management of recyclable products from medical, municipal, industrial, agricultural and other waste streams in Chapter 2.2, and how the proposed machine learning algorithms can come into play in Chapter 2.4.

2.2 Techniques for waste management

The commonly used methods of industrial waste management are:

2.2.1 Incineration

Incineration refers to the burning of waste in a specially built combustion chamber. The idea of burning waste is not new: the first phase of US and European plants were standard refractory burners that were subsequently replaced in the late 19th century with water-wall and modular combustion systems [27]. However, as awareness of hazardous substances has increased, and the quantity of scrap has grown, incineration is now carried out under regulated conditions. Incineration is an effective means of reducing waste volume and the demand for waste space [28]. In addition, the energy from the burned waste can be used to create power and heat when an incineration plant is located near an area of major waste production [27].

2.2.2 Landfills

A landfill is defined as a site for the disposal of unwanted and non-reusable waste products; it is also known as a pit, a tip, a garbage dump, a waste dump or a dumping ground. This approach is often referred to as controlled tipping, which works on a smaller scale and is useful in rural areas [29]. It is one of the most common and oldest methods of garbage disposal. In the past, refuse was left in piles or thrown into a pit, known in archaeology as a midden. The EU Member States have discouraged this waste disposal method because of the risk of pathogens and toxic chemicals, and thus other waste disposal types, such as anaerobic treatment and energy-recovery incineration, have been encouraged [30].

2.2.3 Composting

This waste management process transforms waste into organic compounds for feeding plants. It is indeed a beneficial technique in terms of environmental advantages, as it easily turns otherwise useless organic products into healthy compost. Composting is a method of waste disposal in which organic waste naturally decomposes under oxygen-rich conditions. While all waste ultimately decomposes, only some waste products can be deemed compostable and put into compost containers. Food waste, including banana peels, coffee grounds and eggshells, makes excellent compost material. Composting is a sustainable waste management approach [31].

2.2.4 Recycling

Bylinsky (1995) states, citing the American National Academy of Sciences, that 94% of the materials extracted from the planet join the waste stream within months [32].

There is no doubt that unmanaged waste is an environmental problem and poorly managed waste is an ecological disaster. We need to move from consuming resources to properly using them: consuming means using a resource and having nothing of value left, whereas using means that when we are done with a resource, we reapply it to something else that is of use to us. A recent study found that only nine percent of the 6.3 billion metric tons of plastic waste generated has been recycled [33].

Different approaches have been used for urban waste disposal. In 2016, 30% of the waste in the EU was recycled, 27% burned, 25% buried, and 17% composted [25].

Waste recycling is crucial to sustaining a healthy environment. The benefits of recycling are as follows [34]:

• Most recyclable materials and products, like bottles, paper and electronics, can be resourceful and can be traded for money. Even though this trash is discarded as waste in an industrial environment, it can still generate profit when sold to the next industry.


• Saving energy. Processing and transporting raw materials adds to energy consumption. If we reuse the trash on-site, we can significantly decrease energy consumption.

• Preserving natural resources. Recycled material is reused to manufacture new products, which decreases the demand for extracting raw materials from the Earth.

With the implementation of recycling, we can reduce air and water pollution, decrease greenhouse gas emissions from incineration plants and reduce the waste thrown into waste dumps. Recycling is one of the intelligent solutions for reducing waste production and its effects on the environment [34]. Our attempt to create an efficient sorting algorithm can contribute to the further development of recycling.

2.3 Challenges faced in waste management

Sorting technologies such as Smart Trash Net, SamurAI, BIN-E and AlexNet, which use various ML techniques to classify trash, are already used in the recycling sector. The use of the above-mentioned sorting methods helps to minimize disposal costs, streamline waste management, recognize recyclable materials and cut down on labor costs. However, these sorting algorithms are not powerful enough to complete the job by themselves and still require human input time and again. Waste management systems that conduct intelligent sorting have to provide low error rates and timely warnings. For the time being, the technology at our disposal is powerful enough to support the task but not powerful enough to replace human sorting entirely.

The following are the key hurdles towards efficient waste management:

• Lack of adequate government plans and budget.

• Economic growth has changed the way we consume products, which has led to a global rise in waste.

• Complexity of consumption and production systems.

• When a product goes through various production cycles in its lifetime, its quality degrades, which dilutes the quality of the recovered material and affects the recycling process down the line.

• Household awareness: a lack of waste sorting knowledge among people directly interferes with recycling.

• Barriers in waste management technology.

• Management expense: manual sorting and recycling is a tiresome task and requires a large workforce [22],[35],[36],[37].


2.4 Use of machine learning

To overcome some of the difficulties and challenges mentioned earlier, automation of the waste management process by making use of machine learning algorithms comes into play.

Even though machine learning can be beneficial in some cases, it is not a solution to all of our garbage-related problems. A successful strategy for the waste management challenge would combine many solutions, including the development of a formal waste disposal mechanism and the maximization of waste recyclability. The smart waste management system in [38], for example, used iterative data-driven learning combined with an algorithm for training a model, and correctly detected the unloading of a recycling container using sensor measurements on the container. An average accuracy of 68% was achieved in sorting garbage into three types (disposal, recycling, and paper) with Smart Trash Net [39], which uses a Region-Based Convolution Neural Network (R-CNN) algorithm. Machine learning provides promising results in the areas of smart collection and classification of waste produced in urban areas and industrial fields. It is also accurate in predicting future projections of urban solid waste production, which are the foundation for progress, further sustainable development, and optimization of existing waste management systems. Another study [40] used a pre-trained AlexNet, with an accuracy of 87.69%, to detect whether the object in an input image is trash or not. These are a few examples of studies making use of machine learning algorithms to tackle garbage management.

The 2018 Waste Expo conference showcased SamurAI, an invention created by Machinex that uses AI to recognize recyclables, for example, cartridges, plastic vessels and containers. According to the company, SamurAI, a trash sorting robot, is capable of 70 picks a minute with up to 95% efficiency [41]. Other practical examples include BIN-E [42], an intelligent smart bin, and a system by the company TOMRA [43] used for waste stream sorting tasks. They make use of advanced AI and machine learning algorithms to automate the waste management process with improved efficiency and reduced human labor costs.

These machine learning implementations, and the advancement and progress in managing waste, prove that we are moving in the right direction. Now, let us look into the inner workings of the proposed machine learning algorithms for image classification in the upcoming chapters.


3 MACHINE LEARNING ALGORITHMS FOR CLASSIFICATION PROBLEMS

3.1 Introduction

As we know, we as human beings learn and develop our actions from our past experiences and memories, while machines obey human guidance and execute code. But what if machines could learn from past data and do what people would do with higher precision and efficiency? According to [44], "machine learning is having a dramatic impact on the way software is designed so that it can keep pace with business change".

Artificial Intelligence (AI) is defined as a group of computer science branches that deal with making machines intelligent by training them using appropriate types of data. Machine learning is a sector within AI which studies computer algorithms for various scenarios; these algorithms improve gradually by training themselves, as humans do. Some ML algorithms use a neural network, which is made up of multiple nodes (neurons) with a weight on each connection, arranged as a feed-forward network. It makes it possible for the system to map the input to the output [45]. According to Arthur Samuel, an IBM researcher [46]: "machine learning is the field of study that gives a computer the ability to learn without being explicitly programmed".

Machine learning uses several algorithms to process data and iteratively predict results from the input data. Training data are the initial set of data that fits the parameters of the desired algorithm and acts as the baseline for further improving the application. As we increase the amount of training data fed into an algorithm, more complex and accurate models based on that information can be produced.

The result of training a machine with a learning algorithm on the available training data is a machine learning model. Once the learning process is completed, one gets an output when one provides an input to the model. For example, a predictive algorithm creates a predictive model, and when data are sent to the predictive model, one obtains a prediction based on the information with which the algorithm was trained. For the development of analytical models, machine learning is now a must [44]. The way machine learning works in real life is represented in Fig. 2.

The fundamental question to be solved, such as what factors determine the right data type and what can be done to close the data gap, is investigated first of all. Then sufficient data that fulfil the criteria of the model are collected or configured to train the machine learning algorithm. The program draws on its own experience, which makes it easier to evaluate the solution for new data within the training model.

Figure 2. Machine learning approach

Automatic picture labelling or classification is considered important in computer science, as is the identification of various ML approaches for automating it. The traditional manual labelling of images and datasets, however, is expensive and labor-intensive [47]. An automatic method to sort images based on a supervised learning technique is proposed in this thesis. Collecting training pictures (labelled pictures) is a vital step in supervised classification.

We can differentiate machine learning problems into three major categories, which are briefly introduced below.

3.1.1 Supervised Learning

In supervised machine learning, as the name implies, the model is supervised by providing information about the input data. It is mainly used to construct a model based on a thorough understanding of the dataset. Let us assume an example with N labelled training data D = {(x_n, t_n)}, n = 1, ..., N, where x_n represents an input variable and t_n is the corresponding label, or response. Supervised learning aims to predict the value of the label t for an input x which is not a part of the training data. In simpler terms, this means that the algorithm seeks to accurately predict the label of any new input x based on the observations made within the training data [48]. The training data have characteristics (labelled features) that determine their value. When the labelled data are fed into the proposed ML algorithm, it adapts, observes, and identifies patterns in the provided datasets. The computer learns from the observations made by the algorithm. It is a regression problem when the label is continuous; when the label is discrete, it is known as classification. One has a problem called overfitting if the model can represent only the specific patterns in the training subset: if something different from the training set is introduced, the model starts to fall apart. Overfitting means that the model is tightly tuned to the training data but does not generalize to broader, unknown data sets. Testing against previously unseen data must be performed to protect against overfitting [44].

After we prepare the data, an algorithm must be selected, based upon whether it is a regression or classification problem, to get the best out of the supervised training process. Then there are the following steps: fit a model, choose a validation process, test the model and update it until the results are satisfactory. All of these steps should be taken to achieve a better-controlled machine learning model [44]. A typical supervised learning process is shown in Fig. 3.

Figure 3. Supervised machine learning
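To make the idea concrete, the following minimal MATLAB sketch trains a classifier on a tiny labelled set D = {(x_n, t_n)} and predicts the label of a new input. It uses fitcsvm from the Statistics and Machine Learning Toolbox; the feature values and class names are made up purely for illustration and are not data from this thesis.

% Minimal supervised-learning sketch (illustrative data only).
% Each row of X is an input x_n and t holds the corresponding label t_n.
X = [1.0 2.1; 0.9 1.8; 3.2 3.9; 3.5 4.2];               % training inputs
t = categorical({'paper'; 'paper'; 'metal'; 'metal'});  % training labels

model = fitcsvm(X, t);        % fit a simple SVM on the labelled training data

xNew  = [3.0 4.0];            % an input that was not part of the training data
tPred = predict(model, xNew)  % predicted label for the new input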

3.1.2 Unsupervised Machine Learning

Unsupervised learning is a model learning methodology that does not require user control of the process. Instead, it enables the model to work independently to discover previously uncovered patterns and information. Unsupervised learning mainly works with unlabeled and mostly unknown data: the data given to an unsupervised model are not marked, which implies that we have input variables without corresponding output variables.

Therefore, the algorithms are left with the data to find any existing structure by themselves. An unsupervised machine learning program simply gathers clusters of information from the input data.


There may be situations in which one has vast quantities of data, for example in social media apps like Facebook, Instagram, TikTok, and Snapchat, where it is almost impossible to find the meaning behind the data manually. Manually labelling data at this scale is expensive, too time-consuming, and nearly impossible [47]. The interpretation of such data requires algorithms which can understand the significance of the data based on the patterns or clusters they find. In simpler terms, clusters can be defined as groups of features. The clustering process adds labels to the data so that they eventually become supervised. There are different use cases of unsupervised learning models, such as discovering links between overlapping datasets on the Web [49], infants learning a language [50], and a study on accelerated, compressed sensing MRI [51], to name only a few. Fig. 4 visualizes an unsupervised machine learning system: the unknown data are clustered together based upon similar characteristics by using k-means clustering.

Figure 4. Unsupervised learning
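As an illustration of the clustering idea in Fig. 4, the short MATLAB sketch below groups unlabeled two-dimensional points into clusters with the kmeans function from the Statistics and Machine Learning Toolbox; the data are randomly generated for illustration only.

% Unsupervised clustering sketch: no labels are provided to the algorithm.
rng(1);                                    % make the random data reproducible
data = [randn(50,2); randn(50,2) + 4];     % two unlabeled groups of points

k = 2;                                     % assumed number of clusters
[idx, centers] = kmeans(data, k);          % assign each point to a cluster

% idx now acts as a machine-generated "label" for each point.
disp(centers)                              % estimated cluster centroids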

3.1.3 Reinforcement learning

Reinforcement learning is a means for an agent to interact with an environment to improve the result of the machine learning algorithm. Reinforcement learning refers to the question of making optimal sequential choices based on incentives or penalties earned from previous actions. Reinforcement learning (RL) focuses more on the goal of learning than on the process of learning: the learning approach is driven by obstacles and tries to overcome the difficulties encountered in the past. Reinforcement learning differs from supervised learning because the training data set is not teaching the program. RL effectiveness increases when a learner realizes, without being guided, which action is most likely to attain rewards [48].


Reinforcement learning is also the algorithm used for self-driving cars [52], robotics [53], game playing, and performance enhancement of vehicular networks [54], to name a few. Reinforcement learning is not guided, because it is not provided with the proper action to take for the input data, nor is it entirely unsupervised, as feedback on the quality of the selected action is available. Reinforcement learning is often distinguished from supervised and unsupervised learning by the effect of previous steps on future states and rewards [48]. Fig. 5 provides an outline of the reinforcement learning process.

Figure 5. Reinforcement learning

3.2 Convolution Neural Network (CNN)

A convolution neural network, referred to as CNN or ConvNet, is a subclass of deep neural networks and a ubiquitous tool for image analysis and visual classification problems [55],[56].

According to many experts, a network is a deep neural network if it has hidden layers between the input and output layers; it has to have a certain level of complexity and more than two layers. The use of the term "convolution" indicates that convolution networks are essentially neural networks that use the convolution operation instead of general matrix multiplication in one or more layers of the system [57]. Although CNN has been found to be most effective, and is perhaps most widely used, in various computer vision applications, it can be used for other data classification problems as well.

Layers such as convolution layers, normalization layers, ReLU layers, pooling layers and fully connected layers are commonly included in a standard CNN in different configurations, and the different layer types perform different roles. The general CNN pipeline is seen in Fig. 6. The forward propagation stage and the backward propagation stage are the two phases of a typical convolution neural network [58]. The fundamental goal of the first stage is to feed the image through the multiple layers of the CNN with the current parameters, such as weights and biases. Weights regulate the signal, or the strength of the connection, and determine the properties of the produced outputs, whereas the constant biases ensure that a neuron can still be activated even when all of its inputs are empty. Biases act as an additional input to the next layer with a constant value of 1; they are not influenced by the previous layer (they have no incoming connections), but they do have their own weights. Second, the backward propagation uses the chain rule to determine the gradients of each parameter based on the loss. These new parameters are then used in the next iteration of the forward phase. The network training can be stopped after an adequate number of iterations of the forward and backward stages [58],[59],[60].

Table 2. Pros and cons of using CNN

Advantages:
- Uses fewer learnable parameters compared to a traditional NN.
- Transfer learning makes use of a trained network and saves memory and time.
- More location invariant; looks at a small portion at a time, so it trains fast.

Disadvantages:
- The position and orientation of features are not encoded.
- Lacks the ability to be spatially invariant to the provided data.
- Relatively slow to train in the absence of a GPU.

Table 2 summarizes the advantages and disadvantages of using CNN. In a conventional artificial neural network (ANN), each neuron of the input layer is directly connected to every neuron in the hidden layer, and so forth; in a CNN, however, only the final dense layer is fully connected, as shown in Fig. 6 below.


Figure 6. Artificial neural network and Convolution neural network

3.2.1 Image processing layer

The image processing layer is the first layer in the convolution neural architecture. Its job is to specify the image size accepted by the network for the training process. The CNN layers only accept images with the same dimensions, so we have to make sure that the images in the datasets are processed to exactly the same size.
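Since the network accepts only one fixed input size, every image in the dataset has to be resized beforehand. A minimal MATLAB sketch of this preprocessing step is shown below; the 64x64 target size and the folder names are assumptions for illustration, not values taken from this chapter.

% Resize every image in a folder to the fixed size expected by the input layer.
imds = imageDatastore('dataset/paper');   % hypothetical source folder
targetSize = [64 64];                     % assumed network input size
mkdir('resized');                         % output folder for processed images

while hasdata(imds)
    [I, info] = read(imds);               % read the next image and its file info
    I = imresize(I, targetSize);          % force a uniform image size
    [~, name, ext] = fileparts(info.Filename);
    imwrite(I, fullfile('resized', [name ext]));
end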

3.2.2 Convolution layer

The convolution layer is the first processing layer and one of the main building blocks of a convolution neural network (CNN). The word "convolution" refers to an operation that combines two functions into a third function. After the convolution operation between the input image and the desired filter is carried out, the result is passed to the next layer. The initial layers detect standard features such as edges and lines, while the subsequent layers extract mid- and high-level features. By applying various kernels and filters, feature maps are generated from the convolution operation on the input image and on intermediate feature maps, as shown in Fig. 7. The convolution layer imitates a neuron's response to visual stimuli: usually, each neuron processes only its receptive region. A 2D feature map is generated using kernels that compute the dot product between the filter and the input image by sliding a window over the input image according to the given stride value [61],[58]. An output volume is created by combining the 2D feature maps generated by the different filters, and this volume becomes the output of the convolution layer.


Figure 7. Convolution process between 6x6 input matrix and 3x3 kernel

For example, as shown in Fig. 7 above, the 3x3 kernel convolves with a 3x3 area of the input image, resulting in a scalar output on the feature map. The filter then slides in each iteration, is multiplied with the remaining 3x3 areas, and fills out the feature map with the resulting dot products.
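The sliding-window computation described above can be written out explicitly. The MATLAB sketch below, with arbitrary example values, slides a 3x3 kernel over a 6x6 input with a stride of 1 and fills the feature map with the dot products, as in Fig. 7. (As in most deep learning frameworks, the kernel is not flipped, so strictly speaking this is cross-correlation.)

% Explicit "convolution" of a 6x6 input with a 3x3 kernel (no padding).
input  = magic(6);                 % arbitrary 6x6 input image
kernel = [1 0 -1; 1 0 -1; 1 0 -1]; % simple vertical-edge filter
stride = 1;

outSize    = floor((size(input,1) - size(kernel,1)) / stride) + 1;
featureMap = zeros(outSize);

for r = 1:outSize
    for c = 1:outSize
        rows  = (r-1)*stride + (1:size(kernel,1));   % rows under the kernel
        cols  = (c-1)*stride + (1:size(kernel,2));   % columns under the kernel
        patch = input(rows, cols);
        featureMap(r, c) = sum(patch(:) .* kernel(:));  % dot product fills one cell
    end
end
featureMap   % 4x4 feature map for stride 1 (2x2 if stride is set to 2)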

Various filter matrix configurations can be used for feature extraction operations such as edge detection, image sharpening, and blurring. Fig. 8 shows output images after passing through different filter types.

Figure 8. Examples of some common filters

Features such as a reduced number of learnable parameters, shared feature sets between neighboring pixels, and independence from the orientation and position of the object make convolution neural networks more efficient, faster and more accurate than other image classification algorithms [56]. Hyperparameters such as stride and padding are essential concepts in the convolution layer which help to determine the desired output. They are briefly explained below.

3.2.2.1 Stride

By tweaking the hyperparameters of a CNN, we can achieve a more efficient result by reducing the side effects introduced by overlap of the filter with its neighboring positions.

One such parameter is the stride size. The stride determines the distance a filter covers over the image at a time. A stride value of 1 corresponds to a filter which hops 1 pixel at a time over the input map; if the stride value is 2, the filter hops two pixels at every step. Smaller maps are created as we increase the stride size.

A larger stride corresponds to fewer filter applications and results in an output that is smaller than the previous feature map [62].

Let us say we have a 6x6 image, as shown in Fig. 9, and we apply a 3x3 filter with a stride value of one. In each step, the filter performs a matrix operation with the image and outputs a single value. The filter then slides one step and carries out the same procedure, and this process continues until the filter reaches the end of the input image matrix. The result of this process is a 4x4 matrix. Note that with a smaller stride, we encounter multiple overlaps. However, if we set the stride to 2, then according to equation (1) the output is a 2x2 matrix, as shown in Fig. 9. The use of stride will thus significantly reduce the output volume relative to the input and minimize overlap between neighboring positions [63],[64].

Figure 9. 3x3 kernel with a stride of 2


3.2.2.2 Padding

Zero-padding is a way to increase the size of an image, to offset the fact that the use of stride reduces the size of the output of a CNN layer. Also, in some instances, we may lose information at the image boundaries because the kernel and the input matrix are of different dimensions. Zero-padding is an efficient way to eliminate the above-mentioned problems and to further manage the dimensionality of output volumes by simply padding the edges of the input, as shown in Fig. 10. To calculate the spatial dimensionality of the convolution layer output, we can use the formula:

output volume = ((𝑊 − 𝐾 + 2𝑃) / 𝑆) + 1, (1)

where 𝑊 represents the input volume, 𝐾 is the receptive filter size, 𝑃 is the amount of zero-padding, and 𝑆 is the stride.
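As a quick check of equation (1), take the 6x6 input and 3x3 kernel from the stride example with no padding (P = 0): (6 − 3 + 2·0)/1 + 1 = 4 for a stride of 1, and (6 − 3 + 2·0)/2 + 1 = 2 (after flooring the division) for a stride of 2, so the feature maps are 4x4 and 2x2 respectively, matching Fig. 9. When the division is not exact, the result is floored in practice, or the padding is chosen so that the stride divides evenly.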

Figure 10. Zero padding with different stride value

3.2.3 ReLU Layer

In a traditional neural network, the sigmoid function and the Tanh (hyperbolic tangent) function were used to introduce nonlinearity into the equation. A vanishing gradient problem may occur as the neural network architecture gets larger and deeper: the gradient signal starts to disappear, which is a big downside. To overcome this obstacle, V. Nair and G. E. Hinton [65] introduced the rectified linear unit (ReLU), defined as follows:

𝑅𝑒𝐿𝑈(𝑥) = max(0, 𝑥), with derivative d𝑅𝑒𝐿𝑈(𝑥)/d𝑥 = 1 if 𝑥 > 0 and 0 otherwise. (2)

Visually, the output matrix from the convolution layer after passing through the ReLU layer looks as in Fig. 11.

Figure 11. Example of ReLU function
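For concreteness, applying equation (2) elementwise to a small feature map simply zeroes out the negative entries while leaving the positive ones untouched: ReLU([−2 3; 5 −1]) = [0 3; 5 0].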

A ReLU layer is used immediately following a convolution layer or batch normalization layer and serves to saturate or restrict the generated output. ReLU's primary goal is to introduce non-linearity into the CNN so that the network can learn more complex patterns. The following properties explain why ReLU has been widely used in recent years [66].

• Computational simplicity. ReLU has a more straightforward equation in both the function and its gradient [67].

• Unlike the sigmoid and tanh functions [57], the ReLU function is capable of outputting a true zero value. This leads to what is called a sparse representation.

• The key to this property is that networks trained with this activation function almost entirely avoid the vanishing gradient problem, since the gradients are proportional to the node activations [67].


3.2.4 Max pooling layer

Pooling layers aim to reduce the dimensions of the representation, which is also known as downsampling.

The pooling procedure is also referred to as subsampling. It is used to reduce the number of learnable parameters and the size of the images, which lowers the complexity during learning but keeps the information and features of the input intact. It is also effective in mitigating over-fitting. The most commonly used pooling methods are average pooling and max pooling [68].

A max-pooling layer is usually used to decrease the feature map dimensions and the number of network parameters right after a convolution layer and ReLU layer. Pooling layers are translation invariant, as their calculations take nearby pixels into account, similar to convolution layers [60]. In common practice, most CNNs use max pooling with 2x2 kernels and a stride of two along the spatial dimensions of the input, which implies that each spatial dimension of the feature map is reduced by a factor of 2 (leaving 25% of the original area) by the pooling layer. During the max-pooling process, the input matrix is divided into sub-regions equal to the kernel size, and at each step the highest value within the specific region is selected. As we can see in Fig. 12, the output from a convolution layer is 4x4, and when we apply a max-pooling process, it starts from the top left corner. As the name suggests, it outputs the maximum value from that region and shifts right according to the allocated stride value, which is 2 in this example. The max-pooling process does not affect the channels; thus, the output matrix is 2x2 with the same number of channels as the input matrix. A stride of 1 can be used to avoid downsampling, but this is rarely done. Fig. 12 provides an overall representation of the common max pooling operation [66],[64].

Figure 12. Max pooling of a 4x4 input matrix with 2x2 filter leads to down-sampling
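The 2x2, stride-2 max-pooling step of Fig. 12 can be reproduced with a few lines of MATLAB; the input values below are arbitrary and chosen only for illustration.

% Max pooling of a 4x4 feature map with a 2x2 window and a stride of 2.
featureMap = [1 3 2 4; 5 6 1 2; 7 2 9 1; 3 4 2 8];
pool   = 2;                       % pooling window size
stride = 2;

outSize = floor((size(featureMap,1) - pool) / stride) + 1;   % = 2
pooled  = zeros(outSize);

for r = 1:outSize
    for c = 1:outSize
        rows  = (r-1)*stride + (1:pool);
        cols  = (c-1)*stride + (1:pool);
        block = featureMap(rows, cols);
        pooled(r, c) = max(block(:));   % keep only the largest value per region
    end
end
pooled   % 2x2 down-sampled output, e.g. [6 4; 7 9] for the matrix above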


3.2.5 Fully Connected layer

A fully connected layer, sometimes called a dense layer, is a feed-forward neural network layer.

After completion of the final convolution and max-pooling operations in the model, the 2D feature map volume is fed into a dense layer, which flattens the input and converts it into a 1-dimensional vector output. A fully connected layer is mainly a decision layer which classifies the images by matching the labels with the predicted labels. As represented in Fig. 13, all neurons connect to all neurons in the previous layer. Since in fully connected layers there is a direct connection between each pair of neurons in adjacent layers, the number of parameters required for the computation increases rapidly. In standard CNN algorithms, layers like the fully connected and softmax layers use more parameters than other layers such as the convolution and pooling layers. Out of the 60 million parameters of the AlexNet classification model, 58 million are from FC layers [69]. Similarly, the majority of parameters (128 million out of 135 million) in VGGNet are from FC layers [70].

Even though the FC layer is useful in image classification, the downside is that these layers have many parameters, which contributes to a tremendous computational effort to train them. They are essentially a traditional neural network added at the end of the CNN layers. Therefore, there have been efforts to eliminate these layers or to reduce their connections in some way. A robust and common way of lowering the number of connections between neurons is the implementation of a dropout parameter [60],[71],[72]. Fig. 13 provides a visual representation of fully connected layers.

Figure 13. Flattening of the output from the max-pooling layer into multiple fully connected layers.


3.2.6 Training options

Instead of performing the inefficient and redundant computation over the entire dataset at every step, Stochastic Gradient Descent (SGD) computes the update on a small, randomly selected subset of the data. It reduces redundancy by calculating the cost of only one example at every step. SGD is a type of gradient descent, a technique initially suggested in the 1950s: it updates every model parameter, observes how the change influences the objective function, selects a direction that reduces the error rate, and continues to iterate until the objective function converges to a minimum [73]. When the learning rate is low, SGD provides the same efficiency as regular gradient descent [74].
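In equation form (the standard SGD update rule, not specific to this thesis), for a randomly drawn training example (x_i, y_i), learning rate η and cost function J, one SGD step updates the parameters θ as

θ ← θ − η ∇_θ J(θ; x_i, y_i),

so each step is cheap to compute, at the cost of a noisier path towards the minimum.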

However, in recent years, various new optimizers have been introduced to deal with complicated training situations in which traditional gradient descent methods behave poorly.

The Adaptive Moment Estimation (Adam) optimizer is among the most commonly used and is an effective optimizer for many machine learning models. Adam [75] is an algorithm developed specifically for training deep neural networks with adaptive learning rates.

The Adam optimizer is an adaptive learning rate method that computes individual learning rates for different parameters. It was published by Diederik P. Kingma of OpenAI and Jimmy Lei Ba of the University of Toronto as a conference paper at ICLR 2015 [75]. The authors describe Adam as an algorithm for gradient-based optimization of stochastic objective functions; it combines the advantages of two SGD extensions, Root Mean Square Propagation (RMSProp) and the Adaptive Gradient Algorithm (AdaGrad), which in turn allows it to compute individual adaptive learning rates for different parameters [74]. In his 2017 blog post, Tesla AI director Andrej Karpathy reported that Adam is widely used and can be seen in multiple academic articles: “It’s likely higher than 23% because some papers don’t declare the optimization algorithm, and a good chunk of papers might not even be optimizing any neural network at all” [76].
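In MATLAB's Deep Learning Toolbox, the two solvers compared in this thesis are selected through trainingOptions; the sketch below is illustrative, with the epoch count matching the 250 epochs used later and the remaining option values assumed:

% Two training configurations differing only in the optimizer.
optsSgdm = trainingOptions('sgdm', ...
    'MaxEpochs', 250, ...
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress');
optsAdam = trainingOptions('adam', ...
    'MaxEpochs', 250, ...
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress');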

3.2.7 The architecture of the CNN classifier investigated for waste classification

The model used in this thesis is developed with the following specifications. There are 3 neurons in the last fully connected layer, corresponding to the number of trash categories listed in Chapter 2. Classification networks such as AlexNet use 11x11 filters, which increases the number of parameters so much that training takes weeks; therefore, filters of dimension 5x5 and 3x3 were used in the convolution layers to keep the number of learnable parameters relatively low and to minimize the computational cost. A batch normalization layer is used after each convolution layer to standardize the strongly non-linear output at each epoch by calculating the mean and standard deviation of each input variable. A max-pooling operation with a 2x2 filter and a stride of 2 is used after each deep block, which consists of a convolution, batch normalization and ReLU layer, to reduce the spatial dimensions of the image by half while keeping the number of channels the same as in the previous layer. Finally, a fully connected layer flattens the input into 256 neurons, which are passed, after a 70% dropout procedure added to prevent overfitting, to a second fully connected layer containing 3 neurons. Both the Adam and SGDM optimizers were tested with 250 epochs and the default learning rate. A visual representation of the architecture of the proposed CNN classifier is shown in Fig. 14 below.

Figure 14. The architecture of the CNN used in this thesis.
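A minimal MATLAB sketch of the layer stack described above is given below. The filter sizes (5x5 and 3x3), the 2x2/stride-2 pooling, the 256- and 3-neuron fully connected layers and the 70% dropout follow the text; the input size, the number of convolution blocks and the filter counts (16 and 32) are assumptions made only for illustration:

% Assumed CNN layer stack; imdsTrain is an assumed imageDatastore of training images.
layers = [
    imageInputLayer([64 64 3])                      % input size assumed
    convolution2dLayer(5, 16, 'Padding', 'same')    % 5x5 filters
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)               % halves the spatial dimensions
    convolution2dLayer(3, 32, 'Padding', 'same')    % 3x3 filters
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(256)                        % flattens into 256 neurons
    reluLayer
    dropoutLayer(0.7)                               % 70% dropout against overfitting
    fullyConnectedLayer(3)                          % paper, plastic, metal
    softmaxLayer
    classificationLayer];
net = trainNetwork(imdsTrain, layers, optsAdam);    % optsAdam as in Section 3.2.6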

3.3 Support Vector Machines (SVM)

In this research, we compare the efficiency and accuracy of the CNN with a different supervised learning algorithm known as Support Vector Machines (SVM), also called Support Vector Networks [77]. At AT&T Bell Laboratories, Cortes and Vapnik [77] developed the SVM algorithm for binary classification in 1995. It is a supervised learning method with an associated learning algorithm that analyzes data used in classification and regression problems. SVM is among the most popular techniques for optimizing the training strategy to classify binary data effectively. SVM has proven itself in solving binary classification problems, and it works well compared to other supervised learning methods [78],[79].

In the linear case, SVM aims at finding the decision boundary that maximizes the margin while all classes are classified correctly. However, real data sets are rarely linearly separable, and a separating hyperplane almost never classifies 100 per cent of the points correctly. SVM tackles non-linear situations by adding two concepts: the soft margin and the kernel trick. With the introduction of the soft margin, SVM can accommodate a few errors and tries to balance choosing a boundary that maximizes the margin against minimizing those errors. The kernel trick takes the existing features, applies specific transformations, and produces new features; SVM then seeks the non-linear decision boundary in this new feature space [80]. The reason for preferring support vectors that lie further from the boundary is that this decreases the generalisation error, whereas decision boundaries with narrower margins tend to lead to overfitting [81].
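These two concepts map directly onto the SVM implementation in MATLAB's Statistics and Machine Learning Toolbox; in the illustrative sketch below, X (an n-by-p feature matrix) and y (labels in {-1, +1}) are assumed example data, not variables from the thesis code:

% Soft margin: BoxConstraint trades margin width against misclassification errors.
linearSvm = fitcsvm(X, y, 'KernelFunction', 'linear', 'BoxConstraint', 1);
% Kernel trick: an RBF (Gaussian) kernel allows a non-linear decision boundary.
rbfSvm = fitcsvm(X, y, 'KernelFunction', 'rbf');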

The aim is to identify the appropriate separating hyperplane f(w, x) = w · x + b, where w represents the weight vector, x the input vector and b the bias, to separate the two classes in a specific dataset with samples x ∈ {x_1, x_2, x_3, ..., x_n}. Table 3 below summarizes the pros and cons of using an SVM model for various machine learning tasks.

Table 3. Pros and cons of using SVM for image classification

Advantage: SVM operates well with unstructured, unknown and semi-structured data such as text, images and trees.
Disadvantage: Requires a long training time and a large amount of memory for large datasets [82].

Advantage: Many challenging problems can be addressed by choosing a suitable kernel function.
Disadvantage: Choosing an appropriate kernel for the model is difficult.

Advantage: SVM models achieve a low generalization error, with less chance of overfitting.
Disadvantage: Highly sensitive to noise in the input data [83].
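Since the basic SVM is a binary classifier, a multi-class task such as the three waste categories is usually handled by combining several binary SVMs. In MATLAB this can be done with fitcecoc, which trains an error-correcting output codes model with binary SVM learners by default; in the sketch below, features and labels are assumed variables holding the extracted image features and the corresponding category labels:

% One-vs-one binary SVMs combined into a single 3-class classifier.
multiSvm = fitcecoc(features, labels);
predictedLabels = predict(multiSvm, features);   % here evaluated on the training data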

3.3.1 Hyperplane

For an n-dimensional space, a hyperplane can be defined as an (n-1)-dimensional subspace. In a 2D space, the hyperplane is one-dimensional, i.e., a line, whereas in a 3D space the hyperplane is a 2D plane [80].


Let us first look at the 2D scenario. In the case of linear data, it can be separated clearly with a line, as shown in Fig. 15. The equation of the line is y = ax + b. Renaming x as x_1 and y as x_2, we get

a x_1 − x_2 + b = 0.    (3)

Defining the vectors x = (x_1, x_2)^T and w = (a, −1)^T, this becomes

w · x + b = 0.    (4)

We will not select just any hyperplane; we will choose only those that satisfy the following two constraints. For each vector x_i, either

w · x_i + b ≥ 1 for x_i in class 1, or

w · x_i + b ≤ −1 for x_i in class −1.

The vector equation above was derived for the 2D case, but the same form holds for any number of dimensions. Fig. 15 represents the hyperplane in problems of different dimensionality.

Figure 15. Hyperplane representation in two and three dimensions.


3.3.1.1 Hard margin SVM

Figure 16. SVM hyperplane with linearly separable data

Fig. 16 visualizes a simple linear SVM hyperplane. The data points from each class that lie nearest to the hyperplane are known as support vectors [84]. Let us denote by (s, s_+, s_−) the hyperplanes that separate the correctly classified points: s_+ and s_− are the hyperplanes passing through the support vectors, parallel to the separating hyperplane s and positioned on its positive and negative side, respectively.

These hyperplanes can be expressed as:

s:  w^T x_i + b = 0,    (5)
s_+: w^T x_i + b = +1,    (6)
s_−: w^T x_i + b = −1,    (7)

where the training input dataset is represented as x_i ∈ {x_1, x_2, x_3, ...} with labels y_i ∈ {−1, +1}, i = 1, ..., n, and b represents the bias. The hyperplane function is defined as

f(x) = w · x + b = 0,    (8)

where x is the input vector, w = (w_1, w_2, w_3, ..., w_p) is the weight vector of the hyperplane, and b defines the bias [84]. To maximize the margin, i.e., the distance between the two support-vector hyperplanes parallel to s, the optimization problem of Equation (2) needs to be solved; the margin is obtained by computing the difference between the support-vector hyperplane equations (6) and (7).
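For completeness, the standard derivation of this margin (a general SVM result, not specific to this thesis): take a point x_+ lying on s_+ and a point x_− lying on s_−. Subtracting Equation (7) from Equation (6) gives w^T (x_+ − x_−) = 2, and projecting this difference onto the unit normal w / ||w|| yields

margin = w^T (x_+ − x_−) / ||w|| = 2 / ||w||,

so maximizing the margin is equivalent to minimizing ||w|| (in practice, minimizing (1/2)||w||^2) subject to y_i (w^T x_i + b) ≥ 1 for all training points.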
