Estimating Articular Cartilage Structural, Compositional, And Functional Properties Using Raman Spectroscopy

Academic year: 2022


Eslam Shehata
Master’s Thesis
Master’s Degree in Medical Physics Program
University of Eastern Finland
Department of Applied Physics
21.06.2021

University of Eastern Finland, Faculty of Science and Forestry Master’s Degree in Medical Physics Program

Eslam Shehata, M.Sc.: Estimating Articular Cartilage Structural, Compositional, And Functional Properties Using Raman Spectroscopy.

Master’s Thesis, 58 Pages

Supervisors: Academy Research Fellow, Docent Isaac Afara, Ph.D.

Docent Ervin Nippolainen, Ph.D.

June 2021

Keywords: Articular Cartilage, Idiopathic Osteoarthritis, Post-Traumatic Osteoarthritis, Raman Spectroscopy, Artificial Injury, Partial Least Squares Regression, Support Vector Machines, Random Forest, Machine Learning.

Abstract

The integrity of articular cartilage is paramount for the normal functioning of the knee joint.

As no cure currently exists for osteoarthritis (OA), it is essential to reliably diagnose cartilage injuries and evaluate the surrounding tissue at an early stage to prevent the initiation and progression of OA, e.g., via surgical intervention. At present, surgical treatment of joint pathologies is conducted through arthroscopy, and the success of treatment relies on an accurate diagnosis of the size and severity of the damage. However, conventional arthroscopic assessment of articular cartilage is highly subjective and lacks reproducibility. Vibrational spectroscopy techniques such as near-infrared, mid-infrared, and Raman spectroscopy provide valuable objective information about the chemical composition and structure of cartilage. Raman spectroscopy is rapidly gaining popularity in biological applications because Raman peaks are relatively narrow, easy to resolve, and chemically specific, and because the technique suffers little interference from water, which is abundant in biological tissues. The main aim of this thesis is to assess the potential of Raman spectroscopy to evaluate the composition, structure, and functional integrity of articular cartilage. Healthy bovine patellae (n=10) were selected based on visual inspection of cartilage surface integrity. Osteochondral plugs (n=72) were prepared and divided into two damage groups and a separate control group. The samples in the damage groups were then artificially degraded enzymatically and mechanically to mimic idiopathic OA and post-traumatic OA, respectively. The enzymatic damage group was further divided into collagenase 90 min, collagenase 24 h, and trypsin 30 min subgroups, and the mechanical damage group into abrasion and impact subgroups. Following injury induction, Raman spectra of the samples were measured. Subsequently, the reference biomechanical properties, proteoglycan content, and collagen orientation of the samples were measured.
Machine learning algorithms (SVM, PLSR, and RF), combined with different spectral pre-processing methods, were then used to develop models for estimating the reference properties from the Raman spectra. The results show that Raman spectroscopy can estimate cartilage biomechanical properties with reasonable error (≤16.5%), with the lowest error observed for the dynamic modulus (9.5%). Collagen orientation was best estimated in the deep zone (8.7%), followed by the superficial (11.5%) and middle (12.3%) zones. Similarly, proteoglycan content was best estimated in the deep zone (9.3%), followed by the middle (10.8%) and superficial (20%) zones. These findings suggest that Raman spectroscopy has the potential to assess cartilage integrity during arthroscopic surgery.

List of Abbreviations

CV: Cross-Validation
DD: Digital Densitometry
DT: Decision Tree
ECM: Extracellular Matrix
GAGs: Glycosaminoglycans
IR: Infrared
MIRS: Mid-Infrared Spectroscopy
MRI: Magnetic Resonance Imaging
MSC: Multiplicative Scatter Correction
NIR: Near-Infrared
NIRS: Near-Infrared Spectroscopy
OA: Osteoarthritis
PBS: Phosphate-Buffered Saline
PCA: Principal Component Analysis
PCR: Principal Component Regression
PG: Proteoglycan
PLM: Polarized Light Microscopy
PLSR: Partial Least Squares Regression
PTOA: Post-Traumatic Osteoarthritis
RF: Random Forest
RMSE: Root Mean Square Error
RMSEC: Root Mean Square Error of Calibration
RMSECV: Root Mean Square Error of Cross-Validation
RMSEP: Root Mean Square Error of Prediction
SD: Standard Deviation
SG: Savitzky-Golay
SNR: Signal-to-Noise Ratio
SNV: Standard Normal Variate
SVM: Support Vector Machines

Table of Contents

Abstract
List of Abbreviations
Table of Contents
Table of Figures
Table of Tables
Introduction
1. Articular Cartilage: Structure, Composition, and Function
1.1. Cartilage Pathology: Injuries and Degeneration
1.2. Current Diagnostic Approaches of Cartilage
2. Raman Spectroscopy
2.1. Fundamentals of Raman Spectroscopy in Biological Tissues
2.2. Raman Spectroscopic Characterization of Articular Cartilage
3. Data Analysis
3.1. Spectral Preprocessing
3.2. Multivariate Analysis
3.2.1. Support Vector Machines
3.2.2. Random Forests
3.2.3. Partial Least Squares Regression
3.2.4. Model Training and Validation
3.2.5. Feature Importance
4. Methodology / Materials and Methods
4.1. Sample Preparation
4.1.1. Mechanical Damage
4.1.2. Enzymatic Degradation
4.2. Spectroscopic Measurements
4.3. Reference Measurements
4.3.1. Biomechanical
4.3.2. Collagen Orientation
4.3.3. Proteoglycan Content
4.4. Pre-processing and Multivariate Analysis
4.4.1. Pre-processing
4.4.2. Multivariate Analysis
5. Results
6. Discussion
7. Conclusion
8. Further Work
9. References
10. Appendix
10.1. Appendix 1: Validation Metrics

Table of Figures

Figure 1: Different Zones in Articular Cartilage
Figure 2: Energy-level diagram showing the states involved in Raman spectra
Figure 3: High-amplitude, low-variation peak (blue) and low-amplitude, high-variation peak (orange)
Figure 4: Amide vibrations
Figure 5: Selected amino acids: A) Proline, B) Hydroxyproline, C) Phenylalanine, D) Tyrosine, E) Tryptophan
Figure 6: Raman spectrum of cartilage
Figure 7: Decision Tree Hierarchy
Figure 8: Random Forest
Figure 9: 5-Fold Cross-Validation
Figure 10: Anatomical location of bovine osteochondral samples: CN, control; COL24, collagenase 24 h; IMP, impact; COL90, collagenase 90 min; ABR, abrasion; TRYP30, trypsin 30 min
Figure 11: Custom-made drop tower used to induce impact injury
Figure 12: Custom-made grinding tool used to abrade the sample
Figure 13: Calculation of collagen orientation profile in MATLAB®
Figure 14: Stained histological sections: (A) Control group, (B) Collagenase 24 h, (C) Impact, (D) Collagenase 90 min, (E) Abrasion, (F) Trypsin 30 min
Figure 15: Raman fingerprint region
Figure 16: Impact mean spectrum vs. control mean
Figure 17: Abrasion mean spectrum vs. control mean
Figure 18: Collagenase 90-minute mean spectrum vs. control mean
Figure 19: Trypsin 30-minute mean spectrum vs. control mean
Figure 20: Collagenase 24-hour mean spectrum vs. control mean
Figure 21: Mean of mechanical damage vs. mean of enzymatic damage vs. control group mean
Figure 22: Target variable box-plot distributions: A) Dynamic and instantaneous moduli, B) Equilibrium modulus, C) Cartilage thickness, D) Proteoglycan content, and E) Collagen orientation
Figure 23: Scatter plots of A) Equilibrium modulus, B) Instantaneous modulus, C) Cartilage thickness, and D) Dynamic modulus
Figure 24: Scatter plots of proteoglycan content across A) Superficial, B) Middle, and C) Deep zones, and of collagen orientation across D) Superficial, E) Middle, and F) Deep zones
Figure 25: Feature importance for A) Proteoglycan content, B) Collagen orientation, C) Dynamic modulus, and D) Cartilage thickness

Table of Tables

Table 1: Most Important Hyperparameters of SVMs
Table 2: Most Important Hyperparameters of RF
Table 3: Pre-Processing Parameters
Table 4: RF Hyperparameter Search Space
Table 5: SVM Hyperparameter Search Space
Table 6: Raman Peak Assignments
Table 7: Average Reference Values of Each Damage Group
Table 8: Optimal Models' Hyperparameters and Pre-Processing
Table 9: Metrics of the Optimal Models that Maximise the Relationship between Raman Spectra and Cartilage Reference Properties

Introduction

Articular cartilage is a soft musculoskeletal tissue that covers the ends of long bones in synovial joints1. The primary function of articular cartilage is to dissipate and distribute contact stresses resulting from joint loading and to ensure smooth articulation in diarthrodial joints2,3. One of the main diseases affecting the integrity of articular cartilage is osteoarthritis (OA). The worldwide prevalence of OA and post-traumatic OA (PTOA) is estimated to be up to 28% in adults over 60 years of age4. According to a 2017 study, more than 303 million people worldwide suffer from OA, 263 million of whom have knee OA5. OA and PTOA are major public health issues in Finland, which has one of the highest rates of joint replacement surgery in Europe. The Finnish Health 2000 Survey estimated that 10% of Finns are clinically diagnosed with hip OA6.

Although the pathogenesis of OA and PTOA is unclear, it is often initiated by injury to the articular cartilage, subchondral bone, or meniscus7–9. OA is characterized by severe pain, restricted joint movement, and erosion of articular cartilage. It is often associated with progressive degeneration of articular cartilage, accompanied by sclerosis (thickening and hardening) of the underlying subchondral bone7,10,11. An increase in water content7, a decrease in proteoglycan content in the superficial cartilage7, and degradation of the collagen network12 are also known to occur in the early stages of OA. These changes compromise cartilage function, leading to further damage13; degeneration of the collagen network is especially harmful because it is irreversible14.

As OA is currently incurable, it is vital to accurately diagnose cartilage injuries and assess the neighboring tissue at an early stage to prevent the initiation and progression of OA, e.g., via surgical intervention. At present, surgical treatment of joint pathologies is conducted through arthroscopy, and the success of treatment relies on an accurate diagnosis of the injury’s size and severity. However, conventional arthroscopic assessment of articular cartilage is subjective and lacks reproducibility15,16.

Several techniques are available to evaluate articular cartilage pathology. These can be categorized as imaging techniques, such as magnetic resonance imaging (MRI)17 and X-ray imaging18; acoustic methods, such as ultrasonography19, high-frequency20, and intra-articular21,22 ultrasound; optical imaging methods, such as optical coherence tomography (OCT)23,24; and vibrational spectroscopy, such as near-infrared25–34, mid-infrared35,36, and Raman37–46 spectroscopy. Each of these modalities excels at providing certain diagnostic information. A detailed description is provided in section 1.2.

The main aim of this thesis is to assess the potential of Raman spectroscopy to evaluate the composition, structure, and functional integrity of articular cartilage, based on the following hypothesis:

Raman spectroscopy, coupled with machine learning, can estimate the structural, compositional, and functional integrity of articular cartilage in both health and disease.

To test this hypothesis, the following objectives will be undertaken in this thesis:

1. Obtain Raman spectroscopic data from articular cartilage samples with varying degrees of tissue degeneration. In this thesis, existing bovine articular cartilage samples subjected to different types of artificially induced damage, mimicking naturally occurring tissue damage, will be used. The reference properties of these tissues, including their structural, compositional, and functional properties, will also be used.

2. Develop machine learning models based on random forests (RF), support vector machines (SVM), and partial least squares regression (PLSR) to estimate the biomechanical (equilibrium, dynamic, and instantaneous moduli, and thickness), structural (collagen fibril orientation), and compositional (depth-wise proteoglycan content) properties of articular cartilage.

1. Articular Cartilage: Structure, Composition, and Function

Articular cartilage has been proposed to be a biphasic material47,48 with anisotropic and nonlinear mechanical behavior. The fluid phase, which accounts for 68–85% of the total tissue weight, consists of interstitial water and mobile ions. This phase is responsible for the dynamic and time-dependent mechanical properties of cartilage. The solid phase, on the other hand, consists mainly of collagen fibrils and negatively charged proteoglycan macromolecules, along with a small proportion of chondrocytes. While collagen is responsible for the tensile and dynamic compressive stiffness of articular cartilage, as well as its shear modulus, proteoglycans have been suggested to be the main contributor to the equilibrium modulus of cartilage under compressive loading49.

Figure 1: Different Zones in Articular Cartilage

Articular cartilage can be divided into four zones according to the orientation of the collagen fibers through its depth50: the superficial, middle, deep, and calcified zones. In the superficial zone, the collagen fibers are relatively thin and run parallel to each other and to the cartilage surface. In the middle zone, the fibrils thicken as they transition toward the deep zone and are randomly oriented. In the deep zone, the fibrils are thickest and perpendicular to the cartilage surface. While proteoglycan content increases with tissue depth, water content and cell density decrease with depth. The calcified zone is a thin layer below the deep zone that attaches the cartilage tissue to the subchondral bone.

1.1. Cartilage Pathology: Injuries and Degeneration

The role of articular cartilage in distributing and dissipating contact stresses in the joint exposes it to harsh biomechanical demands. Moreover, the absence of blood vessels and nerves, and its sole reliance on synovial fluid for nutrition, limit its capacity to heal after disease or injury. Thus, alteration of cartilage integrity during disease can critically impact joint stability.

Microscopic changes in cartilage composition and structure during disease include decreased collagen and proteoglycan contents with an associated increase in the cartilage water content.

These changes reduce the tissue’s biomechanical properties, leading to tissue softening51. Softened articular cartilage is less able to function under normal physiological loading, giving rise to surface fibrillation and fissures. Continued loading of the damaged cartilage further degrades its matrix, driving disease progression. This ultimately leads to destruction and erosion of the cartilage matrix, culminating in advanced osteoarthritis.

Osteoarthritis (OA) is an example of a non-inflammatory joint disease. OA can be further categorized into idiopathic (primary) OA and post-traumatic OA (PTOA), which is sometimes referred to as secondary OA. PTOA may result from abnormal mechanical loading of the articulating joints, which causes high peak stresses on the cartilage. Such traumatic situations usually arise in sports or accidents. Idiopathic OA, on the other hand, may develop due to ageing, obesity, blood clotting, genetic disorders, or muscle weakness.

1.2. Current Diagnostic Approaches of Cartilage

Several invasive and non-invasive diagnostic modalities are currently used to assess cartilage health. One of the non-invasive modalities is magnetic resonance imaging (MRI).

MRI has been used extensively to evaluate the morphology and composition of osteoarthritic tissue52. However, clinical MRI is relatively expensive, and its spatial resolution is not high enough to identify the incipient fibrillation of cartilage that is typical of the very early stages of OA. X-ray imaging is another widely used modality. Although X-ray cannot visualize soft tissues such as cartilage, it can indirectly indicate cartilage erosion through joint space narrowing, which is typical of the advanced stages of cartilage degeneration53. When these modalities reveal disease, clinical examination is followed by repair surgery through arthroscopic intervention.

Arthroscopy allows a thorough examination of lesion size and severity; however, it is ineffective at detecting early degenerative changes in cartilage. Furthermore, studies have shown that arthroscopy lacks reproducibility15,16 owing to its subjective nature. Thus, diagnostic modalities capable of detecting the onset and progression of cartilage deterioration, both objectively and in real time, are needed54.

Multiple studies have demonstrated the ability of Raman spectroscopy to detect biochemical changes in musculoskeletal tissues that accompany injury or disease45. Thus, Raman spectroscopy with fiber-optic instrumentation39, used during arthroscopy, could provide the real-time, quantitative, and objective assessment of cartilage condition needed to overcome the limitations of traditional arthroscopic assessment of cartilage integrity in the joint.

2. Raman Spectroscopy

C.V. Raman discovered the Raman effect in 1928 and was awarded the Nobel Prize two years later55. If a specimen is illuminated by a monochromatic light source and the scattered light is recorded, the collected spectrum will contain a very strong line at the same frequency as the source. This line arises from the elastic (Rayleigh) scattering of the incident photons by the specimen's molecules. More importantly, weaker signals also appear on either side of this line, owing to the inelastic scattering of the incident photons (Raman scattering)56. These lines are very weak relative to the excitation line, as only about 1 in 10⁷ scattered photons undergoes Raman scattering.
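The position of these weaker lines is usually reported as a Raman shift in wavenumbers, Δν̃ = 1/λ₀ − 1/λₛ (in cm⁻¹), rather than as an absolute wavelength, so that the same band appears at the same shift regardless of the laser used. The short sketch below converts between the two; the 785 nm excitation wavelength and the 1000 cm⁻¹ mode are illustrative assumptions, and the helper names are hypothetical.

```python
# Convert between Raman shift (cm^-1) and scattered wavelength (nm).
# Illustrative assumption: 785 nm excitation and a 1000 cm^-1 vibrational mode.

NM_PER_CM = 1e7  # wavenumber in cm^-1 = 1e7 / wavelength in nm


def scattered_wavelengths(laser_nm: float, shift_cm1: float) -> tuple[float, float]:
    """Return (Stokes, anti-Stokes) wavelengths in nm for a given Raman shift."""
    laser_wn = NM_PER_CM / laser_nm                  # absolute wavenumber of the laser
    stokes = NM_PER_CM / (laser_wn - shift_cm1)      # photon loses energy: longer wavelength
    anti_stokes = NM_PER_CM / (laser_wn + shift_cm1) # photon gains energy: shorter wavelength
    return stokes, anti_stokes


def raman_shift(laser_nm: float, scattered_nm: float) -> float:
    """Raman shift in cm^-1 (positive for Stokes, negative for anti-Stokes)."""
    return NM_PER_CM / laser_nm - NM_PER_CM / scattered_nm


stokes_nm, anti_stokes_nm = scattered_wavelengths(785.0, 1000.0)
print(f"Stokes: {stokes_nm:.2f} nm, anti-Stokes: {anti_stokes_nm:.2f} nm")
print(f"Recovered shift: {raman_shift(785.0, stokes_nm):.1f} cm^-1")
```

Working in shifts rather than wavelengths is what makes spectra recorded with different lasers directly comparable.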

The Raman effect takes place when a photon is scattered inelastically by the electric dipole of a molecule. In quantum mechanical terms, scattering is described as an excitation to a virtual energy state, lower in energy than a true electronic transition, followed by prompt de-excitation (within 10⁻¹⁴ s or less) accompanied by a change in vibrational energy57–59.

In both Stokes and anti-Stokes scattering, the energy difference between the incident and scattered photons equals the difference between the initial and final states of the molecule60, as shown in Figure 2. In Stokes scattering, the molecule starts in the ground vibrational state and ends in an excited vibrational state. The increase in the molecule's energy is supplied by the photon, so the scattered photon has lower energy than the incident one. In anti-Stokes scattering, the situation is reversed: the molecule starts in an excited vibrational state and ends in the ground state, and the excess energy is transferred to the scattered photon, which therefore has higher energy than the incident one61. At room temperature, far fewer molecules occupy excited vibrational states than the ground state. Thus, the anti-Stokes spectrum is weaker than the Stokes spectrum62. Both spectra contain the same vibrational frequency information, because their shifts are symmetric about the excitation line.
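The population argument can be made concrete with the Boltzmann factor N₁/N₀ = exp(−hcν̃/kT). The sketch below assumes a room temperature of 298.15 K and ignores level degeneracy and the frequency dependence of the scattering cross-section, so it estimates the population ratio rather than the exact anti-Stokes/Stokes intensity ratio. The helper name is hypothetical.

```python
import math

# Physical constants (exact SI values); speed of light expressed in cm/s
H = 6.62607015e-34    # Planck constant, J s
C_CM = 2.99792458e10  # speed of light, cm/s
K_B = 1.380649e-23    # Boltzmann constant, J/K


def excited_to_ground_ratio(shift_cm1: float, temperature_k: float = 298.15) -> float:
    """Boltzmann population ratio N1/N0 for a vibrational level at shift_cm1.

    Simplification: ignores degeneracy and the frequency dependence of the
    scattering cross-section, so this is a population ratio, not an exact
    anti-Stokes/Stokes intensity ratio.
    """
    energy_j = H * C_CM * shift_cm1  # E = h * c * wavenumber
    return math.exp(-energy_j / (K_B * temperature_k))


for shift in (200.0, 1000.0, 1500.0):
    print(f"{shift:6.0f} cm^-1 -> N1/N0 = {excited_to_ground_ratio(shift):.4f}")
```

For a 1000 cm⁻¹ mode the ratio is below 1%, whereas low-wavenumber modes retain a sizeable excited-state population, which is why the anti-Stokes side is much weaker across most of the fingerprint region.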

Raman scattering is accompanied by a change in the vibrational, rotational, or electronic energy of a molecule. The resulting vibrational spectrum consists of bands representing the molecule's vibrations. This spectrum depends on the masses of the atoms in the molecule, the strength of their chemical bonds, and their atomic arrangement63. Consequently, different molecules have different vibrational spectra, or “fingerprints”, and molecular species can be identified from the positions of the peaks in their Raman spectrum.

Figure 2: Energy-level diagram showing the states involved in Raman spectra.

Further information about the relative concentrations of molecules can be derived from relative peak amplitudes64. Conformational information, on the other hand, can be extracted from peak width: wider peaks indicate a larger variation in conformation among molecules of the same species in the sample. Figure 3 shows an example of a high-amplitude, low-variation peak alongside a low-amplitude, high-variation peak.
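As a minimal illustration of reading these three descriptors from a spectrum, the sketch below builds a synthetic spectrum with one narrow, tall band and one broad, weak band (hypothetical values, loosely mirroring Figure 3) and extracts peak position, height, and full width at half maximum (FWHM) with SciPy's `find_peaks` and `peak_widths`. Real spectra would normally be baseline-corrected first and are often analysed with fitted band models instead.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths

# Synthetic spectrum (hypothetical values): one narrow, tall band and one broad, weak band
wavenumber = np.arange(900.0, 1400.5, 0.5)  # cm^-1, 0.5 cm^-1 spacing
step = wavenumber[1] - wavenumber[0]


def gaussian(x, center, amplitude, sigma):
    return amplitude * np.exp(-0.5 * ((x - center) / sigma) ** 2)


spectrum = gaussian(wavenumber, 1000.0, 1.0, 5.0) + gaussian(wavenumber, 1250.0, 0.4, 20.0)

# Peak position identifies the species; peak height tracks relative concentration
peaks, props = find_peaks(spectrum, height=0.1)
positions = wavenumber[peaks]
heights = props["peak_heights"]

# FWHM reflects conformational spread; peak_widths returns widths in samples
widths_samples = peak_widths(spectrum, peaks, rel_height=0.5)[0]
fwhm = widths_samples * step

for pos, h, w in zip(positions, heights, fwhm):
    print(f"peak at {pos:.1f} cm^-1: height {h:.2f}, FWHM {w:.1f} cm^-1")
```

For a Gaussian band, the FWHM is about 2.355 times its standard deviation, so the broad band here is roughly four times wider than the narrow one.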

2.1. Fundamentals of Raman Spectroscopy in Biological Tissues

The application of Raman spectroscopy in biology is rapidly increasing, as it provides molecular-level information that allows investigation of functional groups, bonding types, and molecular conformations. In addition to being a relatively simple, reproducible, non-destructive technique, it requires only a small sample with minimal preparation. It also suffers little interference from water, which is abundant in biological tissues. Raman peaks are relatively narrow, easy to resolve, and sensitive to molecular structure, conformation, and environment, making Raman spectroscopy a highly chemically specific modality65.

Various studies66–71 have mapped Raman peaks in biological tissues to their underlying structures whether it is a particular chemical bond or a functional group. Perhaps the most important for Raman analysis of proteins are the amide bands.

The amide bands originate from the vibrational normal modes of the protein backbone. There are nine such modes, labeled A, B, and I–VII in order of decreasing frequency. In the fingerprint region (750–1800 cm⁻¹), only amide I, II, and III (shown in Figure 4) are present. Of these, amide II is very weak in conventional Raman spectra and is typically observed only under resonance (ultraviolet) excitation. The specific positions of these bands are indicative of the structure and conformation of the proteins at hand. Other important features in protein spectra arise from amino acids such as proline and hydroxyproline and from the aromatic amino acids phenylalanine, tyrosine, and tryptophan, shown in Figure 5. Raman bands from these amino acids provide further information about the environment of the proteins.

Figure 3: High-amplitude, low-variation peak (blue) and low-amplitude, high-variation peak (orange)

Figure 4: Amide vibrations

2.2. Raman Spectroscopic Characterization of Articular Cartilage

Figure 6 shows a typical Raman spectrum of bovine articular cartilage. Various studies have used the information contained in this spectrum to characterize articular cartilage. In 2011, Esmonde-White et al.39 carried out a proof-of-concept study on adapting Raman spectroscopy to arthroscopic measurements, using a custom-designed Raman fiber-optic probe to examine the knees of human cadavers and tissue phantoms. In a subsequent 2019 study72, they used Raman spectroscopy to probe the biochemical composition of the synovial fluid of 40 patients with OA. The spectral information enabled assessment of joint damage, i.e., determining whether or not a patient was osteoarthritic. Another study by Shaikh et al.41 successfully discriminated between types of cartilage injury using Raman spectroscopy.

Figure 5: Selected amino acids: A) Proline, B) Hydroxyproline, C) Phenylalanine, D) Tyrosine, E) Tryptophan

Figure 6: Raman spectrum of cartilage

3. Data Analysis

While early efforts in Raman spectral analysis relied on expert knowledge of spectral characteristics, recent approaches have used a broad array of statistical and machine learning algorithms. A paper by Sevetlidis et al. demonstrated the superiority of a machine learning algorithm called “Extremely Randomized Trees” over current state-of-the-art methods for spectral identification and analysis73. Another paper by Zheng et al. showed that a principal component analysis–support vector machine model can differentiate between cervical squamous cell carcinoma and cervical adenocarcinoma from their Raman spectra74. Before such algorithms can be applied, however, the spectra must be pre-processed. Raman data pre-processing uses numerical techniques to remove the background, baseline, and noise before normalization75. The significance of pre-processing should not be underestimated, as faulty pre-processing may lead to incorrect conclusions76.

3.1. Spectral Preprocessing

As previously mentioned, Raman spectroscopy is based on the inelastic scattering of photons from a sample illuminated by a laser source, which provides information about the sample's molecular structure. One benefit of Raman spectroscopy is that it is less affected than NIR measurements by physical variations such as the shape, size, or density of the sample77. However, other factors such as sample fluorescence, detector noise, and laser fluctuations can compromise the integrity of the Raman signal.

Fluorescence in Raman measurements originates from the de-excitation of low-lying excited electronic states. This effect is prominent when the laser’s frequency corresponds to the energy of an electronically excited state. Fluorescence can overshadow the Raman signal, as it tends to be much stronger, and it appears as a broad band superimposed on the Raman signal78. Raman systems with longer-wavelength lasers (785 nm or longer) are often employed to avoid undesirable fluorescence, although it cannot be prevented entirely in many cases. Baseline correction methods or derivatives are usually applied to correct for fluorescence numerically79. Another factor influencing Raman spectra is laser fluctuation, notably during lengthy measurements. Laser fluctuations typically appear as an overall intensity shift across the spectrum. These fluctuations can be easily corrected, provided that the shift does not reduce the spectrum’s signal-to-noise ratio.
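One common numerical baseline correction is iterative modified polynomial fitting: a low-order polynomial is fitted to the spectrum, points lying above the fit (the Raman bands) are clipped to the fit, and the process repeats until the baseline stops changing. The sketch below is an illustrative implementation under these assumptions (polynomial degree 3, synthetic fluorescence-like background); it is not necessarily the baseline method used in this thesis, and `modpoly_baseline` is a hypothetical helper name.

```python
import numpy as np
from numpy.polynomial import Polynomial


def modpoly_baseline(x, y, degree=3, max_iter=500, tol=1e-6):
    """Iterative modified polynomial baseline (hypothetical helper, illustrative only)."""
    work = np.asarray(y, dtype=float).copy()
    baseline = np.zeros_like(work)
    for _ in range(max_iter):
        # Least-squares polynomial fit; Polynomial.fit rescales x internally for stability
        baseline = Polynomial.fit(x, work, degree)(x)
        # Clip points above the fit so that Raman bands stop pulling the baseline upward
        clipped = np.minimum(work, baseline)
        converged = np.linalg.norm(clipped - work) <= tol * np.linalg.norm(work)
        work = clipped
        if converged:
            break
    return baseline


# Synthetic example: smooth fluorescence-like background plus one Raman band at 1300 cm^-1
x = np.linspace(800.0, 1800.0, 1001)
background = 5.0 + 2e-3 * (x - 800.0) + 1e-6 * (x - 1300.0) ** 2
band = np.exp(-0.5 * ((x - 1300.0) / 8.0) ** 2)
measured = background + band

corrected = measured - modpoly_baseline(x, measured, degree=3)
peak_index = int(np.argmin(np.abs(x - 1300.0)))
print(f"recovered band height: {corrected[peak_index]:.3f}")
```

The polynomial degree controls how flexible the baseline is: too low and curved fluorescence remains, too high and broad Raman bands start to be absorbed into the baseline.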

In addition to laser fluctuations, other sources of variability include the detector’s thermal noise and ambient light. These types of noise commonly follow a Poisson distribution. Averaging N accumulations of the signal reduces this noise by a factor of √N, so the signal-to-noise ratio improves in proportion to the square root of the number of accumulations.
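The √N behaviour can be checked with a quick simulation. The sketch assumes pure Poisson (shot) noise on a flat signal of 100 counts per channel (hypothetical values); averaging N accumulations should raise the signal-to-noise ratio from about 10 to about 10√N.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
mean_counts = 100.0  # hypothetical expected photon counts per detector channel
n_channels = 20_000  # many channels so that the noise estimate itself is stable


def snr_after_averaging(n_accumulations: int) -> float:
    """Simulate Poisson-noise frames of a flat signal and return the SNR of their mean."""
    frames = rng.poisson(mean_counts, size=(n_accumulations, n_channels))
    averaged = frames.mean(axis=0)
    return float(averaged.mean() / averaged.std())


for n in (1, 4, 16):
    print(f"N = {n:2d}: SNR ~ {snr_after_averaging(n):.1f}")
```

Quadrupling the number of accumulations therefore only doubles the SNR, which is the trade-off between acquisition time and spectral quality.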

Spectral pre-processing methods are applied to remove the aforementioned artifacts, noise, and variability unrelated to the features sought in the spectra80. Pre-processing techniques can be grouped into two broad classes: scatter correction and spectral derivatives80. Standard scatter correction methods include standard normal variate (SNV) normalization, multiplicative scatter correction (MSC), and simple normalization79,81.
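Both scatter-correction methods can be written in a few lines of NumPy. In this sketch, SNV centres and scales each spectrum by its own mean and standard deviation, while MSC regresses each spectrum on a reference (by default the mean spectrum) and removes the fitted offset and slope. The function names and the synthetic data are illustrative assumptions.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        # Fit row ~ offset + slope * ref, then undo the additive and multiplicative effects
        slope, offset = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - offset) / slope
    return corrected


# Two scatter-distorted copies (scaled and offset) of one hypothetical band
x = np.linspace(0.0, 1.0, 200)
reference = np.exp(-0.5 * ((x - 0.5) / 0.05) ** 2)
spectra = np.vstack([2.0 * reference + 1.0, 0.5 * reference - 3.0])

print(np.allclose(msc(spectra, reference), reference))
print(np.allclose(snv(spectra)[0], snv(spectra)[1]))
```

SNV needs no reference and treats each spectrum independently, whereas MSC depends on the chosen reference, so in a calibration/validation setting the reference is usually taken from the training set only.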

An example of a spectral derivative pre-processing technique is Savitzky-Golay (SG) filtering82. Like scatter correction methods, spectral derivative techniques aim to remove additive and multiplicative artifacts from the spectrum80. The first derivative removes the spectrum’s constant baseline offset, and the second derivative additionally removes its linear trend80. Differentiation has the benefit of enhancing spectral differences, but it lowers the signal-to-noise ratio and therefore requires smoothing. The SG filter smooths the spectrum by fitting successive windows of a given width with a low-degree polynomial using linear least squares83. This technique is also used to remove the baseline.
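The baseline-removal property of derivatives can be demonstrated with SciPy's `savgol_filter`, which computes smoothed derivatives through its `deriv` and `delta` arguments. In this sketch, the window length (15 points) and polynomial order (3) are illustrative rather than the settings used in this thesis: adding a constant offset leaves the first derivative unchanged, and adding a linear slope leaves the second derivative unchanged.

```python
import numpy as np
from scipy.signal import savgol_filter

wavenumber = np.linspace(800.0, 1800.0, 1001)  # 1 cm^-1 spacing
delta = wavenumber[1] - wavenumber[0]
band = np.exp(-0.5 * ((wavenumber - 1300.0) / 10.0) ** 2)  # hypothetical Raman band

with_offset = band + 10.0                            # additive constant baseline
with_trend = band + 0.005 * (wavenumber - 800.0)     # additive linear baseline

# Illustrative smoothing settings, not tuned values
kwargs = dict(window_length=15, polyorder=3, delta=delta)
d1_band = savgol_filter(band, deriv=1, **kwargs)
d1_offset = savgol_filter(with_offset, deriv=1, **kwargs)
d2_band = savgol_filter(band, deriv=2, **kwargs)
d2_trend = savgol_filter(with_trend, deriv=2, **kwargs)

print(np.allclose(d1_band, d1_offset))  # first derivative removes a constant offset
print(np.allclose(d2_band, d2_trend))   # second derivative also removes a linear slope
```

Widening the window smooths more strongly but also broadens and flattens narrow bands, so the window length is usually tuned together with the downstream model.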

One of the simplest and most effective pre-processing methods is truncation. By truncating the spectrum to the fingerprint region of the chemical species under study, variability related to other species in the sample can be avoided. Additionally, SNV and filtering perform better when applied to shorter spectral regions. In general, pre-processing steps should be kept to a minimum to avoid losing valuable information from the spectrum and to maintain spectral fidelity.
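Truncation amounts to keeping only the columns whose wavenumber falls inside the region of interest. The sketch below assumes the 750–1800 cm⁻¹ fingerprint range given in Section 2.1 and uses placeholder data; `truncate` is a hypothetical helper.

```python
import numpy as np


def truncate(wavenumber, spectra, low, high):
    """Keep only the columns whose wavenumber lies in [low, high] (inclusive)."""
    mask = (wavenumber >= low) & (wavenumber <= high)
    return wavenumber[mask], spectra[:, mask]


wn = np.arange(400.0, 3200.0, 2.0)                           # hypothetical full range
spectra = np.random.default_rng(1).normal(size=(5, wn.size))  # placeholder data
wn_fp, spectra_fp = truncate(wn, spectra, 750.0, 1800.0)      # fingerprint region
print(wn_fp.min(), wn_fp.max(), spectra_fp.shape)
```

Applying truncation before SNV or filtering means those steps only see the region that carries the information of interest.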

3.2. Multivariate Analysis

Several machine learning algorithms can be used to estimate the target variables. This section introduces the basic idea of each algorithm used in this thesis, along with the hyperparameters that need to be fine-tuned to achieve the best predictive capability.

3.2.1. Support Vector Machines

In 1992, Vapnik et al. introduced support vector machines (SVMs) as a classification algorithm84, and five years later Drucker et al. extended the algorithm to regression tasks85. The aim of SVMs is to find the optimal hyperplane dividing the feature space between different classes, where optimal means having the maximum possible distance (margin) between the hyperplane and the training data points closest to it, which are termed support vectors.

SVM defines a cost parameter C, which sets the penalty for misclassified training points when the classes are not linearly separable. High values of C produce complex decision boundaries that misclassify as few training points as possible86, at the cost of a greater risk of overfitting.

For classes that are not linearly separable, SVM applies an implicit transformation of the input variables using a kernel function (e.g., polynomial, radial basis function, or sigmoid), which allows it to separate such data using a linear hyperplane in the transformed feature space87.
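The idea behind the implicit transformation can be seen with the simplest polynomial kernel. For two-dimensional inputs, the degree-2 kernel k(x, z) = (x·z)² equals an ordinary dot product after mapping each input to φ(v) = (v₁², √2·v₁v₂, v₂²), yet the kernel never builds φ explicitly. The sketch below checks this numerically; it corresponds to scikit-learn's polynomial kernel with `degree=2`, `gamma=1`, and `coef0=0`, and the helper names are hypothetical.

```python
import numpy as np


def poly2_kernel(x: np.ndarray, z: np.ndarray) -> float:
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2."""
    return float(np.dot(x, z) ** 2)


def poly2_features(v: np.ndarray) -> np.ndarray:
    """Explicit feature map for 2-D inputs that the degree-2 kernel computes implicitly."""
    v1, v2 = v
    return np.array([v1**2, np.sqrt(2.0) * v1 * v2, v2**2])


x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
implicit = poly2_kernel(x, z)                                    # never builds the 3-D vectors
explicit = float(np.dot(poly2_features(x), poly2_features(z)))  # builds them explicitly
print(implicit, explicit)
```

For spectra with hundreds of variables, the explicit feature space of even a modest polynomial kernel is enormous (and infinite for the radial basis function kernel), which is why computing only kernel values is what makes nonlinear SVMs practical.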

Table 1 lists the most important hyperparameters of the support vector regression algorithm. The list is not exhaustive; for more information, please refer to the scikit-learn documentation of the “SVR” class88,89.

SVMs are effective in high-dimensional spaces and remain effective when the number of features exceeds the number of observations, although the choice of kernel and regularization is then crucial to avoid overfitting. Other limitations include sensitivity to noise in the observations and poor scalability to large datasets. On the other hand, SVMs are memory-efficient, because the decision function uses only a subset of the training points (the support vectors).

Table 1: Most Important Hyperparameters of SVMs

Parameter | Range | Explanation
Kernel | Linear, polynomial, radial basis function, sigmoid | Kernel type to be used in the algorithm.
Degree | Integer value | Degree of the polynomial kernel function, if used.
Gamma | 1 / number of features, or 1 / (number of features × variance of X) | Kernel coefficient (not used by the linear kernel).
C | Float | Regularization parameter.
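In practice, these hyperparameters are usually tuned by cross-validated grid search. The sketch below wraps scikit-learn's `SVR` in a pipeline with feature standardisation and tunes it with `GridSearchCV`; the placeholder data, the search space, and the 5-fold split are illustrative assumptions, not the search space used in this thesis. The `gamma` options `"auto"` and `"scale"` correspond to the two formulas in Table 1.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder data standing in for pre-processed spectra (rows) and one reference property
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 200))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=60)

pipeline = make_pipeline(StandardScaler(), SVR())
# Illustrative search space over hyperparameters from Table 1
param_grid = {
    "svr__kernel": ["linear", "rbf"],
    "svr__C": [0.1, 1.0, 10.0],
    "svr__gamma": ["scale", "auto"],  # 1/(n_features * X.var()) and 1/n_features
}
search = GridSearchCV(
    pipeline,
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
print(f"cross-validated RMSE: {-search.best_score_:.3f}")
```

Placing the scaler inside the pipeline ensures that it is refitted on each training fold, so no information from the validation fold leaks into pre-processing.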

3.2.2. Random Forests

The first work on random forests dates back to 1995, when Tin Kam Ho introduced random decision forests90. Leo Breiman further developed the idea by combining bagging (bootstrap aggregating)91 with random selection of features92, and later, together with Adele Cutler, registered the name Random Forests as a trademark93.

A random forest is an ensemble learning algorithm of the bagging family. The main idea behind random forests is that aggregating multiple weakly correlated base estimators, each of which has low bias but high variance, results in an estimator with low variance and low bias. The base estimator is the decision tree algorithm, hence the name Random Forest.


A decision tree (DT) is a cascade of conditions starting from a root node. For each node (condition), the right child node represents the “True” outcome, while the left child represents the “False” outcome. The purpose of the decision tree is to route the query (prediction) through its nodes to the last level (leaf), where a classification or regression verdict is given, as shown in Figure 7.

Decision trees offer several advantages, including ease of interpretation, little need for data pre-processing, the ability to work with both numerical and categorical data, and the ability to model nonlinear relations.

However, DTs overfit when the trees grow too deep, and they have high variance, as small variations in the training data can result in a completely different tree. Lastly, the greedy tree-building process is not guaranteed to reach the globally optimal tree.

Fortunately, the Random Forests algorithm was designed to mitigate these limitations of decision trees. The underlying notion is that the average of multiple weakly correlated DTs is expected to outperform the best single tree.

The algorithm reduces the correlation between the individual DTs in two ways: each tree is trained on a bootstrap sample (a random sample of the training observations drawn with replacement), which is known as bootstrapping, and only a randomly selected subset of the original features is considered at each split. The final prediction is calculated by averaging the predictions of the individual DTs in the forest, which is known as aggregation. Together, bootstrapping and aggregation are called bagging.
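The decorrelation and averaging steps described above can be sketched with individual scikit-learn decision trees. This is a hand-rolled illustration, not the internals of `RandomForestRegressor`: the helper names, the number of trees, and `max_features=2` are hypothetical choices. Each tree sees its own bootstrap sample, and `max_features` restricts every split to a random subset of the features.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_bagged_trees(X, y, n_trees=25, max_features=2, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap: n rows drawn with replacement
        tree = DecisionTreeRegressor(
            max_features=max_features,    # random feature subset considered at each split
            random_state=int(rng.integers(1_000_000)),
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_bagged(trees, X):
    per_tree = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return per_tree.mean(axis=0)                         # aggregation by averaging


# Hypothetical data with a nonlinear dependence on the first feature
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 5))
y = X[:, 0] ** 2 + X[:, 1] + 0.1 * rng.normal(size=80)

trees = fit_bagged_trees(X, y)
y_hat = predict_bagged(trees, X)
```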

While random forests address the drawbacks of DTs, they have a few disadvantages of their own, the biggest of which is slow training and prediction when a large number of trees is used.

Figure 7: Decision Tree Hierarchy

Figure 8: Random Forest


Table 2 lists the most important hyperparameters of the Random Forests algorithm. The list is not meant to be exhaustive; for more information, please refer to the scikit-learn documentation of the function “RandomForestRegressor”89,94.

Table 2: Most Important Hyperparameters of RF

Parameter | Explanation
Number of Estimators | The number of trees in the forest.
Maximum Depth | The maximum depth of the tree.
Maximum Number of Features | The number of features selected at each node.
Minimum Samples for Split | The minimum number of samples required to split an internal node.
Minimum Samples in a Leaf | The minimum number of samples required to be at a leaf node.
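The hyperparameter names in Table 2 map onto `RandomForestRegressor` arguments as sketched below. The values are placeholders within the search space used later in this thesis, and the data are random. The final check simply confirms that, for regression, the forest's prediction is the mean of its trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Table 2 names mapped onto RandomForestRegressor arguments (placeholder values)
rf = RandomForestRegressor(
    n_estimators=100,        # Number of Estimators
    max_depth=90,            # Maximum Depth
    max_features=3,          # Maximum Number of Features (considered at each split)
    min_samples_split=10,    # Minimum Samples for Split
    min_samples_leaf=3,      # Minimum Samples in a Leaf
    random_state=0,
)

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 6))  # hypothetical data
y = 2.0 * X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=60)
rf.fit(X, y)

# The forest's prediction is the average of its trees' predictions
tree_mean = np.mean([tree.predict(X) for tree in rf.estimators_], axis=0)
print(np.allclose(rf.predict(X), tree_mean))  # True
```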

3.2.3. Partial Least Squares Regression

The development of Partial Least Squares is attributed to Herman Wold, who developed it in the late 1960s95. Partial least squares regression (PLSR) is a supervised learning algorithm that is, in large part, based on principal component analysis (PCA). In PCA, the directions of maximum variance of the input features are used to project the data points into a lower-dimensional space in which they can be classified or regressed. Principal component regression (PCR) models use these principal variance directions to predict test data. It should be noted that PCA and PCR use only the covariance of the training input feature matrix (i.e., XᵀX). In contrast, PLSR chooses its directions using the input features together with their target values: it estimates the target variables from the covariance between the input features and the target matrix (i.e., XᵀY).

Therefore, PLSR can be described as a supervised technique, whereas the dimensionality-reduction step of PCR is unsupervised: PCR derives its components from the input features only, while PLSR uses both the input features and the target data.96

PLSR has only one hyperparameter to configure: the number of components used for prediction. The more components are used, the less information in the data is discarded; however, if too many components are used, noise in the data is passed to the model.


PLSR has many benefits, including the ability to robustly handle multicollinearity in the input features, high predictive accuracy, and a very low probability of chance correlation.

The key shortcomings are a higher risk of ignoring ‘real’ correlations and being sensitive to the relative scaling of the input features97.

3.2.4. Model Training and Validation

Using machine learning algorithms, we can develop models capable of producing data-driven predictions. However, to make accurate predictions for the target parameters in the real world, these mathematical models have to be fitted to some collected data. This stage in model development is called model training, and the dataset used for fitting is named the training dataset. Nevertheless, we need to evaluate how well the model will perform on unseen data to assess its prediction power. Here, we use another dataset called the test dataset, which is just a part of our collected data that was not involved in fitting the model.

In most cases, the model’s hyperparameters need to be fine-tuned to enhance its prediction capability. This requires trying different hyperparameters, evaluating the model’s performance, and comparing which hyperparameter set results in the best metrics. If the test dataset is used for this comparison, it influences the choice of model, and the resulting test metrics are no longer representative of performance on unseen data.

To solve this problem, we divide the training dataset into K folds, use K-1 folds for training, and use the remaining fold for testing, although in this context it is called validation, as it is an intermediate step in developing the model. To ensure that the metrics reflect the model hyperparameters rather than a particular choice of fold, the training-validation step is repeated K times, each time leaving a different fold out, and the validation scores are averaged. A schematic showing the fold assignments for K = 5 is presented in Figure 9.


3.2.5. Feature Importance

Feature importance scores provide insight beyond the models' prediction capabilities. Calculating the feature importance scores of the developed models has three main advantages. First, they provide insight into the correlations between each input feature and the target observations. Second, studying these correlations provides a deeper interpretation of the significant physical/biological relations between the features and the targets. Finally, these two advantages can lead to better feature selection, which can reduce the complexity of the developed models and improve their prediction and generalization capabilities.

For SVM models, feature importance is calculated from the coefficients of the support vectors in the decision function together with the support vectors themselves. For PLSR models, the feature importances are the coefficients of the linear model that maps the spectra to the target variables.

Lastly, for Random Forests, feature importance is computed from the mean decrease in impurity across all forest trees.

Figure 9: 5-Fold Cross-Validation


4. Methodology / Materials and Method

4.1. Sample Preparation

Fresh bovine knee joints (age 14-22 months) were obtained from a local abattoir. Ten healthy patellae were chosen according to visual cartilage surface integrity. Each patella was divided vertically into two halves, medial and lateral, and only the healthy halves were used in the study.

Artificial injuries were induced in the healthy cartilage in order to mimic common injuries. Mechanical injuries in the form of impact and abrasion damage were used to simulate PTOA, while enzymatic damage with trypsin and collagenase was used to mimic idiopathic OA.

Osteochondral specimens (n = 72) were prepared and divided into two groups according to the type of damage, in addition to a control group for reference measurements: control specimens (n = 12), a mechanical damage group (n = 24), and an enzymatic damage group (n = 36). Control and mechanical damage group samples were obtained with a biopsy punch (diameter = 7 mm), resulting in a cylindrical osteochondral plug geometry. To avoid lateral penetration of the enzyme into specimens with that geometry, larger rectangular specimens (10 × 15 mm²) were used in the enzymatic treatment step; these were subsequently trimmed to the same cylindrical geometry with the biopsy punch.

4.1.1. Mechanical Damage

Two mechanical injuries were employed in the mechanical damage group (n = 24). The first sub-group (IMP, n = 12) was subjected to impact loading injury, while the second sub-group (ABR, n = 12) was subjected to surface abrasion damage.

Figure 10: Anatomical location of bovine osteochondral samples: CN-control, COL24-collagenase 24 h, IMP-impact, COL90-collagenase 90 min, ABR-abrasion, and TRYP30-trypsin 30 min.


The impact loading injury was induced by a custom-made drop tower (Figure 11), as described in Korhonen et al.98 A polished steel ball (d = 1 cm) carrying a stainless-steel impactor weight of 200 g was allowed to fall freely from a height of 7.5 cm onto the specimen. The energy imparted to the cartilage surface was determined based on preliminary assessments so as to create minor chondral cracks on the cartilage surface98. The impactor was lifted from the sample immediately after the impact to prevent creep deformation.
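As a rough check on these drop-tower settings, the nominal impact energy and velocity follow from E = m·g·h and v = √(2·g·h). This sketch assumes that the total falling mass is 200 g and ignores friction and other losses; the energy used in the study was set from preliminary assessments, so these numbers are only indicative.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def drop_impact(mass_kg, height_m):
    """Nominal potential energy and impact velocity of a free-falling impactor (losses ignored)."""
    energy_j = mass_kg * G * height_m
    velocity_ms = math.sqrt(2.0 * G * height_m)
    return energy_j, velocity_ms


energy, velocity = drop_impact(0.200, 0.075)  # assumed 200 g total mass, 7.5 cm drop
print(f"{energy:.3f} J, {velocity:.2f} m/s")  # 0.147 J, 1.21 m/s
```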

The surface abrasion damage was applied to the second sub-group (ABR, n = 12) with a custom-made grinding tool (Figure 12) using P80 sandpaper (corresponding to a particle size of 200 µm). The surface of each sample was abraded under constant stress (4 kPa) by a rotating metal plate (180 rotation) with the sandpaper glued onto it. The abrasion was done along two perpendicular directions.

Immediately after each respective mechanical injury, all samples were rinsed in phosphate-buffered saline (PBS) for 1 hour to allow for cartilage recovery.

4.1.2. Enzymatic Degradation

Specimens of the enzymatic degradation group were further divided into three groups. The first group was exposed to Collagenase D (0.1 mg/ml, Sigma-Aldrich Inc., St. Louis, MO, USA) for 90 min (COL90, n = 12) to mildly degrade the collagen network99, mimicking early OA. The second group was used to simulate severe OA damage, which was achieved by increasing the incubation time in Collagenase D to 24 hours (COL24, n = 12). The third and final group was exposed to trypsin (0.5 mg/ml, T4299, Sigma-Aldrich Inc., St. Louis, MO, USA) for 30 min (TRYP30, n = 12) to deplete proteoglycans100 (with a minor collateral effect on the collagen network). The short trypsin incubation time was chosen so that the resulting cartilage degradation mimics early OA. Specimens were incubated at 37 °C and 5% CO₂ in PBS containing the respective enzymes and supplemented with antibiotics (penicillin (100 U/mL), streptomycin (100 µg/mL), and amphotericin B (0.25 µg/mL); Sigma-Aldrich Inc., St. Louis, MO, USA) for the times specified above.

Figure 11: Custom-made drop tower used to induce impact injury.

Figure 12: Custom-made grinding tool used for abrading the samples.

As with the mechanical damage group, all samples were rinsed in PBS solution. All of the above injuries were created following the protocol described in earlier studies.41,101

4.2. Spectroscopic Measurements

Each specimen was measured three times using a DXR2xi Raman confocal microscope (Thermo Fisher Scientific, Madison, WI, USA). The spectra were collected from the center of each specimen using a 10x objective with a 50 µm confocal pinhole. A 30 mW laser with a wavelength of 785 nm was used to minimize fluorescence. The device was configured to record Raman shifts in the range of 200 cm⁻¹ to 3400 cm⁻¹. The exposure time was set to 0.5 s, and 120 accumulations were used41.

4.3. Reference Measurements

4.3.1. Biomechanical

Biomechanical measurements were conducted using a custom-made high-precision material testing device (resolution: 0.1 µm, 0.005 N)98,102,103 equipped with a cylindrical indenter (diameter = 0.7 mm). As with the Raman spectrum collection, the biomechanical measurements were done at the center of the specimens. The specimens were submerged in PBS in the measurement chamber, with the bone end of each osteochondral sample glued to the bottom of the chamber. A goniometer was used to ensure that the cartilage surface was perpendicular to the indenter tip. Custom-made software (LabVIEW, National Instruments) was used to control the measurement and for data acquisition.

First, the specimens were preconditioned with a cyclic 2% strain (4 full cycles). A stress-relaxation protocol was then implemented to determine both the equilibrium modulus (Eeq) and the instantaneous modulus (Einst). The stress-relaxation protocol consisted of three phases. In each phase, the cartilage was compressed by 5% of its remaining thickness at a ramp rate of 100 %/s, and the specimen was left to relax for 900 s between phases. Finally, the dynamic modulus (Edyn) was measured by a sinusoidal dynamic test conducted at frequencies of 0.1, 0.5, 1, and 2 Hz. The sinusoidal test was done for four cycles with a strain amplitude equal to 2% of the remaining cartilage thickness. An elastic and isotropic model was adopted to calculate all of the moduli mentioned above104.
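The elastic isotropic model is cited104 without its equation being given here. As a sketch, assuming the widely used Hayes et al. (1972) solution for indentation of an elastic layer by a flat-ended cylinder, E = πa(1 − ν²) / (2κh) · σ/ε, where a is the indenter radius, h the cartilage thickness, ν Poisson's ratio, and κ a tabulated correction factor that depends on a/h and ν. The stresses, κ = 1.0, and ν = 0.1 below are placeholders, and `hayes_modulus` is a hypothetical helper; for the instantaneous and dynamic moduli a different Poisson's ratio (often 0.5) would typically be assumed.

```python
import numpy as np


def hayes_modulus(stress_pa, strain, radius_m, thickness_m, kappa, poisson):
    """Elastic modulus from flat-ended cylindrical indentation of a cartilage layer.

    Assumes the Hayes et al. (1972) elastic layer solution:
        E = pi * a * (1 - nu**2) / (2 * kappa * h) * (sigma / epsilon)
    """
    slope = np.polyfit(strain, stress_pa, 1)[0]  # sigma/epsilon from a straight-line fit
    return slope * np.pi * radius_m * (1.0 - poisson ** 2) / (2.0 * kappa * thickness_m)


# Hypothetical equilibrium stresses after three consecutive 5 % relaxation steps
strain = np.array([0.05, 0.10, 0.15])
stress = np.array([9.0e3, 19.0e3, 29.5e3])  # Pa, placeholder values
E_eq = hayes_modulus(stress, strain, radius_m=0.35e-3, thickness_m=1.9e-3,
                     kappa=1.0,    # placeholder: read from the Hayes et al. tables
                     poisson=0.1)  # assumed equilibrium Poisson's ratio
print(f"E_eq = {E_eq / 1e6:.3f} MPa")
```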

4.3.2. Collagen Orientation

In preparation for histological sectioning, the specimens were fixed in formalin, decalcified with ethylenediaminetetraacetic acid (EDTA), and embedded in paraffin. Histological sections with a thickness of 3 µm were then cut.

Polarized light microscopy was utilized to measure the birefringence of collagen in the samples, which was used to quantify their collagen orientation.

Unstained histological sections were placed on standard microscope slides and imaged with an Abrio PLM system (CRi, Inc., Woburn, MA, USA) mounted on a conventional light microscope (Nikon Diaphot TMD, Nikon, Inc., Shinagawa, Tokyo, Japan). The PLM system consists of a green bandpass filter, a circular polarizer, a computer-controlled analyzer composed of two liquid crystal polarizers, and a CCD camera. It measures the optical anisotropy (birefringence) of the sample automatically105.

The collagen orientation profile of each sample was calculated in MATLAB106 by averaging across the width of a rectangular region (Figure 13) whose top and bottom sides coincided with the cartilage surface (beginning of the superficial zone) and the tidemark (end of the deep zone), respectively. The average collagen orientation of the superficial zone was obtained by averaging the first 10% of the depth profile; the following 15% corresponded to the middle zone, and the rest represented the deep zone.
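The zone averaging can be sketched as follows, assuming the region of interest is stored as a 2D array whose rows run from the cartilage surface (top) to the tidemark (bottom). The same helper would apply to the stained images used for proteoglycan content. The function name and the rounding of zone boundaries to whole rows are illustrative choices, not the MATLAB code used in the study.

```python
import numpy as np


def zone_means(roi, superficial=0.10, middle=0.15):
    """Average a depth-wise profile over the superficial, middle and deep zones.

    roi: 2D array, rows ordered from the cartilage surface to the tidemark,
    columns spanning the width of the region of interest.
    """
    profile = roi.mean(axis=1)                      # one value per depth (row)
    n = profile.size
    s_end = int(round(superficial * n))             # first 10 % of the depth
    m_end = int(round((superficial + middle) * n))  # next 15 % of the depth
    return (float(profile[:s_end].mean()),
            float(profile[s_end:m_end].mean()),
            float(profile[m_end:].mean()))


# Hypothetical ROI: 100 depth rows x 5 columns, values increasing with depth
roi = np.repeat(np.arange(100.0)[:, None], 5, axis=1)
print(zone_means(roi))  # (4.5, 17.0, 62.0)
```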

Figure 13: Calculation of collagen orientation profile in MATLAB®


4.3.3. Proteoglycan Content

Digital densitometry (DD) was used to measure the distribution of proteoglycans across the depth of the cartilage samples. DD estimates proteoglycan content from a stained specimen by relating the optical (color) density of the stain to the proteoglycan density.

Some of the previously prepared histological sections were stained with Safranin-O, which binds stoichiometrically to matrix proteoglycans and stains them. Figure 14 shows a sample histological section for each damage group. The histological sections were then imaged digitally with a PathScanEnabler-IV (Meyer Instruments, Inc., USA)41. The images were processed in the same way as the collagen orientation images to obtain the average proteoglycan density in the superficial, middle, and deep zones of the samples.

4.4. Pre-processing and multivariate analysis

4.4.1. Pre-processing

The acquired spectra were pre-processed using nippy¹, an open-source spectral pre-processing toolbox107. The Raman pre-processing protocol of Butler et al.108 was consulted.

Thirty different pre-processing configurations were prepared for the multivariate analysis step.

In all configurations, the spectra were trimmed to the fingerprint region (750–1800 cm⁻¹), as shown in Figure 15, and smoothed with a Savitzky-Golay filter using a third-degree polynomial. All possible combinations of the following options were tried: standard normal variate (SNV) normalization applied or not; the spectra differentiated once, twice, or not at all (with baseline correction applied in the latter case); and a filter window size of 7, 15, 31, 61, or 91. This gives 2 × 3 × 5 = 30 configurations, which are summarized in Table 3.

1 https://github.com/uef-bbc/nippy

Figure 14: Stained Histological Sections. (A) Control Group, (B) Collagenase 24h, (C) Impact, (D) Collagenase 90min, (E) Abrasion, (F) Trypsin 30min
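The 30 pre-processing configurations can be sketched as below. This is not the nippy implementation used in the study: the sketch swaps in SciPy's `savgol_filter`, and the order of steps (trim, Savitzky-Golay smoothing/derivative, baseline correction, SNV) and the straight-line baseline are assumptions. The helper names are hypothetical, and the filter window is assumed to be in points.

```python
import itertools

import numpy as np
from scipy.signal import savgol_filter


def snv(spectra):
    # Standard normal variate: center and scale each spectrum (row) individually
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def subtract_linear_baseline(spectra):
    # Hypothetical stand-in for baseline correction: remove the straight line
    # joining the first and last points of each spectrum
    t = np.linspace(0.0, 1.0, spectra.shape[1])
    start, end = spectra[:, :1], spectra[:, -1:]
    return spectra - (start + (end - start) * t)


def preprocess(wavenumbers, spectra, use_snv, deriv, window, lo=750.0, hi=1800.0):
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)          # fingerprint region
    x = savgol_filter(spectra[:, mask], window_length=window,  # smoothing (+ derivative)
                      polyorder=3, deriv=deriv, axis=1)
    if deriv == 0:
        x = subtract_linear_baseline(x)
    if use_snv:
        x = snv(x)
    return wavenumbers[mask], x


# SNV yes/no x derivative order 0/1/2 x five window sizes = 30 configurations
CONFIGS = list(itertools.product([True, False], [0, 1, 2], [7, 15, 31, 61, 91]))
print(len(CONFIGS))  # 30
```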

Table 3: Pre-Processing Parameters

Parameter | Values
Trim (cm⁻¹) | 750–1800
Polynomial Order | 3
SNV | Yes / No
Derivative Order | 0 / 1 / 2
Filter Window | 7 / 15 / 31 / 61 / 91

4.4.2. Multivariate analysis

Multivariate analysis was conducted using three machine learning algorithms: Random Forests, Support Vector Machines, and Partial Least Squares Regression. The analysis was done in Python using the scikit-learn toolkit89. Each algorithm was used to estimate each of the targets: instantaneous modulus, equilibrium modulus, dynamic modulus, cartilage thickness, collagen orientation in the superficial, middle, and deep zones, and proteoglycan content in the superficial, middle, and deep zones. The samples were divided into 12 groups according to their patella of origin, each containing six samples (control, abrasion, impact, collagenase 90min, collagenase 24h, and trypsin 30min). Double cross-validation was performed: in each iteration, two groups were left out for testing, and the remaining ten groups were used for training and validation. Five-fold cross-validation was implemented in the training-validation step, with the best model chosen according to the lowest root mean square error of prediction (RMSEP). Six iterations were made so that each group was left out for testing once.

Figure 15: Raman Fingerprint Region


A grid search was employed to find the optimal hyperparameters for each algorithm, and this was repeated for each spectral pre-processing configuration. Table 4 shows the hyperparameters varied for the RF algorithm along with their allowed values: bootstrap, maximum tree depth, maximum number of features, minimum samples per leaf node, and minimum samples per split. A constant number of 100 estimators was used. The hyperparameters varied for the SVM algorithm are presented in Table 5 along with their search space: the type of kernel, the penalty factor (C), the degree of the polynomial (if the polynomial kernel was used), and gamma (for the RBF or sigmoid kernels). Lastly, for the PLSR algorithm, the number of components was varied from 1 to 15.

Table 4: RF-Hyperparameters search space

Bootstrap | True / False
Maximum Depth | 80 / 90 / 100 / 110
Maximum Features | 2 / 3
Minimum Samples Leaf | 3 / 4 / 5
Minimum Samples Split | 8 / 10 / 12
Number of Estimators | 100

Table 5: SVM-Hyperparameters search space

Kernel | Linear / Poly / Sigmoid / RBF
C | 0.001 / 0.1 / 1 / 10 / 100 / 1000
Degree | 1 / 2 / 3
Gamma | Scale / Auto
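The search spaces in Tables 4 and 5, plus the 1-15 component range for PLSR, can be written as scikit-learn parameter grids. This sketch assumes a single full Cartesian grid for the SVM, so `degree` and `gamma` are also paired with kernels that ignore them (a list of per-kernel dictionaries would avoid those redundant fits). Under that assumption there are 144 + 144 + 15 = 303 candidate settings per pre-processing configuration, or 30 × 303 = 9090 per target and outer split.

```python
from sklearn.model_selection import ParameterGrid

RF_GRID = {
    "bootstrap": [True, False],
    "max_depth": [80, 90, 100, 110],
    "max_features": [2, 3],
    "min_samples_leaf": [3, 4, 5],
    "min_samples_split": [8, 10, 12],
    "n_estimators": [100],
}

SVM_GRID = {
    "kernel": ["linear", "poly", "sigmoid", "rbf"],
    "C": [0.001, 0.1, 1, 10, 100, 1000],
    "degree": [1, 2, 3],         # only used by the polynomial kernel
    "gamma": ["scale", "auto"],  # not used by the linear kernel
}

PLSR_GRID = {"n_components": list(range(1, 16))}

for name, grid in [("RF", RF_GRID), ("SVM", SVM_GRID), ("PLSR", PLSR_GRID)]:
    print(name, len(ParameterGrid(grid)))
# RF 144
# SVM 144
# PLSR 15
```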


5. Results

Raman Spectra

The mean Raman spectra (background- and baseline-corrected) of each damage group, compared with the control spectrum, are shown in Figures 16–20. Raman peak assignments are presented in Table 6. The COL24 group showed the most extreme variation relative to the control group, as it represents severe OA. All peaks decreased relative to the control group except those at 1610 cm⁻¹ and 1510 cm⁻¹, which increased; the former corresponds to phenylalanine and tyrosine aromatic ring breathing, and the latter is likely related to the collagenase itself. The SO3 symmetric stretch band at 1060 cm⁻¹ increased significantly across all damage groups except the IMP group, for which it increased only slightly. The ABR group spectrum showed a slight increase of the bands at 1336 and 1381 cm⁻¹, which are attributed to CH2 wagging and glycosaminoglycans, respectively. The TRYP30 spectrum showed a slight decrease in the glycosaminoglycan band at 1381 cm⁻¹, accompanied by a decrease in the proline, hydroxyproline, and tyrosine bands. Notable changes in the COL90 spectrum include a subtle decrease in the amide III doublet and in the hydroxyproline, tyrosine, proline, and tryptophan bands.

Figure 21 shows the mean spectra of the enzymatic and mechanical damage groups relative to the control group mean. The COL24 group was left out of the enzymatic mean spectrum, as it represents severe OA and would greatly bias the spectrum. Notably, the SO3 symmetric stretch band at 1060 cm⁻¹ increased significantly for both mechanical and enzymatic damage. In contrast, a slight decrease of the proline and hydroxyproline bands at 938 and 855 cm⁻¹, respectively, was present only in the enzymatic mean.
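Group mean spectra and their differences from the control mean, as plotted in Figures 16-21, can be computed as sketched below. This assumes the spectra are stored row-wise in a NumPy array with one group label per row (labels follow the abbreviations of Figure 10); the helper names and toy data are hypothetical. The pooled enzymatic mean leaves COL24 out, as described above.

```python
import numpy as np


def difference_spectra(spectra, labels, control="CN"):
    """Mean spectrum of every non-control group minus the control mean spectrum."""
    labels = np.asarray(labels)
    control_mean = spectra[labels == control].mean(axis=0)
    return {str(g): spectra[labels == g].mean(axis=0) - control_mean
            for g in np.unique(labels) if g != control}


def pooled_mean(spectra, labels, groups):
    """Mean spectrum over all samples belonging to any of the listed groups."""
    return spectra[np.isin(np.asarray(labels), groups)].mean(axis=0)


# Toy data: 3 spectra per group, each group offset by a constant
names = ["CN", "IMP", "ABR", "COL90", "COL24", "TRYP30"]
labels = np.repeat(names, 3)
spectra = np.vstack([np.full((3, 10), float(i)) for i in range(len(names))])

diffs = difference_spectra(spectra, labels)
mechanical = pooled_mean(spectra, labels, ["IMP", "ABR"])
enzymatic = pooled_mean(spectra, labels, ["COL90", "TRYP30"])  # COL24 left out
print(diffs["COL24"][0], mechanical[0], enzymatic[0])  # 4.0 1.5 4.0
```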

Figure 16: Impact mean spectrum vs Control mean

Figure 17: Abrasion mean spectrum vs Control mean

Figure 18: Collagenase 90-minute mean spectrum vs Control mean

Figure 19: Trypsin 30-minute mean spectrum vs Control mean

Figure 20: Collagenase 24-hour mean spectrum vs Control mean

Figure 21: Mean of Mechanical damage vs Mean of Enzymatic Damage vs Control group mean

Table 6: Raman peak assignments

Raman Shift (cm⁻¹) | Peak Letter | Assignment | Component | Cartilage ECM Component | Reference
1660.7 | A | Amide I | Protein | Collagen | 39,42,109
1610.6 | B | ν C=C aromatic ring | Phenylalanine, Tyrosine | - | 42,109
1560.4 | C | ν C=C aromatic ring | Tryptophan, Tyrosine | - | 42,109
1510.3 | D | unassigned | Collagenase D | - | -
1450.5 | E | δ CH2/CH3 deformation | Protein, Lipid | Collagen | 39,42,109
1425.4 | F | COO | Glycosaminoglycans | Proteoglycans | 42,109
1381.1 | G | unassigned | Glycosaminoglycans | - | 42,109
1336.7 | H | δ CH2 wagging | Glycosaminoglycans | - | 42,109
1317.4 | I | CH2 twist | - | - | 42,109
1265.4 | J | Amide III | Protein | Collagen | 39,42,109,110
1242.2 | K | Amide III | Protein | Collagen | 39,42,109,110
1201.7 | L | - | Hydroxyproline, Tyrosine | - | 42,109
1161.2 | M | δ C–H (Tyr) / NH3+ (Collagen) | Tyrosine | Collagen | 42,109,111
1122.7 | N | ν C-N, ν C-C | Protein | - | 42,109
1095.7 | O | PO2 str. | DNA/RNA | - | 42,109,112
1061.0 | P | SO3 symmetric stretch | Chondroitin sulfate | Proteoglycans | 39,42,109,110
1032.0 | Q | Phenylalanine ring deformation | Phenylalanine | - | 42,109
1001.2 | R | Phenylalanine (C-C) ring breathing | Phenylalanine | - | 39,42,110
962.6 | S | ν1 PO4³⁻ | Apatite | Mineral | 39,110
937.5 | T | ν C–C protein backbone | Proline | Collagen | 42,109
920.2 | U | ν C–C | Proline, Hydroxyproline | Collagen | 39,42,109
873.9 | V | ν C–C | Proline, Hydroxyproline | Collagen | 39,42,109,110
854.6 | W | ν C–C | Proline | Collagen | 39,42,109,110
812.2 | X | ν C–C protein backbone | - | Collagen (& other proteins) | 42,109
762.1 | Y | Tryptophan ring deformation | Protein | - | 109

Reference variables

Table 7 shows the average values of the mechanical properties, collagen orientation, and PG content of each damage group. A marked drop in the biomechanical properties is observed in the COL24 and IMP groups. In contrast, the biomechanical properties of the ABR group did not change much relative to the control group. The COL90 and TRYP30 groups showed a moderate drop in their biomechanical properties. Regarding proteoglycan content, a markedly higher PG content was observed in all three zones in the COL90 and ABR groups. Lastly, the collagen orientation in the superficial zone increased markedly for the COL24 group, and the average orientation in the middle zone increased for the COL90 and ABR groups, with the latter also showing an increase in the deep zone. Box-plot distributions of the target variables are presented in Figure 22.


Table 7: Average reference values of each damage group

Mechanical Moduli and Thickness
Group | Equilibrium (MPa) | Dynamic (MPa) | Instantaneous (MPa) | Thickness (µm)
Control | 1.06 | 6.78 | 6.75 | 1917.82
Collagenase 24h | 0.30 | 3.31 | 2.72 | 1736.27
Impact | 0.36 | 3.04 | 2.48 | 2021.08
Collagenase 90min | 0.73 | 4.23 | 3.40 | 1863.33
Abrasion | 0.96 | 6.08 | 4.89 | 1887.08
Trypsin 30min | 0.65 | 4.82 | 3.80 | 1784.17

Proteoglycan Content (A.U.)
Group | Superficial | Middle | Deep
Control | 0.25 | 0.61 | 0.87
Collagenase 24h | 0.19 | 0.51 | 0.95
Impact | 0.37 | 0.77 | 1.10
Collagenase 90min | 0.50 | 0.99 | 1.18
Abrasion | 0.48 | 1.00 | 1.16
Trypsin 30min | 0.35 | 0.87 | 1.12

Collagen Orientation (Degrees)
Group | Superficial | Middle | Deep
Control | 20.74 | 59.21 | 75.63
Collagenase 24h | 51.45 | 74.73 | 75.67
Impact | 20.78 | 68.31 | 79.45
Collagenase 90min | 32.25 | 77.25 | 79.40
Abrasion | 31.74 | 78.38 | 81.87
Trypsin 30min | 30.80 | 72.67 | 78.90


Figure 22: Target variables box-plot distributions. (A) Dynamic and instantaneous moduli, (B) equilibrium modulus, (C) cartilage thickness, (D) proteoglycan content, and (E) collagen orientation.


Optimal Machine Learning Models

As previously mentioned, the model with the lowest RMSEP was chosen as the optimal model for each target. Table 8 shows the optimal machine learning algorithm and the corresponding hyperparameters and pre-processing that best relate the Raman spectra to each reference property. The optimal PLSR models used only a few components (3-5), and the optimal SVM models used linear and second-degree polynomial kernels, even though third-degree polynomials and more complex kernel functions were included in the search. None of the optimal models used second-derivative pre-processing, and only two of the ten used the first derivative, while the rest used the undifferentiated spectra. Large smoothing windows were also rarely selected: nine of the ten optimal models used a filter window of 31 or smaller, and only one used the largest window (91).

Table 8: Optimal models hyperparameters and pre-processing

Target | Model | Derivative Order | Filter Window | SNV

Biomechanical Properties
Thickness | PLSR (No. of Components = 5) | 1 | 15 | yes
Instantaneous Modulus | SVM (C=100, degree=2, kernel='poly') | 1 | 31 | yes
Equilibrium Modulus | SVM (C=10, degree=2, kernel='poly') | 0 | 31 | yes
Dynamic Modulus | RF (Bootstrap=False, Max_Depth=90, Max_Features=3, Min_Samples_leaf=2, Min_Samples_split=10) | 0 | 15 | no

Collagen Orientation
Superficial | SVM (C=10, degree=2, kernel='linear') | 0 | 31 | yes
Middle | SVM (C=1000, degree=2, kernel='poly') | 0 | 7 | yes
Deep | SVM (C=10, degree=2, kernel='poly') | 0 | 7 | no

Proteoglycan Content
Superficial | PLSR (No. of Components = 3) | 0 | 31 | yes
Middle | PLSR (No. of Components = 4) | 0 | 15 | yes
Deep | PLSR (No. of Components = 4) | 0 | 91 | no

PLSR was the optimal algorithm for predicting proteoglycan content in all zones, whereas SVMs were optimal for collagen orientation in all zones. Figure 23 and Figure 24 show scatter plots of measured vs predicted targets, and Table 9 presents the root mean square errors of cross-validation, calibration, and prediction as percentages of the target range (i.e., each RMSE divided by the range of the target).

The Spearman correlation coefficient and the R-squared (R²) value were also calculated for the training and test points.
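The reported metrics can be computed as sketched below. Assumptions: the percentage RMSE divides by the range (max - min) of the measured values passed in, and "R-squared" is taken to be the coefficient of determination from `sklearn.metrics.r2_score` (it could also mean the squared Pearson correlation). The helper name and the numbers are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import r2_score


def regression_summary(y_true, y_pred):
    """RMSE, RMSE as % of the target range, Spearman's rho and R^2 (coefficient of determination)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    target_range = y_true.max() - y_true.min()
    rho, _ = spearmanr(y_true, y_pred)
    return {
        "rmse": rmse,
        "rmse_pct_range": 100.0 * rmse / target_range,
        "spearman_rho": float(rho),
        "r2": float(r2_score(y_true, y_pred)),
    }


# Hypothetical measured vs predicted values
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.1, 3.9, 5.0]
print(regression_summary(measured, predicted))
```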
