• Ei tuloksia

Accounting for spatial dependency in multivariate spectroscopic data

N/A
N/A
Info
Lataa
Protected

Academic year: 2022

Jaa "Accounting for spatial dependency in multivariate spectroscopic data"

Copied!
29
0
0

Kokoteksti

(1)

DSpace https://erepo.uef.fi

Rinnakkaistallenteet Luonnontieteiden ja metsätieteiden tiedekunta

2018

Accounting for spatial dependency in multivariate spectroscopic data

Prakash, M

Elsevier BV

Tieteelliset aikakauslehtiartikkelit

© Elsevier B.V.

CC BY-NC-ND https://creativecommons.org/licenses/by-nc-nd/4.0/

http://dx.doi.org/10.1016/j.chemolab.2018.09.010

https://erepo.uef.fi/handle/123456789/7127

Downloaded from University of Eastern Finland's eRepository

(2)

Accounting for spatial dependency in multivariate spectroscopic data M. Prakash, J.K. Sarin, L. Rieppo, I.O. Afara, J. Töyräs

PII: S0169-7439(18)30294-6

DOI: 10.1016/j.chemolab.2018.09.010 Reference: CHEMOM 3686

To appear in: Chemometrics and Intelligent Laboratory Systems Received Date: 14 May 2018

Revised Date: 19 September 2018 Accepted Date: 26 September 2018

Please cite this article as: M. Prakash, J.K. Sarin, L. Rieppo, I.O. Afara, J. Töyräs, Accounting for spatial dependency in multivariate spectroscopic data, Chemometrics and Intelligent Laboratory Systems (2018), doi: https://doi.org/10.1016/j.chemolab.2018.09.010.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

(3)

M AN US CR IP T

AC CE PT ED

1 Accounting for spatial dependency in multivariate spectroscopic data

1 2

Prakash M1, Sarin J. K1,2, Rieppo L1,3, Afara I. O1, Töyräs J1,2 3

mithilesh.prakash@uef.fi 4

jaakko.sarin@uef.fi 5

lassi.rieppo@oulu.fi 6

isaac.afara@uef.fi 7

juha.toyras@uef.fi 8

9

1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland 10

2 Diagnostic Imaging Center, Kuopio University Hospital, Kuopio, Finland 11

3 Research Unit of Medical Imaging, Physics and Technology, Faculty of Medicine, 12

University of Oulu, Oulu, Finland 13

14 15 16

Running title: Spatial dependency in multivariate NIRS 17

18 19 20 21

Corresponding author:

22

Mithilesh Prakash, M.Sc. (Tech.) 23

Department of Applied Physics 24

University of Eastern Finland 25

Kuopio, Finland 26

Tel: +358 449204680 27

Fax: +358 17162131 28

Email: mithilesh.prakash@uef.fi 29

(4)

M AN US CR IP T

AC CE PT ED

2 Abstract

30

We examine a hybrid multivariate regression technique to account for the spatial 31

dependency in spectroscopic data due to adjacent measurement locations in the 32

same joint by combining dimension reduction methods and linear mixed effects 33

(LME) modeling. Spatial correlation is a common limitation (assumption of 34

independence) faced in diagnostic applications involving adjacent measurement 35

locations, such as mapping of tissue properties, and can impede tissue evaluations.

36

Near-infrared spectra were collected from equine joints (n=5) and corresponding 37

biomechanical (n=202), compositional (n=530), and structural (n=530) properties of 38

cartilage tissue were measured. Subsequently, hybrid regression models for 39

estimating tissue properties from the spectral data were developed in combination 40

with principal component analysis (PCA-LME) scores and least absolute shrinkage 41

and selection operator (LASSO-LME). Performance comparison of PCA-LME and 42

principal component regression, and LASSO-LME and LASSO regression was 43

conducted to evaluate the effects of spatial dependency. A systematic improvement 44

in calibration models’ correlation coefficients (ρPCA-LME: 4 to 52% and ρLASSO-LME: 1 to 45

10%) and a decrease in cross validation errors (ErrorPCA-LME: 3 to 10% and 46

ErrorLASSO-LME: 1 to 4%) were observed when accounting for spatial dependency. Our 47

results indicate that accounting for spatial dependency using a LME-based approach 48

leads to more accurate prediction models.

49 50

Keywords: Linear mixed effects; articular cartilage; near infrared (NIR) 51

spectroscopy; spectroscopic mapping; principal components; LASSO.

52 53

(5)

M AN US CR IP T

AC CE PT ED

3 1. Introduction

54

Articular cartilage, a connective tissue covering the ends of bones in a joint, is 55

susceptible to post-traumatic osteoarthritis (PTOA) due to focal injuries caused by 56

sudden excessive impact loading. The injury, although initially localized, often 57

spreads over time, resulting in altered functional performance of the whole joint.

58

Arthroscopic evaluation of tissue properties around the injury site and assessing the 59

spread of the injury could enable optimal surgical intervention, thereby minimizing 60

the risk of PTOA. Currently, in clinical arthroscopies[1], cartilage is assessed visually 61

through an endoscope and by palpating the tissue surface with a metal hook[2]. This 62

method is qualitative, unreliable, and poorly reproducible[3,4], thus necessitating 63

development of novel, quantitative, robust, and reliable methods.

64 65

Non-destructive diagnostic tools, such as near-infrared (NIR) spectroscopy, have 66

shown potential for arthroscopic characterization of articular cartilage integrity[5].

67

NIR spectroscopy is a vibrational spectroscopic technique that has been utilized for 68

spatial assessment of cartilage biomechanical, compositional, and structural 69

properties[6–8]. In these studies, multivariate regression was utilized to relate 70

cartilage NIR spectra with its tissue properties. However, conventional multivariate 71

regression methods, such as partial least squares (PLS), are based on the 72

underlying assumption of independent observations[9], whereas biomedical 73

characterization of tissue integrity, for example in arthroscopy, often involves multiple 74

measurement locations within close proximity in the same joint. This grouping effect 75

of samples introduces spatial dependency and is likely to cause unreliable 76

correlations if unaccounted for in regression modeling[10,11].

77

(6)

M AN US CR IP T

AC CE PT ED

4 78

Linear mixed effects (LME) regression and its input parameters, namely fixed effects 79

and random effects, can be designed for specific datasets to account for grouping 80

effects. Since only a limited number of regressors (input variables) can be utilized in 81

model creation using LME, adaptation for a large set of variables, such as NIR 82

spectra, requires dimension reduction and/or variable selection methods. Hence, the 83

input variables need to be methodically selected by retaining only the most important 84

ones.

85 86

Application of dimension reduction methods, such as principal component analysis 87

(PCA)[12] via PCA score, and variable selection and regularization methods like 88

LASSO (least absolute shrinkage and selection operator) or L1-penalization[13], are 89

effective approaches for reducing the high dimensionality of the data, such as NIR 90

spectra. PCA finds a set of projections that maximizes the variance in the original 91

dataset; hence, the data structure in the sample space is captured even in the low 92

dimensional subspaces. LASSO[14] is a regularization method ideal for creating 93

sparse models with high statistical accuracy in predictions.

94 95

In this study, we propose a hybrid technique, which combines dimension reduction 96

methods and LME regression, to account for spatial dependency in analysis of 97

multivariate dataset. This is based on the hypothesis that the hybrid regression 98

technique can effectively model the relationship between cartilage NIR spectra and 99

its properties while accounting for dependencies within the data.

100 101

(7)

M AN US CR IP T

AC CE PT ED

5 2. Materials and Methods

102

To account for spatial dependency in the dataset, the contributing levels of 103

dependency must first be identified. The levels of dependency are defined by the 104

experimental design and the scope of the application. In our application on NIR- 105

based characterization of cartilage, joint level (measurement locations grouped in 106

one complete joint) and bone level (measurement locations grouped on one bone of 107

a particular joint) were identified (Figure 1) as the two significant levels of 108

dependency[15]. (Other application specific dependency levels can be 109

accommodated in the design matrix) 110

111

[Proposed position for Fig. 1]

112 113

Subsequently, models were developed for relating the predictors ( ) to the response 114

variables ( ) while accounting for the identified dependency levels (grouping effects).

115

The adaptation of LME can be written in the equation form:

116

= + + + ɛ, (1)

117

where is an N(number of observations)-by-1 response vector of reference values 118

for the ith tissue property, is an N-by-P (dimension reduced NIR spectra) matrix of 119

fixed effect regressors, is a P-by-1 vector of fixed effects coefficients, is an N-by- 120

Q (grouping count) random effects design matrix, is an N-by-1 vector of additional 121

random effects vector, and are the mixed effects coefficients of sizes Q-by-1 122

and 1-by-1 respectively, and ɛ is an N-by-1 vector representing the observation error.

123

Restricted maximum likelihood method was employed for estimating LME[16].

124

(8)

M AN US CR IP T

AC CE PT ED

6 125

2.1 Equine cartilage dataset 126

In this study, we utilized NIR spectral data from equine cartilage measured in earlier 127

studies[17,18]. In summary, metacarpophalangeal joints (n=5) were acquired from a 128

slaughterhouse, and specific areas of interest (AI, n=44) with cartilage lesions of 129

varying severity were selected by a veterinary surgeon. Subsequently, a 15 x 15 mm 130

grid consisting of 25 measurement locations was marked on each AI with a felt-tip 131

pen (Figure 1). The measurement locations were equally spaced (interdistance = 2.5 132

mm), and locations with highly eroded cartilage were excluded, yielding a total of 869 133

measurement points. NIR spectral measurements and thickness values were 134

acquired on each of the 869 measurements; however, biomechanical measurements 135

were performed only on 202 locations and compositional analysis on 530 locations 136

due to limitations set by sample preservation and geometry, respectively. NIR 137

spectra was matched with corresponding tissue property based on location during 138

regression analysis.

139 140

2.2 NIR spectral measurements 141

The NIR spectroscopy instrumentation consisted of a halogen light source 142

(wavelength range: 360–2500 nm, power 5 W, optical power: 239 µW in a dfiber = 600 143

µm, Avantes BW, Apeldoorn, Netherlands), and a spectrometer (wavelength range:

144

200–1160 nm, Avantes BW, Apeldoorn, Netherlands). A customized fiber optic probe 145

(d=5 mm) consisting of seven fibers (dfiber=600 µm) within the central window (d=2 146

mm), the six outer fibers for transmitting, and the central one for collecting the 147

reflectance spectrum, was utilized. Prior to sample measurements, dark and 148

(9)

M AN US CR IP T

AC CE PT ED

7 reference spectra were acquired. Dark spectrum was acquired with the spectrometer 149

light source switched off in order to collect background noise. With the light source 150

switched on, reference spectrum was acquired from a reflectance standard 151

(Spectralon, SRS-99, Labsphere Inc., North Sutton, USA). The absorbance values of 152

each sample spectra were scaled as per Beer-Lambert’s law using the dark and 153

reference spectra. In addition, signal acquisition time was optimized to maximize the 154

signal to noise ratio. The average of three spectral measurements that each 155

consisted of eight co-added spectral scans (teight scans=720 ms) was calculated. To 156

preprocess the spectra (700-1050nm), Savitzky-Golay estimates of the second 157

derivative using 41 points (or 25 nm) and a third-order polynomial for the smoothing 158

were computed. This preprocessing not only removes baseline offset and dominant 159

linear terms but also enhances subtle absorption peaks.

160 161

2.3 Cartilage thickness and biomechanical measurements 162

Cartilage thickness at all NIRS measurement locations was determined using optical 163

coherence tomography (OCT) via the Ilumien PCI Optimization System, (St. Jude 164

Medical, St. Paul, MN, USA) at an operating wavelength of 1305±55 nm, axial 165

resolution <20 µm, and lateral resolution 25–60 µm. The samples were fully 166

immersed in phosphate-buffered saline (PBS) during the measurements.

167 168

Biomechanical indentation measurements were performed at 202 locations using a 169

customized material testing device consisting of a load cell (Sensotec, Columbus, 170

OH, USA) with force resolution of 5 mN, an actuator (PM1A1798-1 A, Newport, 171

Irvine, CA, USA) with displacement resolution of 0.1 µm (PM500-1 A, Newport, 172

(10)

M AN US CR IP T

AC CE PT ED

8 Irvine, CA, USA), and a plane-ended cylindrical indenter (d=0.53 mm). Equilibrium 173

modulus (Eeq) and dynamic modulus (Edyn) were calculated using an indentation 174

protocol detailed in Korhonen et al[19] and Sarin et al[17].

175 176

2.4 Reference measurements of cartilage composition and structure 177

The osteochondral samples were first decalcified in a solution containing formalin 178

and ethylenediaminetetraacetic acid (EDTA), then fixed in paraffin blocks from which 179

thin sections (n=4, thickness=5 µm) were cut using a microtome along the 180

measurement line. The sections were then subjected to histological imaging, i.e., 181

Fourier transform infrared (FTIR) microspectroscopy (n=1) and polarized light 182

microscopy (PLM, n=3).

183 184

2.5 Collagen and proteoglycan distribution 185

Spatial collagen and proteoglycan (PG) distributions were measured by FTIR 186

microspectroscopy using a Thermo iN10 MX FT-IR microscope (Thermo Nicolet 187

Corporation, Madison, WI, USA). The microscope was operated in transmission 188

mode at a spectral resolution of 4 cm-1 and a pixel size of 25 × 25 µm2. 500-µm-wide 189

regions including full cartilage thickness were mapped from each sample in the mid 190

infrared region. The average of four scans per pixel were obtained. The collagen 191

content for 530 sample locations was estimated from the amide I peak (1584-1720 192

cm-1), and PG contents for these samples was obtained from the carbohydrate 193

region (984-1140 cm-1).

194 195

(11)

M AN US CR IP T

AC CE PT ED

9 2.6 Collagen orientation

196

Collagen orientation was measured using an Abrio PLM system (CRi, Inc., Woburn, 197

MA, USA) on top of a conventional light microscope (Nikon Diaphot TMD, Nikon, 198

Inc., Shinagawa, Tokyo, Japan). This system consists of a green bandpass filter, a 199

circular polarizer, and a computer-controlled analyzer comprising of two liquid crystal 200

polarizers and a charged couple device (CCD) camera. The specimens (n=530) 201

were imaged at identical orientation using a 4.0x objective, which resulted in a pixel 202

size of 2.53 × 2.53 µm2. The collagen fiber orientation in the resulting images show 0 203

degrees for collagen aligned parallel to cartilage surface and 90 degrees for collagen 204

aligned perpendicular to cartilage surface.

205 206

Table 1 summarizes the dataset used in this study.

207 208

[Proposed position for Table. 1]

209 210

2.7 Hybrid regression analysis approach 211

2.7.1 PCA-LME regression technique 212

With the hybrid PCA-LME regression technique, the equation is:

213

Tissue property ~ PCA scores + (1 | Joints 1-5) + (1| Bones Upper-Lower). (2) 214

Hence, PCA scores are used as fixed effects, and measurement grouping 215

information from joint level (Z matrix) and bone level (M matrix) are used as mixed 216

(12)

M AN US CR IP T

AC CE PT ED

10 effects (1 | dependency levels) to predict the tissue property (Figure 1 and Equation 217

2).

218 219

The data was split into calibration and test datasets (Figure 2) at the AI level. Hence, 220

the calibration and test datasets had no spectra from same AIs. With a 10:1 data 221

split, 40 AIs were utilized in calibration of the PCA-LME model and 4 AIs to test the 222

model. This split was repeated 11 times with each AI included in the test set only 223

once.

224 225

During modelling, calibration dataset was subjected to a 10-fold cross-validation. In 226

cross-validation, the number of PCA scores to build the PCA-LME model was varied 227

from 1 to 15 and the model with the smallest RMSECV was retained. The retained 228

model was used to predict the tissue property from the test dataset and the root 229

mean square error of prediction (RMSEP) was calculated to assess model 230

performance.

231 232

The PCA scores of the test dataset were calculated by adjusting the test spectra 233

according to the mean of the calibration spectra (µcalibration). This was performed by 234

first calculating µcalibration and principal components coefficients (Coeff) of the 235

calibration spectra. Subsequently, µcalibration was subtracted from the test spectra, and 236

then multiplied by Coeff (Figure 2).

237 238

[Proposed position for Fig. 2]

239

(13)

M AN US CR IP T

AC CE PT ED

11 240

2.7.2 LASSO-LME regression technique 241

Similar to the PCA-LME regression approach, dimension reduction was performed 242

with LASSO. LASSO adds a penalty equivalent to the sum of absolute values of the 243

magnitude of the coefficients to each wavelength and a nonnegative regularization 244

parameter λ. Next, utilizing 10-fold cross-validation, the sum of squared errors (SSE) 245

of the LASSO fit is calculated and plotted for varying λ values. Wavelengths 246

corresponding to the minimum SSE were retained. The dimension-reduced NIR 247

spectra, based on the selected wavelengths, were used as input to LME regression 248

(according to equation 1).

249 250

The performance of models was evaluated as an average of 11 iterations.

251

Spearman’s rank correlation (ρ), a distribution free and non-parametric statistic, was 252

employed to assess the regression models due to assumptions on normality of the 253

dataset. The commonly used coefficient of determination (R2) was not used as its 254

definition for mixed model is unclear[20]. All spectral and regression analysis were 255

done using custom-made programs designed on MATLAB R2017b (Mathworks Inc, 256

Natick, MA).

257 258

3. Results 259

The results (average of 11 iterations) of the hybrid regression models in comparison 260

to corresponding standard regression models (Table 2) showed consistently higher 261

correlation coefficient and lower RMSECV. Hence, the inclusion of spatial 262

dependency information in the models resulted in better performance. The changes 263

(14)

M AN US CR IP T

AC CE PT ED

12 in ρ relative to RMSEP in test sets (Table 2 and Figure 4) were inconsistent for PG 264

content, Eeq and collagen content.

265 266

[Proposed position for Table. 2]

267 268

[Proposed position for Fig. 3]

269 270

[Proposed position for Fig. 4]

271 272

The optimal number of principal components were 7-9. The performance of LASSO- 273

based variable selection varied depending on the predicted parameter. Prediction of 274

cartilage thickness required the highest number of variables (Figure 3), while Eeq

275

required the least.

276 277

4. Discussion 278

In this study, we propose a hybrid regression technique to account for the effect of 279

spatial dependency commonly encountered in spectroscopic characterization of 280

biological tissues. We first identified the dependency levels based on known 281

groupings of the measurement locations. We then introduced hybrid techniques that 282

combine variable reduction methods (based on PCA and LASSO) and LME 283

regression, allowing incorporation of the identified dependency levels into the 284

predictive models. The performance of PCA-LME and LASSO-LME were then 285

(15)

M AN US CR IP T

AC CE PT ED

13 compared with that of PCR and LASSO regression, respectively. The results 286

highlighted the importance and benefits of accounting for spatial dependency.

287 288

Assumptions of sample independence can lead to unreliable correlations as 289

elucidated by Ranstam et al[21]. Conforti et al suggested a viable approach for 290

considering spatial variations in soil organic matter content[22], where dimension 291

reduction was performed by combining PLS scores with LME modeling, thereby 292

accounting for the dependency structure in the measured data. However, this 293

approach is unsuitable when the reference parameters or independent parameters 294

are blinded or unknown, such as in independent testing or real-time applications. As 295

predictors and responses are required to obtain the scores, PLS cannot be 296

performed only on the spectral data. However, this practical limitation during 297

independent testing could be easily circumvented by employing PCA reduction of the 298

spectra. We also examined the potential of regularization by LASSO as it effectively 299

shrinks the input spectra[23]. Since a significant number of spectral variables are 300

penalized with zero (or absolute) coefficients, the resulting models are sparse and 301

hence provide feature selection. Although the resulting models based on LASSO- 302

LME are efficient, the penalization process in itself is computationally time 303

consuming in comparison to other regression methods[24], such as the adopted 304

PCA-LME approach.

305 306

In our application, the main dependency levels for cartilage dataset were joint level 307

and articulating bones (n=2 per joint). The levels, however, could be tailored to suit 308

the experimental design in specific biomedical applications. The results of the 309

(16)

M AN US CR IP T

AC CE PT ED

14 comparison (Table 2) show that standard versions of the regression models (PCR 310

and LASSO) have slightly lower correlations in comparison to the LME-based 311

regression models. This can be attributed to spatial dependency within the dataset, 312

consistent with Singer et al[9] and Ranstam et al[10,21]. Furthermore, the correlation 313

between NIR spectra and biomechanical properties were better than with cartilage 314

composition or structure. The thickness of cartilage and its NIR spectrum are highly 315

correlated, as the cartilage thickness is representative of the path length, which 316

affects the absorption. The biomechanical parameters of cartilage are highly 317

influenced by the superficial cartilage[25]. On the other hand, the matrix composition 318

and structure are an average of superficial, middle and deeper layers of cartilage, 319

thus providing information on the entire tissue cross-section.

320 321

The present results indicate that the hybrid regression technique can effectively 322

account for dependency in NIRS data. Identifying and including relevant dependent 323

levels in the experimental design could be a limiting factor in this study; hence, 324

careful selection of the levels is important. In some iterations, the test set includes 325

values outside the calibration model range, thus making the model perform poorly.

326

This is especially observed in predictions of small datasets, such as equilibrium 327

modulus (n = 202). In addition, the relatively lower number of observations in the test 328

set has a drastic effect on RMSEP value but not so much on correlation (Figure 4B).

329

The performance of PCA-LME can potentially be further improved with variable 330

selection methods, such as genetic algorithm[26]. Other alternatives to account for 331

dependency in data structures include (but are not limited to) multilevel modeling[27]

332

for hierarchical or clustered dataset and kringing[28], commonly utilized in genetic 333

studies and geostatistics, respectively.

334

(17)

M AN US CR IP T

AC CE PT ED

15 335

The present results support our hypothesis and thus advocate the application of 336

LME-based regression technique for NIR spectroscopic characterization of cartilage.

337

Importantly, this method can be easily extended to other spectroscopic applications.

338 339

5. Acknowledgments 340

Tuomas Selander is acknowledged for statistical consultation. This study was funded 341

by the Academy of Finland (project 267551, University of Eastern Finland), Research 342

Committee of the Kuopio University Hospital Catchment Area for the State Research 343

Funding (project PY210 (5041750, 5041744 and 5041772), Kuopio, Finland), 344

Instrumentarium Science Foundation (170033) and Tekniikan edistämissäätiö 345

(8193). Dr. Afara acknowledges grant funding from the Finnish Cultural Foundation 346

(00160079).

347 348

6. Author contributions 349

Prakash M.: Algorithm design and analysis.

350

Sarin, J.K.: Acquisition of data.

351

Rieppo, L: Supervision of statistical analyses.

352

Afara I.O.: Supervision of statistical analyses.

353

Töyräs J.: Study conception and design.

354

All authors contributed in the preparation and approval of the final submitted 355

manuscript.

356

(18)

M AN US CR IP T

AC CE PT ED

16 357

7. Conflict of Interest 358

The authors have no conflicts of interest in the execution of this study and 359

preparation of the manuscript.

360 361

8. References 362

[1] J.H. Ahn, J.H. Wang, J.C. Yoo, Arthroscopic all-inside suture repair of medial 363

meniscus lesion in anterior cruciate ligament—deficient knees: Results of 364

second-look arthroscopies in 39 cases, Arthrosc. J. Arthrosc. Relat. Surg. 20 365

(2004) 936–945. doi:10.1016/j.arthro.2004.06.038.

366

[2] C.S.M. Brittberg, Mats MD, PHD; Winalski, Evaluation of Cartilage Injuries and 367

Repair, J. Bone Jt. Surg. 85 (2003) 58–69. doi: 10.2106/00004623- 368

200300002-00008 369

[3] B.H. Brismar, T. Wredmark, T. Movin, J. Leanderson, O. Svensson, Observer 370

reliability in the arthoscopic classification of osteoarthritis of the knee, J. Bone 371

Jt. Surg. 84 (2002) 42–47. doi:10.1302/0301-620X.84B1.11660.

372

[4] G. Spahn, H.M. Klinger, G.O. Hofmann, How valid is the arthroscopic 373

diagnosis of cartilage lesions? Results of an opinion survey among highly 374

experienced arthroscopic surgeons, Arch. Orthop. Trauma Surg. 129 (2009) 375

1117–1121. doi:10.1007/s00402-009-0868-y.

376

[5] G. Spahn, H.M. Klinger, M. Baums, M. Hoffmann, H. Plettenberg, A. Kroker, 377

G.O. Hofmann, Near-infrared spectroscopy for arthroscopic evaluation of 378

cartilage lesions: results of a blinded, prospective, interobserver study., Am. J.

379

(19)

M AN US CR IP T

AC CE PT ED

17 Sports Med. 38 (2010) 2516–21. doi:10.1177/0363546510376744.

380

[6] M. V. Padalkar, N. Pleshko, Wavelength-dependent penetration depth of near 381

infrared radiation into cartilage, Analyst. 140 (2015) 2093–2100.

382

doi:10.1039/C4AN01987C.

383

[7] G. Spahn, H. Plettenberg, H. Nagel, E. Kahl, H.M. Klinger, T. Mückley, M.

384

Günther, G.O. Hofmann, J.A. Mollenhauer, Evaluation of cartilage defects with 385

near-infrared spectroscopy (NIR): An ex vivo study, Med. Eng. Phys. 30 (2008) 386

285–292. doi:10.1016/j.medengphy.2007.04.009.

387

[8] I.O. Afara, H. Moody, S. Singh, I. Prasadam, A. Oloyede, Spatial mapping of 388

proteoglycan content in articular cartilage using near-infrared (NIR) 389

spectroscopy, Biomed. Opt. Express. 6 (2015) 144.

390

doi:10.1364/BOE.6.000144.

391

[9] M. Singer, T. Krivobokova, A. Munk, B. de Groot, Partial least squares for 392

dependent data, Biometrika. 103 (2016) 351–362. doi:10.1093/biomet/asw010.

393

[10] J. Ranstam, Repeated measurements, bilateral observations and 394

pseudoreplicates, why does it matter?, Osteoarthr. Cartil. 20 (2012) 473–475.

395

doi:10.1016/J.JOCA.2012.02.011.

396

[11] J.A. Cook, J. Ranstam, Statistical models and confounding adjustment, Br. J.

397

Surg. 104 (2017) 786–787. doi:10.1002/bjs.10245.

398

[12] H. Abdi, L.J. Williams, Principal component analysis, Wiley Interdiscip. Rev.

399

Comput. Stat. 2 (2010) 433–459. doi:10.1002/wics.101.

400

[13] J. Schelldorfer, P. Bühlmann, S. Van De Geer, Estimation for High- 401

Dimensional Linear Mixed-Effects Models Using ℓ1-Penalization, Scand. J.

402

(20)

M AN US CR IP T

AC CE PT ED

18 Stat. 38 (2011) 197–214. doi:10.1111/j.1467-9469.2011.00740.x.

403

[14] R. Tibshirani, Regression shrinkage and selection via the lasso: a 404

retrospective, J. R. Stat. Soc. Ser. B (Statistical Methodol. 73 (2011) 273–282.

405

doi:10.1111/j.1467-9868.2011.00771.x.

406

[15] M. Prakash, A. Joukainen, J.K. Sarin, L. Rieppo, I.O. Afara, J. Töyräs, Near- 407

infrared Spectroscopy Based Arthroscopic Evaluation of Human Knee Joint 408

Cartilage, Through Automated Selection of an Anatomically Specific 409

Regression Model, in: Biophotonics Congr. Biomed. Opt. Congr. 2018, OSA, 410

Washington, D.C., 2018: p. OF4D.3. doi:10.1364/OTS.2018.OF4D.3.

411

[16] M.G. Kenward, J.H. Roger, Small Sample Inference for Fixed Effects from 412

Restricted Maximum Likelihood, Biometrics. 53 (1997) 983.

413

doi:10.2307/2533558.

414

[17] J.K. Sarin, M. Amissah, H. Brommer, D. Argüelles, J. Töyräs, I.O. Afara, Near 415

Infrared Spectroscopic Mapping of Functional Properties of Equine Articular 416

Cartilage, Ann. Biomed. Eng. 44 (2016) 3335–3345. doi:10.1007/s10439-016- 417

1659-6.

418

[18] J.K. Sarin, L. Rieppo, H. Brommer, I.O. Afara, S. Saarakkala, J. Töyräs, 419

Combination of optical coherence tomography and near infrared spectroscopy 420

enhances determination of articular cartilage composition and structure, Sci.

421

Rep. 7 (2017) 10586. doi:10.1038/s41598-017-10973-z.

422

[19] R.K. Korhonen, M.S. Laasanen, J. Töyräs, R. Lappalainen, H.J. Helminen, J.S.

423

Jurvelin, Fibril reinforced poroelastic model predicts specifically mechanical 424

behavior of normal, proteoglycan depleted and collagen degraded articular 425

cartilage, J. Biomech. 36 (2003) 1373–1379. doi:10.1016/S0021- 426

(21)

M AN US CR IP T

AC CE PT ED

19 9290(03)00069-1.

427

[20] S. Nakagawa, H. Schielzeth, A general and simple method for obtaining R 2 428

from generalized linear mixed-effects models, Methods Ecol. Evol. 4 (2013) 429

133–142. doi:10.1111/j.2041-210x.2012.00261.x.

430

[21] J. Ranstam, J.A. Cook, Considerations for the design, analysis and 431

presentation of in vivo studies, Osteoarthr. Cartil. 25 (2017) 364–368.

432

doi:10.1016/j.joca.2016.06.023.

433

[22] M. Conforti, A. Castrignanò, G. Robustelli, F. Scarciglia, M. Stelluti, G.

434

Buttafuoco, Laboratory-based Vis–NIR spectroscopy and partial least square 435

regression with spatially correlated errors for predicting spatial variation of soil 436

organic matter content, CATENA. 124 (2015) 60–67.

437

doi:10.1016/j.catena.2014.09.004.

438

[23] E. Andries, S. Martin, Sparse Methods in Spectroscopy: An Introduction, 439

Overview, and Perspective, Appl. Spectrosc. 67 (2013) 579–593.

440

doi:10.1366/13-07021.

441

[24] M. Prakash, J.K. Sarin, L. Rieppo, I.O. Afara, J. Töyräs, Optimal Regression 442

Method for Near-Infrared Spectroscopic Evaluation of Articular Cartilage, Appl.

443

Spectrosc. 71 (2017) 2253–2262. doi:10.1177/0003702817726766.

444

[25] R.. Korhonen, M. Wong, J. Arokoski, R. Lindgren, H.. Helminen, E.. Hunziker, 445

J.. Jurvelin, Importance of the superficial tissue layer for the indentation 446

stiffness of articular cartilage, Med. Eng. Phys. 24 (2002) 99–108.

447

doi:10.1016/S1350-4533(01)00123-0.

448

[26] A.S. Barros, D.N. Rutledge, Genetic algorithm applied to the selection of 449

(22)

M AN US CR IP T

AC CE PT ED

20 principal components, Chemom. Intell. Lab. Syst. 40 (1998) 65–81.

450

doi:10.1016/S0169-7439(98)00002-1.

451

[27] H. Goldstein, W. Browne, J. Rasbash, Multilevel modelling of medical data, 452

Stat. Med. 21 (2002) 3291–3315. doi:10.1002/sim.1264.

453

[28] D. Allard, J.-P. Chilès, P. Delfiner: Geostatistics: Modeling Spatial Uncertainty, 454

Math. Geosci. 45 (2013) 377–380. doi:10.1007/s11004-012-9429-y.

455

(23)

M AN US CR IP T

AC CE PT ED

21 456

Figure 1: Areas of interest (AI) marked (black squares) on the articulating surfaces 457

of equine metacarpophalangeal joint. In this study, grouping information is on two 458

dependency levels, i.e. joint level and bone level, which is held in Z (sample 459

count×5) and M (sample count×1) design matrices. Each AI (15mm × 15mm) has 25 460

equidistant measurement locations.

461 462 463

(24)

M AN US CR IP T

AC CE PT ED

22 464

Figure 2: Schematic chart of PCA-LME regression technique, which combines 465

principal component analysis (PCA) and linear mixed effects (LME) regression 466

technique. Preprocessed near infrared (NIR) spectra (X), tissue property (y), and 467

design matrices Z and M are the inputs. Design matrices Z and M are the mixed 468

effects in LME modeling. Model performance was evaluated using the root mean 469

square errors of cross validation (RMSECV) and prediction (RMSEP).

470 471 472

(25)

M AN US CR IP T

AC CE PT ED

23 Table 1: Summary of datasets

473 474

Measurements N Additional details

Equine cartilage dataset 5 joints, 44 AIs Surgical extraction NIR spectral

measurements 869

Absorbance spectroscopy (700-1050 nm)

Thickness (mm) 869 Optical coherence tomography

Equilibrium Modulus (MPa) 202 Indentation testing Dynamic modulus (MPa) 202 Indentation testing

PG content (AU) 530 FTIR microspectroscopy

Collagen content (AU) 530 FTIR microspectroscopy Collagen orientation (⁰) 530 Polarized light microscopy

(26)

M AN US CR IP T

AC CE PT ED

Prakash et al.

24 Table 2: The mean and range of different cartilage properties. A comparison of the assessment statistics of standard regression 475

techniques and introduced hybrid regression technique. The white rows represent PCR and PCA-LME, whereas grey rows 476

represent LASSO and LASSO-LME. ρ (Spearman’s rank correlation), root mean square errors of cross validation (RMSECV) and 477

prediction (RMSEP) are shown for all predicted parameters.

478

Property

Mean Standard regression Hybrid regression

(Range) Calibration Test Calibration Test

ρ RMSECV % ρ RMSEP % ρ RMSECV % ρ RMSEP %

Thickness 0.89 0.78 13.94 0.67 18.54 0.85 12.02 0.74 16.40

(mm) (0.32 – 1.81) 0.86 11.20 0.57 20.17 0.87 11.22 0.65 18.36

Dynamic 9.43

0.49 28.07 0.46 37.80 0.69 22.75 0.56 34.71

Modulus (0.24 – 23.3)

(MPa) 0.61 24.34 0.29 42.73 0.67 23.29 0.27 39.58

PG 6.31

0.45 18.17 0.34 22.42 0.52 17.50 0.42 22.54

content (0.60 – 14.71)

(AU) 0.51 17.10 0.34 22.26 0.53 17.52 0.37 21.89

Equilibrium 2.00

0.44 28.91 0.32 37.50 0.63 24.28 0.48 35.02

Modulus (0.03 – 5.38)

(MPa) 0.59 24.58 0.38 33.90 0.65 24.72 0.46 34.84

Collagen 33.35

0.41 19.79 0.35 23.34 0.43 19.21 0.27 24.90

Content (12.16 – 64.39)

(AU) 0.40 19.67 0.29 22.90 0.41 20.09 0.32 23.14

Collagen 71.12

0.31 22.45 0.27 25.01 0.37 21.79 0.23 25.35

Orientation (37.13 – 83.75)

angle ( ) 0.41 21.39 0.25 23.54 0.41 21.91 0.27 24.29

(27)

M AN US CR IP T

AC CE PT ED

25 Figure 3

479 480 481 482 483 484 485 486 487

Figure 3: Representative NIR spectra (700 to 1050 nm) for samples with different 488

cartilage (a) thickness (mm), (b) equilibrium modulus (MPa), (c) dynamic modulus 489

(MPa), (d) collagen content (AU), (e) proteoglycan content (AU), and (f) collagen 490

orientation angle (⁰). The top inset shows second derivative Savitzky-Golay 491

preprocessed spectra in 940 to 975 nm. Bottom inset shows the least absolute 492

shrinkage and selection operator (LASSO) based feature selection of the spectra.

493

(28)

M AN US CR IP T

AC CE PT ED

Prakash et al.

26 494

Figure 4: Predicted (x axis) vs. reference (y axis) values of cartilage thickness [mm, A] and equilibrium modulus [MPa, B].

495

Performance on calibration (blue, unfilled) and test set (red, filled) as predicted by PCR [A,(i)], PCA-LME[A,(ii)], LASSO[A,(iii)] and 496

LASSO-LME[A,(iv)] regression models. Effect of outliers on LASSO-LME [B] model performance when the test set range is within 497

[(i),(ii)] and outside [(iii),(iv)] the calibration range.

498

(29)

M AN US CR IP T

AC CE PT ED

1. A novel regression technique for multivariate spectroscopic data.

2. Adjacent spectroscopic measurements introduces spatial dependencies.

3. Spatial dependencies violates assumption of independence.

4. Hybrid regression accounts for spatial dependency during model calibration.

Viittaukset

LIITTYVÄT TIEDOSTOT

Highlights • Spectral preprocessing affects the performance of chemometric models • Selection of best preprocessing strategy depends heavily on the intended application • nippy

The combination of hybrid regression modelling and a spectral classifier en- abled the NIRS-based arthroscopic evaluation of the biomechanical properties of articular cartilage in

In light of these case studies, we evaluate the relevance of existing Finnish ecosystem service indicators, the data availability for ecosystem accounting in Finland, and the

We construct interdisciplinary “exchange ra- tes” for comparing publications across discipli- nes based on publication data from the leading peer-reviewed journals in

In this research, we rise to the challenge, and develop efficient algorithms for a commonly occurring search problem – searching for the statistically most significant dependency

In this thesis, spectroscopic techniques were combined with multivariate data analysis tools to gain molecular level understanding of the differences in the amorphous state caused

Input to hydrological models consists of meteorological data, discharge data and spatial data (slope, soils, land-use and vegetation cover) for model parameterization..

The URI consists of a network identifier, identifier type, dataset identifier of the spatial object and the local identifier of the geo- graphic feature.. In addition, the URI