Discussion on results - Convolutional neural networks in osteoporotic fracture risk prediction

A series of experiments was conducted to assess the feasibility of using convolutional neural networks (CNNs) to predict osteoporotic fractures from spine DXA images. The results show that a CNN can learn patterns that predict osteoporotic fractures with diagnostic ability comparable to the current gold standard, BMD, or the alternative method, TBS. In our experiments, the best CNN reached an AUC of 0.733, outperforming the baselines BMD (AUC 0.653) and TBS (AUC 0.691). The best results of all the models are summarized in Table 5.10.

Table 5.10. Summary of the experiment results.

Model                  AUC
VGG19                  0.733
VGG16                  0.720
TBS                    0.691
Deep custom model      0.687
Moderate custom model  0.681
Perceptron model       0.661
Shallow custom model   0.657
BMD                    0.653
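To make the AUC figures in Table 5.10 concrete, the following is a minimal sketch of how an AUC value can be computed from predicted scores: it is the fraction of (fracture, non-fracture) pairs in which the fracture case receives the higher score, with ties counted as half. The function name `roc_auc` and the toy data are illustrative assumptions, not the code or data used in this study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = fracture, 0 = no fracture.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

For a predictor such as BMD or TBS, where lower values indicate higher risk, the negated measurement would be passed as the score.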

Both the perceptron model and the shallow custom model produced AUC values very close to that of the BMD predictor. It is therefore plausible that the simpler models mostly just adapt to the input pixel intensities. The visualizations support this hypothesis by showing very fine-grained backpropagation maps and low-contrast heatmaps.

The moderate and deep custom models seemed to learn something more, as they out[...] patterns that had prediction ability for osteoporotic fractures. The visualizations show much clearer backpropagation maps and better focused heatmaps. The visualizations resemble the ones produced by the experiments predicting low TBS values, which supports the hypothesis of added textural analysis. TBS predicted osteoporotic fractures with a slightly better AUC than the deep model, but the two were very close.

The pretrained models produced the best results in this study. They also converged faster than the custom models. The depth of fine-tuning did not have major effects on the results. The hypothesis is that their pretrained feature extraction capabilities were able to discriminate the classes with very little training, while further training quickly led to overfitting with a dataset significantly smaller than the pretraining set.

The visualization method proved to be helpful in interpreting the predictions, although it produced different results than expected. The expectation was that the models would learn to identify small signs of fracture risk. Instead, with custom models they learned wider patterns that possibly predict resistance to fractures. This can of course be seen as the other side of the same thing.

With pretrained models, the visualization results were difficult to interpret. Both the backpropagation maps and the heatmaps varied too much to provide good transparency.

One possible explanation is that because the pretrained models have been trained with natural images very different from this dataset, they have learned to identify features that are not apparent in DXA images. Although these features may be useful for discriminating our classes, they may not appear visually meaningful to the human eye.
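As an illustration of how such heatmaps can be formed, the sketch below follows the Grad-CAM idea (Selvaraju et al., 2017): each feature map of a convolutional layer is weighted by the spatial average of the gradient of the class score with respect to that map, the weighted maps are summed, and negative values are clipped. That the heatmaps in this study were produced exactly this way is an assumption; the function name `grad_cam` and the toy inputs are hypothetical.

```python
def grad_cam(activations, gradients):
    """Grad-CAM style heatmap from K feature maps and their gradients.

    activations, gradients: lists of K maps, each an H x W nested list.
    Returns an H x W heatmap scaled to [0, 1].
    """
    height = len(activations[0])
    width = len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        # Channel weight: spatial average of the class-score gradient.
        weight = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += weight * fmap[i][j]
    # ReLU keeps only the regions that increase the class score.
    heatmap = [[max(0.0, v) for v in row] for row in heatmap]
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap


# Hypothetical 2-channel, 2 x 2 example.
activations = [[[1, 0], [0, 2]], [[0, 3], [1, 0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]], [[-1, -1], [-1, -1]]]
heatmap = grad_cam(activations, gradients)
```

In this toy case the second channel has a negative weight, so the locations where it is active are suppressed and the heatmap highlights only the regions driven by the first channel.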

The results of low TBS prediction did not quite meet the expectations. The hypothesis was that CNN models would be well equipped to learn deterministic textural analysis methods. The best AUC of 0.820 was better than that obtained for the prediction of osteoporotic fractures, but the problem should also be much easier to solve. In TBS prediction, all the information needed for the output calculation should be present in the input data. However, this was not the main focus of this study, and not as much optimization effort was put into predicting TBS values.

small validation and test sets to leave enough samples for training. This compromises the credibility of the results to some extent. The variance in different randomly picked set distributions was quite high. Therefore, the reported prediction accuracies and AUC values have to be taken as indicative results. Conclusions should be drawn more about the magnitude relative to baseline rather than the precise value of the results.
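The effect of test set size on the stability of AUC can be illustrated with a small simulation. The sketch below repeatedly draws random test sets of two different sizes from the same pool of scored samples and compares the spread of the resulting AUC values. The data are synthetic (Gaussian scores with an assumed class separation), and the set sizes, function names and random seeds are illustrative assumptions rather than the settings of this study.

```python
import random
import statistics


def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_spread(labels, scores, test_size, repeats=100, seed=0):
    """Mean and standard deviation of AUC over random test sets of a given size."""
    rng = random.Random(seed)
    indices = range(len(labels))
    aucs = []
    for _ in range(repeats):
        chosen = rng.sample(indices, test_size)
        y = [labels[i] for i in chosen]
        s = [scores[i] for i in chosen]
        if 0 < sum(y) < len(y):  # skip draws that contain only one class
            aucs.append(roc_auc(y, s))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Synthetic pool: 200 positives and 800 negatives with overlapping scores.
data_rng = random.Random(42)
labels = [1] * 200 + [0] * 800
scores = [data_rng.gauss(0.6 if y == 1 else 0.0, 1.0) for y in labels]

mean_small, sd_small = auc_spread(labels, scores, test_size=50)
mean_large, sd_large = auc_spread(labels, scores, test_size=300)
print(f"test size  50: AUC {mean_small:.3f} +/- {sd_small:.3f}")
print(f"test size 300: AUC {mean_large:.3f} +/- {sd_large:.3f}")
```

The smaller test sets give a visibly wider spread of AUC values, which is why differences of a few hundredths between models should be read with caution.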

The moderate level of the fracture prediction results was somewhat expected. The risk of osteoporotic fractures involves factors that most likely cannot be seen in DXA images. For example, the risk of falling has a major effect on the overall fracture risk (Patel & al., 2005). The risk of falling in turn depends on factors such as muscle strength, eyesight and use of medications (WHO, 2003). On the other hand, the skeleton's resistance to fractures is largely based on bone strength, which can be predicted by many features visible in DXA images. Some of these features are known by medical experts, but some are most likely too difficult for the human eye to detect. These are the kind of signs that could be revealed by statistical models like neural networks. It is difficult to estimate the highest possible prediction ability achievable by machine learning.

The results from our experiments consistently showed that increasing the depth of the network enables the model to learn more complex representations and improve the prediction. However, deeper models also require more input data since they are prone to overfitting. Data augmentation and different regularization techniques can mitigate this problem to some extent, but they do not remove the need for enough real-world samples. It is possible that with big enough datasets, the deep models could learn more detailed features and specific signs of fracture risk. With proper visualization methods this could even lead to new discoveries in the field of osteoporosis diagnosis.
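As a sketch of what data augmentation means in practice, the function below creates a randomly mirrored and slightly shifted copy of a grayscale image, so that the network sees a different variant of each training sample in every epoch. The specific transformations (horizontal flip and a shift of a few pixels with zero padding) are assumptions chosen for illustration; they are not necessarily the augmentations used in these experiments, and whether mirroring is appropriate for spine images would need to be judged separately.

```python
import random


def augment(image, rng, max_shift=2):
    """Return a randomly flipped and shifted copy of an H x W nested-list image."""
    height, width = len(image), len(image[0])
    rows = [list(row) for row in image]
    # Mirror left-right with probability 0.5.
    if rng.random() < 0.5:
        rows = [row[::-1] for row in rows]
    # Shift by up to max_shift pixels in each direction, padding with zeros.
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    shifted = [[0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            si, sj = i - dy, j - dx
            if 0 <= si < height and 0 <= sj < width:
                shifted[i][j] = rows[si][sj]
    return shifted


# Hypothetical 2 x 3 image and a seeded generator for reproducibility.
image = [[1, 2, 3], [4, 5, 6]]
augmented = augment(image, random.Random(0), max_shift=1)
```

Such transformations only recombine information already present in the training set, which is consistent with the point above that augmentation cannot replace additional real-world samples.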

6 Conclusions

This study presents an introduction to artificial neural networks and deep learning, which form the basis for the convolutional neural network approach. Design possibilities in CNN architecture and the role of different hyperparameters are discussed. This theoretical background is put to the test in the context of osteoporosis diagnosis. Predicting osteoporotic fractures from spine DXA images is a difficult pattern recognition task that involves prediction of future events. The feasibility of CNNs for this task was assessed experimentally by using two different approaches: training from scratch and transfer learning.

The experiments show that convolutional neural networks can be used with DXA image data to produce predictive ability different from the standard clinical methods currently in use. However, due to the limited amount of data and the wide range of architectural possibilities, their true potential in this context remains to be unfolded. It is also likely that DXA image analysis alone cannot provide the best fracture risk prediction, regardless of the model. It could contribute to a more comprehensive prediction method that utilizes a wider set of input data to cover different aspects of the risk.

Using pretrained CNNs produced the best fracture prediction results in the experiments. The results support the advertised benefits of transfer learning, such as faster training speed and suitability for smaller datasets. On the other hand, the visualizations from the pretrained models were too confusing to draw conclusions from. The custom models produced less impressive results, but through clearer visualizations, they provided better transparency. This makes them promising candidates for further research.

Future study in this area should seek to exploit larger datasets and alternative image types. Hip images could be used together with spine images to gain better predictive ability. These images are usually taken during the same patient visit, so models based on the combination of these images would also be usable in practice. Another interesting direction would be to add a longitudinal aspect to the input images. BMD values are often analyzed as trajectories over time, so a similar approach could bring value to deep learn-

References

Avinash, S. V. (2017). Understanding Activation Functions in Neural Networks. https://medium.com/the-theory-of-everything/understanding-activation-functions-in-neural-networks-9491262884e0 (retrieved 28.11.2018).

Azizpour, H., Razavian, A.S., Sullivan, J., Maki, A., Carlsson, S. (2015) From generic to specific deep representations for visual recognition. 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 36-45.

Bengio, Y. (2009) Learning Deep Architectures for AI. Foundations and Trends in Machine Learning 2(1), 1-55.

Bengio, Y. (2012). Practical Recommendations for Gradient-Based Training of Deep Architectures. In Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science 7700, 437-478.

Bengio, Y., Courville, A., Vincent, P. (2013) Representation Learning: A Review and New Perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828.

Bottou, L., Curtis, F.E., Nocedal, J. (2018) Optimization Methods for Large-Scale Machine Learning. Society for Industrial and Applied Mathematics Review 60(2), 223-311.

Boureau, Y.-L., Ponce, J., LeCun, Y. (2010) A Theoretical Analysis of Feature Pooling in Visual Recognition. In Proceedings of the 27th International Conference on Machine Learning, Omnipress, USA, 111-118.

Bousson, V., Bergot, C., Sutter, B., Thomas, T., Bendavid, S., Benhamou, C.-L., Blain, H., Brazier, M., Breuil, V., Briot, K., Chapurlat, R., Chapuis, L., Cohen-Solal, M., Fardellone, P., Féron, J.-M., Gauvain, J.-B., Laroche, M., Legrand, E., Lespessailles, E., Cortet, B. (2015). Trabecular Bone Score: Where are we now? Joint Bone Spine 82(5), 320-325.

Buckwalter, J. A., Glimcher, M. J., Cooper, R. R., Recker, R. R. (1995) Bone biology.

Manolagas, S. C. (2000) Birth and Death of Bone Cells: Basic Regulatory Mechanisms and Implications for the Pathogenesis and Treatment of Osteoporosis. Endocrine Reviews 21(2), 115-137.

Cabitza, F., Locoro, A., Banfi, G. (2018) Machine Learning in Orthopedics: A Literature Review. Frontiers in Bioengineering and Biotechnology 6(75), 1-20.

Cauchy, A. (1847). Méthode générale pour la résolution de systèmes d’équations simultanées. In Compte rendu des séances de l’académie des sciences, 315-323.

Cernazanu-Glavan, C., Holban, S. (2013) Segmentation of Bone Structure in X-ray Images using Convolutional Neural Network. Advances in Electrical and Computer Engineering 13(1), 87-94.

Claesen, M., De Moor, B. (2015) Hyperparameter Search in Machine Learning. The XI Metaheuristics International Conference.

Cooper, C., Campion, G., Melton, L.J. (1992) Hip fractures in the elderly: a world-wide projection. Osteoporosis International 2(6), 285-289.

Cosman, F., de Beur, S. J., LeBoff, M., Lewiecki, E., Tanner, B., Randall, S., Lindsay, R. (2008) The Clinician's Guide to Prevention and Treatment of Osteoporosis. Osteoporosis International 25(10), 2359-2381.

Cybenko, G. (1989) Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2(4), 303-314.

Deng, L., Yu, D. (2014) Deep Learning: Methods and Applications. Now Publishers inc., USA.

Duchi, J. C., Hazan, E., Singer, Y. (2011) Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research 12(1), 2121-2159.

El Maghraoui, A. (2012). Dual Energy X-Ray Absorptiometry. InTech, Croatia.

Eldan, R., Shamir, O. (2016) The Power of Depth for Feedforward Neural Networks. In JMLR: Workshop and Conference Proceedings 49, 1–34.

Ensrud, K.E., Ewing, S.K., Cawthon, P.M., Fink, H.A., Taylor, B.C., Cauley, J.A.,

frailty indexes for the prediction of falls, disability, fractures, and mortality in older men. Journal of the American Geriatrics Society 57(3), 492-498.

Erhan, D., Manzagol, P., Bengio, Y., Bengio, S., Vincent, P. (2009) The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training. Twelfth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, 153-160.

Fawcett, T. (2006) An introduction to ROC analysis. Pattern Recognition Letters 27(8), 861-874.

Glorot, X., Bordes, A., Bengio, Y. (2011). Deep Sparse Rectifier Neural Networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, 315-323.

González-Macías, J., Marín, F.A., Vila, J.M., Díez-Pérez, A. (2012). Probability of fractures predicted by FRAX® and observed incidence in the Spanish ECOSAP Study cohort. Bone 50(1), 373-377.

Goodfellow, I., Bengio, Y., Courville, A. (2016) Deep Learning. MIT Press, USA.

Guglielmi, G., Scalzo, G. (2010) Diagnostic imaging: Imaging tools transform diagnosis of osteoporosis. http://www.diagnosticimaging.com/article/imaging-tools-transform-diagnosis-osteoporosis (retrieved 27.11.2018).

Hagan, M. T., Demuth, H. B., Beale, M. H., De Jesús, O. (2014) Neural Network Design, 2nd Edition. Martin Hagan, USA.

Hans, D., Dargent-Molina, P., Schott, A. M., Sebert, J. L., Cormier, C., Kotzki, P. O., Delmas, P. D., Pouilles, J. M., Breart, G., Meunier, P. J. (1996). Ultrasonographic heel measurements to predict hip fracture in elderly women: the EPIDOS prospective study. Lancet 348(9026), 511-514.

He, K., Zhang, X., Ren, S., Sun, J. (2016) Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition, 770-778.

Hinton, G. E., McClelland, J. L., Rumelhart, D. E. (1986) Distributed representations. Parallel distributed processing: explorations in the microstructure of cognition 1, 77-

Hubel, D.H., Wiesel, T.N. (1959) Receptive fields of single neurones in the cat's striate cortex. The Journal of Physiology 148(3), 574-591.

Johnell, O., Hertzman, P. (2006) What evidence is there for the prevention and screening of osteoporosis? WHO Health Evidence Network Report, World Health Organization, Copenhagen.

Johnell, O., Kanis, J. A. (2006) An estimate of the worldwide prevalence and disability associated with osteoporotic fractures. Osteoporosis International 17(12), 1726-1733.

Josse, R., Khan, A., Ngui, D., Shapiro, M. (2013) Denosumab, a new pharmacotherapy option for postmenopausal osteoporosis. Current medical research and opinion 29(3), 205-216.

Kanis, J. A., Johnell, O., Oden, A., Johansson, H., McCloskey, E. (2008) FRAX and the assessment of fracture probability in men and women from the UK. Osteoporosis International 19(4), 385-397.

Karjalainen, J.P., Riekkinen, O., Töyräs, J., Jurvelin, J.S., Kröger, H. (2015). New method for point-of-care osteoporosis screening and diagnostics. Osteoporosis International 27(3), 971-977.

LeCun, Y. (1989). Generalization and network design strategies. Connectionism in perspective, 143-155.

LeCun, Y., & Bengio, Y. (1995) Convolutional Networks for Images Speech and Time Series. The handbook of brain theory and neural networks, 255-258.

LeCun, Y., Bengio, Y., Hinton, G. E. (2015) Deep Learning. Nature 521(7553), 436-444.

Leslie, W.D., Lix, L.M. (2013) Comparison between various fracture risk assessment tools. Osteoporosis International 25(1), 1-21.

Malowney, M. (2013). Dr. Mark Malowney Chiropractic. http://drmchiro.blogspot.com/ (retrieved 14.1.2019).

Maquer, G., Lu, Y., Dall'Ara, E., Chevalier, Y., Krause, M., Yang, L., Eastell, R., Lippuner, K., Zysset, P. (2015) The Initial Slope of the Variogram, Foundation of the

Messina, C., Sconfienza, L. M., Bandirali, M., Guglielmi, G., Ulivieri, F. M. (2016) Adult Dual-Energy X-ray Absorptiometry in Clinical Practice: How I Report it. Seminars in Musculoskeletal Radiology 20(3), 246-253.

Nair, V., Hinton, G. E. (2010). Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning, Omnipress, USA, 807-814.

Narkhede, S. (2018) Understanding AUC - ROC Curve. https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5 (retrieved 27.11.2018).

National Osteoporosis Society (NOS) (2018) Physical Principles and Measurement Accuracy of Bone Densitometry. https://nos.org.uk/media/1725/02_physical_principles.pdf (retrieved 27.11.2018).

Nielsen, M. A. (2015) Neural Networks and Deep Learning. Determination Press.

OSTPRE (2018) Kuopio Osteoporosis Risk Factor and Prevention Study. http://www.uef.fi/en/web/kmru/ostpre (retrieved 12.8.2018).

Patel, S., Pingel, J. (2017) Introduction to Deep Learning: What Are Convolutional Neural Networks? https://se.mathworks.com/videos/introduction-to-deep-learning-what-are-convolutional-neural-networks--1489512765771.html (retrieved 17.3.2019).

Patel, S., Tweed, K., Chinappen, U. (2005) Fall-related risk factors and osteoporosis in older women referred to an open access bone densitometry service. Age and Ageing 34(1), 67–71.

Pothuaud, L., Carceller, P., Hans, D. (2008). Correlations between grey-level variations in 2D projection images (TBS) and 3D microarchitecture: Applications in the study of human trabecular bone microarchitecture. Bone 42(4), 775-787.

Raschka, S. (2015) Single-Layer Neural Networks and Gradient Descent. https://sebastianraschka.com/Articles/2015_singlelayer_neurons.html (retrieved 14.1.2019).

Rho, J.-Y., Kuhn-Spearing, L., Zioupos, P. (1998) Mechanical properties and the hi-

Rosenblatt, F. (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review 65(6), 386-408.

Ruder, S. (2016) An overview of gradient descent optimization algorithms. CoRR.

Rumelhart, D. E., Hinton, G. E., Williams, R. J. (1986) Learning Representations by Back Propagating Errors. Nature 323(6088), 533-536.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.S., Berg, A.C., Fei-Fei, L. (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision 115(3), 211-252.

Seeman, E. (2008) Bone quality: The material and structural basis of bone strength. Journal of Bone and Mineral Metabolism 26(1), 1-8.

Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D. (2017) Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. 2017 IEEE International Conference on Computer Vision, 618-626.

Simard, P.Y., Steinkraus, D., Platt, J.C. (2003) Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis. Seventh International Conference on Document Analysis and Recognition 2(1), 958.

Simonyan, K., Zisserman, A. (2014) Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of ICLR, 2015.

Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.A. (2014) Striving for Simplicity: The All Convolutional Net. CoRR.

Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R. (2014) Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(1), 1929-1958.

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z. (2016) Rethinking the Inception Architecture for Computer Vision. 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2818-2826.

Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B.,

Tiulpin, A., Thevenot, J., Rahtu, E., Lehenkari, P., Saarakkala, S. (2018) Automatic Knee Osteoarthritis Diagnosis from Plain Radiographs: A Deep Learning-Based Approach. Scientific Reports 8(1), 1-10.

Unnanuntana, A., Gladnick, B. P., Donnelly, E., Lane, J. M. (2010) The assessment of fracture risk. Journal of Bone and Joint Surgery, 92(3), 743-753.

van Gerven, M., Bohte, S. (2018) Artificial Neural Networks as Models of Neural Information Processing. https://www.frontiersin.org/research-topics/4817/artificial-neural-networks-as-models-of-neural-information-processing (retrieved 20.11.2018).

Väänänen, S. (2014) Functional imaging of proximal femur by combining dual energy X-ray imaging and finite element simulations. Publications of the University of Eastern Finland, Kuopio.

Wanderman, N., Freedman, B., Currier, B., Huddleston, P., Yaszemski, M. J., Nassr, A. (2017) Interpreting the DXA analysis: When should you hold off on spinal fusion? Seminars in Spine Surgery 30(1), 36-40.

WHO (1994) Assessment of fracture risk and its application to screening for postmenopausal osteoporosis. Report of a WHO Study Group. WHO Technical Report Series 843, World Health Organization, Geneva.

WHO (2003) Prevention and management of osteoporosis. Report of a WHO Scientific Group. WHO Technical Report Series 921, World Health Organization, Geneva.

Wilson, D.R., Martinez, T.R. (2003) The general inefficiency of batch training for gradient descent learning. Neural Networks: the official journal of the International Neural Network Society 16(10), 1429-1451.

Wu, D., Isaksson P., Ferguson S. J., Persson, C. (2018) Young's modulus of trabecular bone at the tissue level: A review. Acta Biomaterialia, 78, 1-1.
