Conclusions - Fast Convolutional Neural Networks for Real-Time Path Tracing Denoising

This thesis first presented the theoretical background of Convolutional Neural Networks (CNNs) and real-time path tracing. Three small CNNs were then designed to minimize the inference time for reconstructing 1280x720 path traced images.

Furthermore, the designed networks were evaluated against a state-of-the-art real-time path tracing denoiser, which was modified for multiple samples per pixel (spp), and against a larger, slower neural network. Finally, directions for future work on the problem were presented.

Path tracing is a method for generating photorealistic images with physically based effects such as reflections, shadows, refractions and global illumination. Path tracing in real time requires a large amount of computational power. Even though path tracing is an 'embarrassingly parallel' problem, it is often more sensible to use post-processing methods to improve the quality of the output than to spend the computational power on increasing the number of samples. Moreover, the advancements in machine learning for image-based problems and the evolving inference hardware for neural networks enable the reconstruction of multiple samples per pixel in real time.

The use of convolutional neural networks for image-related problems has produced state-of-the-art results in, for example, classification and super-resolution. Generally, however, the architectures and implementations have a high computational complexity. Recently, as new dedicated inference hardware has been developed for neural networks, faster architectures and layers, such as depthwise convolutional layers, have been developed to reduce the computational complexity without sacrificing the accuracy of the models. In addition, post-processing methods such as quantization and pruning can reduce the complexity of the network. Moreover, for real-time path tracing reconstruction, the size of the networks must be limited. This further complicates the receptive field accumulation of the networks, as CNNs usually rely on increasing the size of the filters or the depth of the network.
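As a rough illustration of why depthwise separable layers reduce complexity, the following sketch counts the weights of a standard convolution and of a depthwise convolution followed by a 1x1 pointwise convolution. The function names and the channel counts are illustrative assumptions and do not correspond to the networks designed in this thesis; biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Number of weights in a standard kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Number of weights in a depthwise convolution plus a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes the channels
    return depthwise + pointwise


def cost_ratio(kernel, c_in, c_out):
    """Separable weight count divided by the standard weight count."""
    return depthwise_separable_params(kernel, c_in, c_out) / standard_conv_params(
        kernel, c_in, c_out
    )


if __name__ == "__main__":
    # Hypothetical channel counts, chosen only for illustration.
    for channels in (8, 32, 128):
        print(channels, round(cost_ratio(3, channels, channels), 3))
```

The ratio equals 1/c_out + 1/kernel^2, so the relative saving of the separable layer shrinks as the number of output channels decreases.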

In this thesis, three different fast networks were designed. The design process started by evaluating different activation functions and different convolutional layers. For the path tracing denoising problem, the ReLU activation performed the best. The performance of depthwise separable convolutions was tested for shallow networks with different feature map sizes. However, normal convolutions outperformed the depthwise separable convolutions with small feature map sizes. The main difference between the three designed networks was the method by which they increased the receptive field. The Simple Convolutional Neural Network (SCNN) only used a larger filter in the first layer to help with the receptive field.
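How filter size affects the receptive field can be checked with the standard recurrence for stacked convolutional layers. The sketch below is a minimal illustration; the layer lists are hypothetical and are not the actual architectures of this thesis. Each layer is given as (kernel size, stride, dilation), so the same function also covers strided pooling and dilated filters.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a stack of layers.

    Each layer is a tuple (kernel, stride, dilation).
    """
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance in input pixels between neighbouring outputs
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    # Hypothetical configurations, for illustration only.
    plain = [(3, 1, 1)] * 4                      # four 3x3 layers
    large_first = [(7, 1, 1)] + [(3, 1, 1)] * 3  # larger filter in the first layer
    dilated = [(3, 1, d) for d in (1, 2, 4, 8)]  # growing dilation rates
    pooled = [(3, 1, 1), (2, 2, 1), (3, 1, 1)]   # 2x2 pooling between two 3x3 layers
    for name, cfg in [("plain", plain), ("large_first", large_first),
                      ("dilated", dilated), ("pooled", pooled)]:
        print(name, receptive_field(cfg))
```

With these example values, enlarging only the first filter from 3x3 to 7x7 grows the receptive field from 9 to 13 pixels without adding layers.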

The Dilated Convolutional Neural Network (DCNN) uses the idea from À Trous to increase the receptive field. The Small UNet (SUNet) uses pooling methods to decrease the resolution of the image inside the network in order to filter the image at a lower frequency. The effect of pruning and quantization on the weights of the SCNN was also tested. The SCNN model weights could be effectively pruned by setting 70% of the original weights to 0, which sped up the loading process from 0.092 ms to 0.080 ms with no loss in validation error.
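Setting a fixed fraction of the weights to zero can be sketched as follows. Magnitude-based pruning is assumed here, since this chapter does not state which pruning criterion was used; the function name and the example weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to 0.

    `sparsity` is the fraction of weights to zero out, e.g. 0.7 for 70%.
    Assumption: weights are ranked by absolute value (magnitude pruning).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = round(len(weights) * sparsity)
    # Indices of the weights, ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


if __name__ == "__main__":
    # Hypothetical weights, for illustration only.
    w = [0.5, -0.1, 0.05, -0.9, 0.2, 0.01, -0.3, 0.7, -0.02, 0.15]
    print(magnitude_prune(w, 0.7))
```

In practice, the validation error would be measured again after pruning to confirm that the removed weights were not needed.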

The designed networks were evaluated in 3 models together with a state-of-the-art real-time path tracing denoiser, modified for multiple spp inputs, and a larger UNet neural network. The evaluation shows that the fast CNNs achieve better results by denoising an 8 spp input than are obtained by quadrupling the number of samples. Moreover, the performance seems to scale well for even larger spp inputs. Compared to the analytical bilateral filter and the À Trous filter, the machine learning based methods performed better in most cases. One exception seems to be highly specular inputs, where the machine learning based methods had problems, at least when considering the Root-Mean-Square Error (RMSE). The real-time analysis showed that, because path tracing requires a lot of computational power and a high level of parallelization, post-processing denoising is necessary in practical real-time implementations to achieve good quality. For example, denoising an 8 spp input with a fast CNN yields better error metrics in almost all cases than plain path tracing with 64 spp, even though 64 spp path tracing requires almost 8 times the computational power.
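For reference, the RMSE between a reconstructed image and its reference can be computed as in the following sketch. Images are represented as flat lists of pixel values, and the function name and sample values are assumptions made for illustration.

```python
import math


def rmse(image, reference):
    """Root-mean-square error between two equally sized lists of pixel values."""
    if len(image) != len(reference):
        raise ValueError("image and reference must have the same number of pixels")
    if not image:
        raise ValueError("images must not be empty")
    squared_error = sum((a - b) ** 2 for a, b in zip(image, reference))
    return math.sqrt(squared_error / len(image))


if __name__ == "__main__":
    # Hypothetical values: three matching pixels and one bright highlight
    # that the reconstruction underestimates.
    reference = [0.2, 0.2, 0.2, 10.0]
    denoised = [0.2, 0.2, 0.2, 4.0]
    print(rmse(denoised, reference))
```

Because the errors are squared before averaging, a few pixels with large errors, such as bright highlights, can dominate the metric.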

The main limitation of the work is that no results with hardware acceleration for the targeted 16-bit floating point arithmetic on Tensor Cores could be measured for the evaluation. However, the performance benefits of hardware acceleration measured in other work, applied to the models used in this thesis, are a promising result for real-time path tracing denoising. Furthermore, the models in this thesis can be extended to work with temporal denoising, the neural bilateral grid and kernel-prediction CNNs.
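The numerical effect of storing values in 16-bit floating point can be examined without dedicated hardware. The sketch below rounds values to IEEE 754 half precision with Python's struct module; it only illustrates the precision of the format and says nothing about Tensor Core performance. The helper names and the sample weights are assumptions.

```python
import struct


def to_half(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The 'e' format code packs a float into 16 bits (binary16).
    return struct.unpack("<e", struct.pack("<e", value))[0]


def max_relative_error(values):
    """Largest relative rounding error when storing `values` in half precision."""
    return max(
        (abs(to_half(v) - v) / abs(v) for v in values if v != 0.0),
        default=0.0,
    )


if __name__ == "__main__":
    weights = [0.1, -0.25, 0.003, 1.7]  # hypothetical weights
    print(to_half(0.1))
    print(max_relative_error(weights))
```

For values in the normal range of the format, the relative rounding error stays below 2^-11; values beyond the half-precision range cannot be packed and raise an OverflowError.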

