
3.3 Spatial and temporal deep learning architectures

3.3.5 Three-dimensional CNN

As initially reported in [86], 3D CNNs perform remarkably well in modelling tasks involving spatio-temporal data. Being CNNs, they utilize the same architectural features as the more commonly used convolutional models.

What is different is their use of convolution in the depth dimension, searching for robust features across sequences of input data in addition to spatial features extracted from the individual images. The sequential nature of input data is not limited to time, but can also be, for example, hyper-spectral multi-layer point-in-time data with the aim of finding high-level inter-channel features [50].
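As an illustration of convolution in the depth dimension, the following is a minimal single-channel sketch in plain Python. It is not taken from any of the cited studies: the function name `conv3d_valid`, the toy input and the kernel values are assumptions made for this example, and the operation is the unpadded, stride-one cross-correlation that deep learning libraries commonly call convolution.

```python
def conv3d_valid(volume, kernel):
    """Naive unpadded, stride-one 3D cross-correlation for a single channel.

    volume: nested list indexed [depth][row][col]
    kernel: nested list indexed [depth][row][col], no larger than volume
    """
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for d in range(D - kd + 1):          # slide along depth (time steps or spectral layers)
        plane = []
        for i in range(H - kh + 1):      # slide along rows
            row = []
            for j in range(W - kw + 1):  # slide along columns
                acc = 0.0
                for a in range(kd):
                    for b in range(kh):
                        for c in range(kw):
                            acc += volume[d + a][i + b][j + c] * kernel[a][b][c]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy input (assumption): 4 frames of 3x3 pixels, every pixel in frame t has value t.
frames = [[[float(t)] * 3 for _ in range(3)] for t in range(4)]

# Hypothetical 2x2x2 kernel that responds to change between consecutive frames.
kernel = [
    [[-1.0, -1.0], [-1.0, -1.0]],  # weights applied to the earlier frame
    [[1.0, 1.0], [1.0, 1.0]],      # weights applied to the later frame
]

response = conv3d_valid(frames, kernel)
shape = (len(response), len(response[0]), len(response[0][0]))
print(shape)              # (3, 2, 2)
print(response[0][0][0])  # 4.0
```

Because the kernel spans two frames, every output value depends on how the input changes along the depth axis, which a 2D kernel applied to each frame separately cannot capture.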

Barbosa et al. developed and trained multiple models to predict crop yield from spatially formatted crop management data [2]. The outline of the study is elaborated in Section 3.3.1. One of the models was based on a 3D CNN architecture, consisting of a single 3D convolutional layer coupled with two FC layers. They utilized the 3D CNN to extract inter-channel high-level features from the inputs.
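To make the hand-off from a 3D convolutional layer to an FC layer concrete, the sketch below computes the output shape of one unpadded 3D convolution and the number of flattened features the first FC layer would receive. The input shape, kernel size and filter count are illustrative assumptions, not values reported in [2]; only the standard output-size formula is relied upon.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Standard output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_output_shape(shape, kernel, stride=(1, 1, 1), padding=(0, 0, 0)):
    """Apply the one-axis formula to depth, height and width."""
    return tuple(
        conv_output_size(s, k, st, p)
        for s, k, st, p in zip(shape, kernel, stride, padding)
    )


# Hypothetical input: 5 stacked data layers (depth) of 21x21 pixels each.
in_shape = (5, 21, 21)
# Hypothetical layer: 8 filters with a 3x3x3 kernel, stride 1, no padding.
n_filters = 8
out_shape = conv3d_output_shape(in_shape, (3, 3, 3))

# Features passed to the first FC layer after flattening.
flat_features = n_filters * out_shape[0] * out_shape[1] * out_shape[2]

print(out_shape)      # (3, 19, 19)
print(flat_features)  # 8664
```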

The 3D CNN model was observed to perform on par with the majority of other CNN-based solutions, attaining 0.73 RMSE, which translates to 1190 kg/ha when unscaled.
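The relation between a scaled and an unscaled RMSE can be sketched as follows, assuming the targets were scaled linearly (for example by standardization). Under that assumption the RMSE is multiplied by the scale factor when predictions are mapped back to the original units, and the offset cancels out. The reported figures would then imply a factor of roughly 1190 / 0.73 ≈ 1630 kg/ha per scaled unit. The scaling scheme, the toy values and the helper names below are assumptions for illustration, not details taken from [2].

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def unscale(values, scale, offset=0.0):
    """Invert a linear scaling of the form (value - offset) / scale."""
    return [v * scale + offset for v in values]


# Hypothetical scaled targets and predictions.
y_true_scaled = [0.0, 1.0, -1.0, 0.5]
y_pred_scaled = [0.5, 0.5, -0.5, 1.0]

# Hypothetical scaling parameters in kg/ha.
scale, offset = 1630.0, 5000.0

rmse_scaled = rmse(y_true_scaled, y_pred_scaled)
rmse_unscaled = rmse(
    unscale(y_true_scaled, scale, offset),
    unscale(y_pred_scaled, scale, offset),
)

print(rmse_scaled)    # 0.5
print(rmse_unscaled)  # 815.0, equal to rmse_scaled * scale
```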

Terliksiz and Altylar studied the use of a 3D CNN architecture to predict soybean yields from spatio-temporal data at county scale in the US [84]. The input data was acquired annually from 2003 to 2016 from the spatially low-resolution MODIS satellites at eight-day intervals between each data sample. A single sequence consisted of 24 subframes cropped from the initial, larger area. The model implemented by the authors consists of two initial 2D CNN layers, followed by six 3D convolutions and two FC layers for single value prediction. Their model attained an average 4.42 bu/ac RMSE with various land cover ratio ablations in the input frames.

Compared to other similar studies, their RMSE is at least 0.90 bu/ac lower. The authors note, however, that using smaller within-county frames can be misleading when predicting a county-wide average yield.

In their study of crop type classification in Germany, South Sudan and Ghana, Rustowicz et al. also built and trained a 3D CNN encoder-decoder architecture to classify pixels in sequences of frames extracted from satellite data [65]. Their study is described in Section 3.3.3. The 3D CNN model performed close to the CNN-LSTM model, attaining overall accuracies of 95.2%, 60.9% and 85.3% with the Germany, Ghana and South Sudan data sets, respectively. While the CNN-LSTM performed better with Germany by 0.6 percentage points, the 3D CNN had better overall accuracy with Ghana and South Sudan by 1.0 and 2.7 percentage points, respectively. In the ablation studies, the best 3D CNN setting outperformed the best CNN-LSTM setting by 1.3 percentage points in accuracy with the South Sudan data set, while the CNN-LSTM attained a 2.2 percentage point higher accuracy with the Ghana data set.

Ji et al. developed a 3D CNN model to perform crop type classification with spatio-temporal data [27]. They used data from 4 m/px resolution Gaofen 1 and 15 m/px Gaofen 2 satellite systems, acquired for several months from 2014 to 2016. The model consisted of three 3D convolution layers followed by two FC layers. The 3D CNN outperformed other methods in terms of pixel-wise classification, including a CNN and several traditional machine learning methods. The average overall accuracy of the model was 94.9%. The two closest contender models, a CNN and a support vector machine (SVM), attained average overall accuracies of 91.8% and 91.9%, respectively.
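For reference, overall accuracy in pixel-wise classification is the share of correctly labelled pixels among all evaluated pixels. A minimal sketch with a hypothetical three-class confusion matrix follows; the counts are invented for illustration and do not come from any of the cited studies.

```python
def overall_accuracy(confusion):
    """Fraction of correctly classified pixels.

    confusion[i][j] is the number of pixels of true class i predicted as class j.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical pixel counts for three crop classes.
confusion = [
    [50, 2, 3],
    [4, 30, 1],
    [0, 5, 5],
]

accuracy = overall_accuracy(confusion)
print(accuracy)  # 0.85
```

Since the measure weights every pixel equally, a model can reach a high overall accuracy while still performing poorly on a rare class, as with the third class in this toy matrix.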

Li, Zhang and Shen studied the use of 3D CNN models in spectral-spatial land cover classification with hyper-spectral data [50]. The authors used three hyper-spectral data sets from Italy, Botswana and India. A two-layer 3D CNN architecture was built and trained to perform pixel-wise classification of remotely sensed scenes.

The model was compared against models developed in other studies and similar settings, including a stacked auto-encoder, a deep belief network and a CNN. The 3D CNN attained an overall accuracy of 99.3%. With notable differences (> 2 percentage points) in prediction accuracies observed only with the Indian data set, the authors noted that the 3D CNN had the lowest misclassification ratio of the compared models. While spatial models were observed to perform the best overall, the authors attributed the best performance of the 3D CNN to its ability to learn salient inter-spectral features from the input data.

3.4 Conclusions

The use of deep learning techniques in agriculture and agriculture-related remote sensing applications has notably increased recently. Spatial, temporal and spatio-temporal deep learning architectures have been utilized in a variety of tasks. Of these modelling tasks, within-season intra-field crop yield prediction is of greatest interest for this dissertation.

When successful, crop yield prediction performed in a timely manner enables farmers to initiate data-based corrective actions on their fields. Deducing actionable items for each field a farmer manages therefore requires information about the field plots at the intra-field scale. Deep learning architectures, high-resolution remote sensing data and crop yield prediction are common and recurring themes in current agriculture-related machine learning studies. The novelty of the studies selected for this dissertation is in how these themes are combined to achieve spatially and temporally context-aware within-season intra-field crop yield prediction, which has not yet been studied extensively.

4 CROP YIELD PREDICTION WITH DEEP LEARNING

Having established the background of crop yield prediction with deep learning models, it is time to review the contributions of the publications selected for this dissertation. The objective of this dissertation is to probe and answer the research questions outlined in Chapter 1. RQ1 and RQ2 provide a suitable division for the selected publications. The feasibility of spatial and spatio-temporal deep learning models for in-season yield prediction with high-resolution remote sensing data is explicitly studied in Publications [I] and [IV]. In both, the models are designed, trained and evaluated from scratch to perform intra-field crop yield prediction with UAV-based data, within the growing season. Furthermore, [II] uses the model of [I] in a case study to frame the use of such models in a farming DSS. Two of the remaining publications, [III] and [V], focus on data sources. In [V], the effects of additional data sources were evaluated by comparing crop yield estimation performance with a static model architecture from [I]. Publication [III], while taking a different approach in terms of the modelling technique used, focuses on the reliability of satellite-based remote sensing data.

This chapter focuses on the data, methods and results of the selected publications, leaving discussion of the results and methods to Chapter 5. The chapter consists of two main sections and is constructed as follows. In the first section, the developments and evaluations of intra-field crop yield prediction models are described. The section begins by looking at CNN-based yield prediction with distinct point-in-time frames as presented in [I]. A frame is a sub-area extracted from larger images. Next, the evaluation of the usability of the best model of [I] in the farming DSS context, as presented in [II], is described. Lastly, the development and evaluation of multiple spatio-temporal deep learning models in [IV] to perform crop yield prediction using UAV-based data are described. The second section focuses on data evaluation. The general outline of [V] is first described, where the effects of varying the input data source configurations on crop yield prediction performance were studied. The process of using machine learning to evaluate satellite data reliability with regard to cloud canopy in [III] is described last.