Estimating the performance of a multiradar tracker using machine learning

Master’s thesis Faculty of Information Technology and Communication Sciences Examiner: D.Sc.(Tech) Pasi Pertilä April 2021

ABSTRACT

Harri Tolkkinen: Estimating the performance of a multiradar tracker using machine learning

Master’s thesis Tampere University Computing Sciences April 2021

Multiradar tracking of aircraft is a sensor fusion problem involving complicated measurement and object motion models. The accuracy of radar measurements differs significantly between bearing and range. The transformation of the measurements produced by individual radars between the local and a common reference frame includes multiple calibration and bias compensation steps that introduce systematic errors to the measurements. Most of the common tracking algorithms require linearization of the possibly nonlinear measurement and motion models, introducing an additional error source. Understanding the quality of information provided by a system is essential for situational awareness and decision making.

In this thesis, the altitude component of the location is approached with data-driven machine learning methods. Deep learning and Support Vector Machine (SVM) methods are proposed for providing a temporal quality estimate of the tracker altitude output. Furthermore, the deep learning method is refined to predict correction terms for the tracker altitude output, essentially improving the tracking accuracy.

The developed methods were trained and tested with data from an experimental multiradar tracking system. Although the performance of the methods is most likely highly specific to the system and parametrization, the results show that especially the deep learning method provided a fairly accurate error estimate and that the correction terms increased the altitude accuracy significantly. Obtained results show that machine learning methods can be useful in radar tracking even with a relatively small amount of training data.

Keywords: machine learning, recurrent neural networks, LSTM, radar, tracking

The originality of this thesis has been checked using the Turnitin OriginalityCheck service.

TIIVISTELMÄ (FINNISH ABSTRACT)

Harri Tolkkinen: Estimating the performance of multiradar tracking using machine learning methods

Master's thesis, Tampere University, Information Technology, April 2021

Multiradar tracking of airborne aircraft is a sensor fusion problem involving complex sensor and kinematic models. The accuracy of radar measurements differs significantly between the range and angular directions. Transferring a measurement made by an individual radar in its local coordinate frame to a common world coordinate frame requires several different calibration steps, which introduce systematic error into the measurements. Most typical tracking algorithms require linearization of the possibly nonlinear measurement and motion models, which is another potential error source. Understanding the quality of the information produced by a tracking system is of paramount importance for situational awareness and decision making.

This thesis examines the altitude component of tracking with data-driven machine learning methods. Both deep learning and support vector models are presented that give a situation-dependent prediction of the quality of the altitude information of the tracking system. In addition, the deep learning model is used to improve the accuracy of the altitude information.

The methods developed in the thesis were trained and tested with data produced by an experimental multiradar tracking system. Although the performance of the method is probably strongly system-dependent, the results show that the deep learning method in particular produced a comparatively good error estimate and that the correction terms it produced also improved the accuracy of the altitude information notably. The results obtained show that machine learning methods can be useful in radar tracking applications even with relatively small amounts of training data.

Keywords: machine learning, recurrent neural network, radar, radar tracking

The originality of this publication has been checked using the Turnitin OriginalityCheck program.

PREFACE

The topic and problems investigated in this thesis were chosen and refined together with the Patria Emerging Technology Research (ETR) group. At the beginning, the thesis work was loosely defined as experimenting with relatively recent machine learning methods on the multiradar tracker dataset. The original idea was only to create an error level estimate, but it was interesting to realize that the deep learning model was also able to predict correction terms for the most recent track updates.

I would like to thank ETR researchers Juha Jylhä, Minna Väilä and Marja Ruotsalainen for exchanging ideas with me and for their feedback during the reporting phase of this thesis. I was very pleased with the freedom I was given to use the models I was interested in and to focus my background research on topics I was interested in. This thesis work was an interesting mixture of mature radar signal processing theory and more recent data-driven machine learning.

Tampere, 22nd April 2021

Harri Tolkkinen

LIST OF FIGURES

2.1 A block diagram of radar receiver signal processing chain.
2.2 Block diagram of a receiver filter bank. Filters MF are tuned to different Doppler velocities.
2.3 Pulse-Doppler radar range and elevation measurements. Antenna type and target in the figure are drawn for meteorological applications as the figure is derived from [.] However, same principles apply for other pulse-Doppler type radars used in aviation.
2.4 Monopulse lobes. Summation lobe Σ is achieved by adding the received signals from both halves. Correspondingly, difference channel can be achieved by subtracting the signals from both halves.
2.5 S/N calculation example recreated after [1, p. 66].
2.6 Airbus A320 passenger aircraft simulated RCS values drawn on a sphere as a function of the viewing angle.
2.7 Block diagram of the tracking process. [8]
2.8 Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF) operation cycle.
2.9 Simplified example of the multihypothesis principle.
3.1 Supervised and unsupervised learning. Figure first published in Der Radiologe under Creative Commons license. [12]
3.2 Decision boundary and support vectors for linearly separable data. [13]
3.3 Kernel trick for making the classes linearly separable.
3.4 Decision boundary and support vectors for non-linearly separable data after kernel trick. [13]
3.5 Biological neuron and corresponding mathematical model. [15]
3.6 Example of the nodes and connections of fully connected neural network.
3.7 Logistic sigmoid activation function.
3.8 Rectified linear unit activation.
3.9 Simple classification problem example using fully connected neural network with different number of hidden units. Higher number of neurons allows the model to learn more complex decision boundary. [15]
3.10 Convolutional kernels of the AlexNet. [17]
3.11 AlexNet convolutional and fully connected layers. Each convolutional layer is followed by maxpooling. Output (layer 8) has 1000 nodes, one for each class of objects in the ImageNet dataset. [18]
3.12 Recurrent Neural Network (RNN) idea visualized. Input X, hidden state h and output o for each time step. U, V, W are weights of the simple RNN network. [19]
3.13 Structure of a LSTM-cell. Two outputs for the next state, hidden state h and cell state c. Cell consists of the forget gate F, input gate I and output gate O. [20]
3.14 Example of gradient descent process.
3.15 L2-regularization effect with different λ values. [15]
3.16 Principle of the dropout. Image from the original paper Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting", JMLR 2014. [21]
4.1 Dataset included synchronized tracker output, associated radar observations and flight recording.
4.2 Samples were generated by sliding a window on the time vectors. Each step of the time vector contained new radar observation and a track update.
4.3 Bias problem that was compensated with the different processing chains. High Root Mean Square Error (RMSE) due to the long term bias even though the network has estimated the short term error very well.
4.4 Preprocessing steps for generating the training and testing samples from the tracks. Training data pipeline used the flight recording to bias correct the input tracks whereas the testing pipeline used the flight recording only to calculate the ground truth.
4.5 Structure of the used CNN-LSTM hybrid network.
4.6 Input channels for the CNN-LSTM model.
4.7 SVM input features for the error level prediction.
5.1 Histogram of the normalized error before and after the altitude correction.
5.2 Correction effects on the error signal. Negative values indicate that correction managed to reduce the error, positive means that the correction term actually increased the error.
5.3 Small part of the testing set where the estimate seems to work well.
5.4 Small part of the test set where the model catches short term dependencies but misses the bias which then causes the correction terms to have wrong sign and effectively increase the error.
5.5 True error level, estimated error level and estimation error signals for small part of the test samples. The RMSE values were calculated from the whole test dataset.
5.6 Error histogram of the CNN-LSTM and statistical mean estimates.
5.7 Prediction error for the statistical mean and the SVM methods.

LIST OF SYMBOLS AND ABBREVIATIONS

Adam    Adam optimizer
AESA    Active Electronically Scanned Array
CFAR    Constant False Alarm Rate
CNN     Convolutional Neural Network
EKF     Extended Kalman Filter
LSTM    Long Short Term Memory
MAE     Mean Absolute Error
PD      Probability of Detection
PF      Particle Filter
PRF     Pulse Repetition Frequency
RCS     Radar Cross-Section
ReLU    Rectified Linear Unit
RMSE    Root Mean Square Error
RNN     Recurrent Neural Network
SGDM    Stochastic Gradient Descent with Momentum
SNR     Signal to Noise Ratio
SVM     Support Vector Machine
TAU     Tampereen yliopisto (engl. Tampere University)
UKF     Unscented Kalman Filter

1 INTRODUCTION

Error modeling and error estimator design are amongst the fundamental problems in engineering. Understanding the uncertainty associated with information is the key for decision making. A multiradar tracker is a sensor fusion system that is able to provide notably more accurate information than the measurements of individual radars. The tracking algorithm utilizes radar measurement models and knowledge of the flight mechanics to perform the sensor fusion by taking into account the uncertainties associated with the measurements in order to give an estimate of the object state at the current time. Even though the measurement error models are fairly accurate, exact modeling of the radar wave propagation requires quite a lot of information on, for example, air temperature, humidity or clouds [1].

The altitude component of the target location is especially difficult to model due to the atmospheric effects causing the radio wave to bend. Given the uncertainty associated with the parameterization of the radar measurement models and the object movement models, error is not easy to model mathematically after the various linearization steps of the tracking system. Usually, part of the tracker internal state, the position covariance matrix, is used as the performance estimate.

Even though formulating a mathematical model for the tracker error signal might be challenging, formulating it as a machine learning problem is rather straightforward in the case where a dataset containing real flight recordings and the corresponding tracker output is available. From the machine learning perspective, the problem is supervised regression of a single continuous variable, where flight recordings represent the ground truth. This study is based on a dataset consisting of experimental multiradar tracker data with the associated radar observations and flight recordings.

Two slightly different formulations of the regression problem are made, one for a method based on a neural network and the other for a method based on an SVM. The first method uses convolutional feature extraction layers as well as recurrent neural network based time series processing layers. It is trained for predicting both the absolute error level of the last track update and the signed error correction terms. This is accomplished by defining the optimization goal slightly differently for each case. The second method, based on the SVM, is used for estimating the average absolute error within a short window of track updates.

The structure of the thesis is as follows. Chapter 2 focuses on the physical background and the theory behind the radar measurements, and introduces the basic concept of tracking with some of the current tracking algorithms. Chapter 3 introduces the basic machine learning concepts and the related theory relevant for the study. Chapter 4 presents the models used for the supervised regression problem and the preprocessing and normalization techniques. Chapter 5 presents the results, after which Chapter 6 summarizes the findings of the study, including a few ideas for the further development of the methods.

2 MULTIRADAR TRACKING

2.1 Radar measurements

Radar is a sensing device based on radio waves and can be used for measuring quantities such as distance, position and radial velocity of objects. Radars are widely used, for example, in flight control, marine traffic and adaptive cruise control. The principle of operation varies greatly between different sizes and intended use cases. In this thesis, the theory focuses on monostatic scanning pulse-Doppler radars used in aviation.

2.1.1 Range measurement

A pulsed radar emits short bursts of radio waves which interact with the object, causing some of the transmitted energy to be scattered back. Once the backscatter of the pulse is received, the range R to the target can be calculated with the equation

\[ R = \frac{\Delta t \cdot c}{2}, \tag{2.1} \]

where Δt is the time between the transmitted and received pulse and c is the speed of light in the medium.

The delay Δt also constrains the Pulse Repetition Frequency (PRF). The PRF limits the maximum unambiguous range: if the PRF is too high, the next pulse may already have been sent before the echo of the previous pulse is received. The radar may then attribute the received echo to the pulse just sent instead of the previous one, so the target appears to be closer to the radar than it is in reality. This effect is known as the second-time-around echo [2, p. 3]. A high PRF is advantageous for more accurate Doppler measurement, but it may introduce range ambiguity. The maximum unambiguous range R_u is

\[ R_u = \frac{c}{2 \cdot f_{prf}}, \tag{2.2} \]

where f_prf is the PRF. In practice, it is possible to circumvent this problem by, for example, phase coding the pulses so that the system is able to differentiate between earlier pulses.
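As a quick numerical check of equations (2.1) and (2.2), the following minimal Python sketch computes the range from a round-trip delay, the maximum unambiguous range, and the range that would be reported for a second-time-around echo. The function names, the vacuum value of the speed of light and the 1 kHz PRF are illustrative assumptions and are not taken from the experimental system.

```python
C = 299_792_458.0  # speed of light in vacuum (m/s); assumed as an approximation for air


def target_range(delay_s, c=C):
    """Range in metres from the round-trip delay, equation (2.1)."""
    return delay_s * c / 2.0


def unambiguous_range(prf_hz, c=C):
    """Maximum unambiguous range in metres, equation (2.2)."""
    return c / (2.0 * prf_hz)


def apparent_range(true_range_m, prf_hz, c=C):
    """Range reported if every echo is attributed to the most recent pulse."""
    return true_range_m % unambiguous_range(prf_hz, c)


prf = 1000.0  # hypothetical PRF in Hz
print(f"R_u at {prf:.0f} Hz: {unambiguous_range(prf) / 1e3:.1f} km")
print(f"200 km target appears at {apparent_range(200e3, prf) / 1e3:.1f} km")
```

With these assumed numbers the target at 200 km lies beyond the unambiguous range of roughly 150 km, so a radar without pulse coding would report it at roughly 50 km.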

2.1.2 Range resolution

The resolution S_r of the range measurement can be calculated from the duration τ of the transmitted pulse as

\[ S_r = \frac{c \cdot \tau}{2}, \tag{2.3} \]

where c is the speed of light. As can be seen from the equation, the pulse would need to be extremely short for the radar to achieve good range resolution. In practice, such a short pulse with high energy is hard to produce. The solution is to modulate the transmitted pulse in frequency or phase so that the pulse contains timing information also inside the pulse. With such a pulse compression technique, the range resolution equation becomes

\[ S_r = \frac{c}{2 \cdot \mathit{BW}}, \tag{2.4} \]

where BW is the bandwidth [1]. A typical pulse duration with pulse compression is in the order of 100 µs, while achieving similar resolution without pulse compression would require a pulse duration in the order of 100 ns.
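The effect of pulse compression in equations (2.3) and (2.4) can be illustrated with a short Python sketch. The 10 MHz bandwidth is a hypothetical value chosen so that the compressed 100 µs pulse matches the resolution of an uncompressed 100 ns pulse, and the use of the time-bandwidth product as the compression ratio is a standard textbook relation rather than something stated above.

```python
C = 299_792_458.0  # speed of light in vacuum (m/s)


def resolution_from_pulse(pulse_s, c=C):
    """Range resolution of an unmodulated pulse, equation (2.3)."""
    return c * pulse_s / 2.0


def resolution_from_bandwidth(bw_hz, c=C):
    """Range resolution with pulse compression, equation (2.4)."""
    return c / (2.0 * bw_hz)


def compression_ratio(pulse_s, bw_hz):
    """Time-bandwidth product of the modulated pulse."""
    return pulse_s * bw_hz


bw = 10e6  # hypothetical waveform bandwidth in Hz
print(f"100 ns unmodulated pulse: {resolution_from_pulse(100e-9):.1f} m")
print(f"100 us unmodulated pulse: {resolution_from_pulse(100e-6):.0f} m")
print(f"100 us pulse, {bw / 1e6:.0f} MHz bandwidth: {resolution_from_bandwidth(bw):.1f} m")
print(f"compression ratio: {compression_ratio(100e-6, bw):.0f}")
```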

2.1.3 Matched filter receiver

Figure 2.1. A block diagram of radar receiver signal processing chain.

Since the range is calculated using the time between the transmitted and received pulse, a detection of the pulse edge is needed in the simplest case. Modern radars do not generally rely on simple leading or trailing edge detection but instead use matched filter techniques.

The concept of the matched filter receiver for radars is in principle the same as in telecommunications. The transmitted signal is used as a template for the expected return signal. For a fixed point reflector, the template is the complex conjugate of the time-inverted transmission signal. In practice, the echo signal from the target is distorted by such factors as the Doppler shift caused by the target radial velocity in relation to the radar [3]. A block diagram of the radar receiver signal processing chain is shown in figure 2.1.

The received signal is detected with a filter bank that is designed to have matches for the transmitted signal affected by the Doppler shift. An echo from a target causes a correlation peak in some of the filters, which is then used to determine the range and Doppler. An example of the filter bank can be seen in figure 2.2.
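The matched filter idea can be sketched for a single filter of the bank, that is, for a stationary point target with no Doppler shift. The sketch below is a simplified illustration using NumPy. The linear FM pulse, the sample rate, the noise level and the delay of 300 samples are all hypothetical choices, not parameters of the radars discussed in this thesis.

```python
import numpy as np


def matched_filter_delay(tx, rx):
    """Estimate the delay (in samples) of the pulse tx inside the recording rx."""
    # Matched filter impulse response: complex conjugate of the time-reversed pulse.
    h = np.conj(tx[::-1])
    out = np.convolve(rx, h)
    # A pulse starting at sample d produces its peak at index d + len(tx) - 1.
    return int(np.argmax(np.abs(out))) - (len(tx) - 1)


fs = 1e6                # hypothetical sample rate (Hz)
n = 100                 # pulse length in samples
t = np.arange(n) / fs
bw = 2e5                # hypothetical sweep bandwidth (Hz)
tx = np.exp(1j * np.pi * (bw / (n / fs)) * t**2)   # linear FM (chirp) pulse

rng = np.random.default_rng(0)
rx = 0.1 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
true_delay = 300
rx[true_delay:true_delay + n] += tx                # echo buried in noise

est = matched_filter_delay(tx, rx)
print("estimated delay in samples:", est)
```

A full Doppler filter bank would repeat the same correlation with templates that are frequency shifted by each candidate Doppler velocity.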

Figure 2.2. Block diagram of a receiver filter bank. Filters MF are tuned to different Doppler velocities.

As the output of the Doppler filter bank is essentially the Doppler spectrum of the target, it can also be used for target recognition. For example, a helicopter and a propeller aircraft have distinct peaks in the Doppler spectrum caused by the radial velocity of the target body and by the propeller blades. A similar, although more subtle, effect is caused by jet engine compressor blades, which can then be used for identifying aircraft [4].

2.1.4 Accuracy of the range measurement

While the resolution of a radar system can be interpreted as the ability to distinguish between two targets close to each other, the accuracy can be interpreted as the ability to measure the position of the target. Radar measurement error is usually dominated by components dependent on the Signal to Noise Ratio (SNR). The expected range measuring error R_e can be calculated with the equation

\[ R_e = \frac{c}{2 \cdot \mathit{BW} \cdot \sqrt{2 \cdot \mathit{SNR}}}, \tag{2.5} \]

where BW is the bandwidth of the radar waveform and SNR is the signal to noise ratio of the received signal. If the range accuracy is now compared with the range resolution, we can see that the expected SNR based random error is in fact smaller than the resolution when the SNR is greater than 0.5 [1, p. 168].
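The comparison between accuracy and resolution follows directly from dividing equation (2.5) by equation (2.4). The worked step is written out below; the 15 dB example value is a hypothetical SNR used only for illustration.

```latex
\frac{R_e}{S_r}
  = \frac{c / \left( 2 \cdot \mathit{BW} \cdot \sqrt{2 \cdot \mathit{SNR}} \right)}
         {c / \left( 2 \cdot \mathit{BW} \right)}
  = \frac{1}{\sqrt{2 \cdot \mathit{SNR}}} < 1
  \quad \Longleftrightarrow \quad \mathit{SNR} > \tfrac{1}{2}

% Example with an assumed SNR of 15 dB (about 31.6 in linear scale):
\frac{R_e}{S_r} = \frac{1}{\sqrt{2 \cdot 31.6}} \approx 0.13
```

In this example the expected random range error is roughly one eighth of the range resolution.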

2.1.5 Azimuth and elevation measurement

Scanning pulse-Doppler radars determine the position of the target in addition to its range and speed. The antenna rotates and the target position is resolved by measuring the azimuth and elevation angles of the returns in relation to the radar. Azimuth, elevation and range are then mapped from the local to the global reference frame by using the knowledge of the orientation and position of the radar. The azimuth angle is usually measured in the horizontal plane with the zero angle pointing north. The elevation angle is usually defined so that the zero angle points towards the horizon. An example of the measurement setup can be seen in figure 2.3.

Many methods exist for determining the azimuth angle. One of the simplest is the method of strongest return. As the rotating antenna sweeps, it continuously emits beams at a certain PRF. Returns from an object are usually received from multiple beams, and the azimuth and elevation angles are then resolved by comparing the returns and choosing the angles of the strongest return. The technique works relatively well for high SNR nonfluctuating targets such as passenger planes in steady flight, where the Radar Cross-Section (RCS) stays relatively constant from pulse to pulse as the antenna sweeps over the target. However, for fluctuating targets such as manoeuvring fighter jets the method does not perform nearly as well, since small changes in the orientation of the target cause too much variation in the returns.

Figure 2.3. Pulse-Doppler radar range and elevation measurements. Antenna type and target in the figure are drawn for meteorological applications as the figure is derived from [.] However, same principles apply for other pulse-Doppler type radars used in aviation.

The best accuracy is achieved with modern Active Electronically Scanned Array (AESA) radars that utilize monopulse techniques, in which the emitted beam is electronically split as can be seen in figure 2.4. In the case of azimuth monopulse, the signal is transmitted through the left and right halves of the antenna in phase. The returns are then processed using different receiver channels for each of the halves. The two lobes and the summation channel can be created by combining the signals in different ways. The same method can be performed in both azimuth and elevation directions, which would then require four different receiver channels. The method allows a fairly accurate measurement to be performed from a single pulse, which makes it much more robust against the errors generated by target RCS fluctuation. In such monopulse systems, the measurement errors are usually dominated by the SNR dependent error [1, p. 169]. The SNR dependent error is random and the standard deviation σ_A for the angular measurements can be calculated with the equation

\[ \sigma_A = \frac{\theta}{k_m \cdot \sqrt{2 \cdot \mathit{SNR}}}, \tag{2.6} \]

where θ is the beamwidth and k_m is the coefficient of the monopulse difference slope [1, p. 170].
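To give a feeling for the magnitudes in equation (2.6), the following Python sketch evaluates the angular standard deviation and converts it to a cross-range position error with the small-angle approximation. The beamwidth of 1°, the slope coefficient of 1.6, the SNR of 15 dB and the range of 200 km are assumed example values, and the cross-range conversion is an added illustration that does not appear in the text above.

```python
import math


def db_to_linear(db):
    """Convert a power ratio from decibels to linear scale."""
    return 10.0 ** (db / 10.0)


def angular_sigma_deg(beamwidth_deg, k_m, snr_linear):
    """Standard deviation of the angular measurement, equation (2.6)."""
    return beamwidth_deg / (k_m * math.sqrt(2.0 * snr_linear))


def cross_range_sigma_m(range_m, sigma_deg):
    """Cross-range position error for a small angular error (assumed conversion)."""
    return range_m * math.radians(sigma_deg)


sigma = angular_sigma_deg(beamwidth_deg=1.0, k_m=1.6, snr_linear=db_to_linear(15.0))
print(f"angular sigma: {sigma:.3f} deg")
print(f"cross-range sigma at 200 km: {cross_range_sigma_m(200e3, sigma):.0f} m")
```

With these assumed values the cross-range error at long range is in the order of hundreds of metres, which illustrates why the accuracy of a radar measurement differs so much between bearing and range.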

Figure 2.4. Monopulse lobes. Summation lobe Σ is achieved by adding the received signals from both halves. Correspondingly, difference channel can be achieved by subtracting the signals from both halves.

2.1.6 Radar equation

The interaction between the radar and the target can be modeled with the radar equation. Many formulations of the equation exist for different use cases, but the basic version defines the ratio of the received signal S to the background noise N as

\[ \frac{S}{N} = \frac{P_P \cdot G_T \cdot \sigma \cdot A_R \cdot \mathit{PC}}{(4\pi)^2 \cdot R^4 \cdot \mathit{BW} \cdot k_B \cdot T_S \cdot L}, \tag{2.7} \]

where P_P is the peak radio frequency (RF) power, G_T is the gain of the transmit antenna, σ is the target RCS, A_R is the receiving antenna aperture area, PC is the pulse compression ratio, R is the range to the target, BW is the waveform bandwidth, k_B is the Boltzmann constant, T_S is the radar noise temperature and L is the radar loss factor [1, p. 63]. A calculation example with typical values can be seen in figure 2.5.
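Equation (2.7) is straightforward to evaluate numerically. The sketch below is a minimal Python illustration in which every parameter value is a hypothetical placeholder; the values are not those of figure 2.5 or of any radar in the dataset. The loop shows the R⁴ dependence: doubling the range lowers the S/N by a factor of 16, or about 12 dB.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def snr_linear(p_peak_w, gain_tx, rcs_m2, aperture_m2, pc_ratio,
               range_m, bw_hz, t_sys_k, loss):
    """Signal-to-noise ratio from the radar equation (2.7), linear scale."""
    numerator = p_peak_w * gain_tx * rcs_m2 * aperture_m2 * pc_ratio
    denominator = (4.0 * math.pi) ** 2 * range_m ** 4 * bw_hz * K_B * t_sys_k * loss
    return numerator / denominator


def to_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(ratio)


# Hypothetical radar and target parameters, for illustration only.
params = dict(p_peak_w=100e3, gain_tx=10 ** 3.5, rcs_m2=1.0, aperture_m2=10.0,
              pc_ratio=1000.0, bw_hz=1e6, t_sys_k=500.0, loss=10 ** 0.6)

for r_km in (50, 100, 200):
    snr = snr_linear(range_m=r_km * 1e3, **params)
    print(f"R = {r_km:3d} km: S/N = {to_db(snr):.1f} dB")
```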

Figure 2.5. S/N calculation example recreated after [1, p. 66].

An important thing to note here is that the target RCS σ in the equation is not a constant but a property of the target that is a function of such variables as the illumination angle, the surface conductivity, the frequency of the radar wave, and the target geometry. Determining accurate RCS values for objects requires extensive simulation and physical testing. A simulation of a passenger plane RCS can be seen in figure 2.6 [5].

Figure 2.6. Airbus A320 passenger aircraft simulated RCS values drawn on a sphere as a function of the viewing angle.

2.1.7 Bias errors

Although ground radar measurement error usually depends largely on the SNR, part of the error is caused by uncertainty related to the position and orientation of the radar. As the radar might be able to detect targets hundreds of kilometers away, even a small uncertainty in the antenna orientation might cause a relatively large error when the local bearing and range measurement is transformed to the global reference frame. As a rough example, a bias offset of 0.01° between the world coordinate system and the local radar coordinate system would result in an error of roughly

\[ 200\,\mathrm{km} \cdot \tan(0.01^\circ) \approx 35\,\mathrm{m} \tag{2.8} \]

for a target 200 km away. Especially in the case of airborne and marine radar systems, these orientation based errors might be significant in relation to the SNR based error and require compensation methods [6] [7].
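The arithmetic of equation (2.8) can be reproduced, and extended to other ranges, with a few lines of Python. The function name and the additional ranges are illustrative assumptions.

```python
import math


def orientation_bias_error_m(range_m, bias_deg):
    """Position error caused by an angular bias at a given range, as in equation (2.8)."""
    return range_m * math.tan(math.radians(bias_deg))


for r_km in (50, 100, 200, 400):
    err = orientation_bias_error_m(r_km * 1e3, 0.01)
    print(f"{r_km:3d} km: {err:5.1f} m")
```

Because the error grows linearly with range, the same small orientation bias matters much more for distant targets than for nearby ones.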

Another phenomenon causing bias-type error in the result is the residual of the compensation for atmospheric bending of the radio wave. Although in many simple examples the bending of the radio wave in the atmosphere is approximated with the effective earth radius model, often with the standard value of 4/3 times the earth radius, the phenomenon is more complex. Local humidity changes such as clouds and variations in the air layer temperatures cause slight variations in the bending. Also, if the radars use different frequency bands, the effect on the waves might vary [1].

2.1.8 Detection threshold and probability of detection

A typical radar tracker detection threshold is decided as a trade-off between the desired Probability of Detection (PD) at a certain SNR and, on the other hand, an acceptable rate of false detections. A low rate of false detections is preferred in later stages when observations are associated with the tracks. On the other hand, too high a detection threshold causes observations from a real target with a weak response to be ignored. Generally, some form of adaptive algorithm has to be used for choosing the detection threshold, since background noise levels vary between different resolution cells due to weather conditions.

Many methods exist for deciding the detection threshold, of which probably one of the most frequently used is the adaptive Constant False Alarm Rate (CFAR). The method operates on the assumption that noise is somewhat homogeneous in resolution cells close to each other at a given time. The detection threshold is then calculated individually for each of the resolution cells during the scan, based on the noise levels in neighbouring cells. If the distribution of the noise is known, the detection threshold can be chosen statistically in such a way that the desired false alarm rate is achieved in each of the resolution cells.

The simplest method for calculating the noise level is straight cell averaging, but this causes problems in situations where multiple targets fly in neighbouring resolution cells.
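The cell-averaging variant described above can be sketched in a few lines of NumPy for a one-dimensional row of range cells. The sketch assumes exponentially distributed noise power (a square-law detector), for which the threshold multiplier has the closed form used below. The window sizes, the false alarm probability and the simulated data are hypothetical choices and do not describe the detection stage of the radars in the dataset.

```python
import numpy as np


def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    """Cell-averaging CFAR over a 1-D array of cell powers.

    Returns a boolean mask of detections. Cells too close to the array
    edges to have a full reference window are never declared detections.
    """
    n = len(power)
    n_ref = 2 * num_train
    # Threshold multiplier for exponentially distributed noise power (assumption).
    alpha = n_ref * (pfa ** (-1.0 / n_ref) - 1.0)
    half = num_train + num_guard
    detections = np.zeros(n, dtype=bool)
    for i in range(half, n - half):
        # Training cells on both sides, skipping the guard cells next to cell i.
        lead = power[i - half:i - num_guard]
        lag = power[i + num_guard + 1:i + half + 1]
        noise = (lead.sum() + lag.sum()) / n_ref
        detections[i] = power[i] > alpha * noise
    return detections


rng = np.random.default_rng(0)
cells = rng.exponential(scale=1.0, size=200)   # simulated noise power
cells[100] += 100.0                            # simulated strong target
print("detected cells:", np.flatnonzero(ca_cfar(cells)))
```

A second strong target inside the training window of the first one would raise the estimated noise level and therefore the threshold, which is the weakness of straight cell averaging mentioned above.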

2.2 Tracking problem

A typical multiradar tracking problem consists of filtering the raw observations by choosing an appropriate detection threshold for each radar, deducing which tracks the new observations should be associated with, and updating the tracks based on the new information. On the other hand, not all observations are equally good, and once an observation is associated with a track the tracking algorithm also has to weight the observations based on their uncertainty. Sometimes the detection threshold deduction is categorized more as a function performed by the radar hardware than by the tracker software. In practical applications the tracker is required to run in real time or close to real time.

Multiradar tracking can be run either as a centralized computation on all of the raw observations or as a fusion of the tracks produced by the separate radars. In practice the multisensor tracking problem is broader and not as well defined as monoradar tracking, since it might require using information from sensors that utilize different measurement models, for example infrared and radar. Additional challenges may be posed by the delays and bandwidth limitations on the communication channels between the sensors and the tracker. A block diagram of the tracking process can be seen in figure 2.7.

Figure 2.7. Block diagram of the tracking process. [8]

2.3 Tracking algorithms

Target state prediction is the core problem in tracking. The tracker predicts the object state at the next time step t + 1 based on the previous state and the object motion model and then updates the prediction with the new measured data, also updating the internal covariance matrix representing the uncertainty of the target state. The Kalman filter based operation cycle is shown in figure 2.8. The prior information for the tracker consists of the target motion and the radar measurement models. The challenge with radar trackers is 1) to be able to model the measurements and target movement accurately in real time and 2) to actually evaluate the models with tracking algorithms such as the Kalman filter, which is, in principle, linear. Environmental factors such as weather affect the measurement model in real time, and on the other hand different targets such as helicopters and fighter jets have very different movement models by nature [9].

In real world applications, the movement of a manoeuvring target is usually a non-linear process, which requires that the algorithm is able to handle the non-linearity [9]. The first popular alternative is the EKF, which approximates the non-linear system with a first-order Taylor expansion in order to operate with the linear Kalman filter. The second alternative is the UKF, which approximates the probability distribution of the state with a set of sample points propagated through the non-linear model, the so-called unscented transformation. Other non-linear methods include particle filtering algorithms such as the Monte Carlo particle filter [9].

Figure 2.8. EKF and UKF operation cycle.

While a Kalman filter based tracking algorithm is operating, part of the target state is the covariance matrix of the target position. When the object is in a steady flight state, this covariance matrix tends to converge to small values, causing the tracking algorithm to weight the predicted state more than the measurements. When the target makes a turn after steady level flight, it takes several iterations before the covariance matrix is updated and the measurements are weighted enough over the predicted state, which introduces a lag on the track.
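The predict and update cycle and the convergence of the covariance can be illustrated with a plain linear Kalman filter. The sketch below tracks a single altitude coordinate with a constant-velocity model using NumPy. It is a simplified stand-in for the EKF and UKF discussed above, and the motion model, noise levels and initial values are hypothetical and unrelated to the experimental tracker.

```python
import numpy as np


def kf_predict(x, P, F, Q):
    """Propagate the state and its covariance one time step forward."""
    return F @ x, F @ P @ F.T + Q


def kf_update(x, P, z, H, R):
    """Correct the predicted state with a new measurement z."""
    y = z - H @ x                        # innovation
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P


dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])    # state: [altitude, climb rate]
H = np.array([[1.0, 0.0]])               # only the altitude is measured
Q = 0.01 * np.eye(2)                     # assumed process noise
R = np.array([[25.0]])                   # assumed measurement noise (5 m std)

x = np.array([1000.0, 0.0])
P = 100.0 * np.eye(2)
rng = np.random.default_rng(1)
true_alt, climb_rate = 1000.0, 5.0
variances = []
for _ in range(50):
    true_alt += climb_rate * dt
    z = np.array([true_alt + rng.normal(0.0, 5.0)])
    x, P = kf_predict(x, P, F, Q)
    x, P = kf_update(x, P, z, H, R)
    variances.append(P[0, 0])

print(f"altitude variance after first and last update: {variances[0]:.1f}, {variances[-1]:.1f}")
```

As the altitude variance shrinks during steady flight, the Kalman gain shrinks with it, which is the mechanism behind the lag when the target starts to manoeuvre.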

Figure 2.9. Simplified example of the multihypothesis principle.

An extension to the different sensor fusion algorithms is the multihypothesis tracker, which is especially suitable for tracking multiple targets. The problem with the traditional single hypothesis tracker is that, especially when there are multiple targets, it is hard to deduce which target produced the observation. The multihypothesis tracker approaches the problem by not only propagating a single track but instead generating multiple possible tracks for the target. All of these tracks are updated and propagated forward until some of the tracks are so unlikely that they can be removed. In practice it might be that only the most likely tracks for each object are shown to the operator [10]. A simple example of the multihypothesis tree is shown in figure 2.9.

3 MACHINE LEARNING

Machine learning is an area of study in which computer algorithms learn to perform tasks based on training data rather than on explicitly programmed rules. Machine learning problems are classified based on the training method and the information used for the training. Typical examples of machine learning problems are classification, object detection and time series regression.

3.1 Supervised learning

Supervised learning is a set of problems where both the input and the true desired output are known during training. The training process is, in principle, relatively straightforward. The model estimates the output based on the input data, and the difference between the model output and the desired output, also known as the ground truth, is minimized by altering the model. A simple example of a supervised problem could be estimating the age of a person from images, where the training is based on a set of images of persons whose age is known [11].

The challenges with supervised learning are often related to the available training data. Firstly, supervised training requires a more complete dataset than unsupervised learning. Unlabeled or loosely labeled datasets are often much easier to acquire than labeled datasets. The second challenge is that, as the training is a straightforward optimization between the ground truth and the model output, the quality of the ground truth has to be high in order to achieve good results.

3.2 Unsupervised learning

The simplest examples of unsupervised learning are data clustering problems. Whereas supervised learning often focuses on finding a decision boundary that separates, for example, points associated with known groups, unsupervised learning takes a set of points as input and focuses on finding subgroups within the input set. The difference between unsupervised grouping and supervised classification problems is shown in figure 3.1. In practice, such grouping can be useful, for example, in market research when trying to identify different groups of customers. Another example is anomaly detection in machine operation, where data collected during operation is analyzed to find outliers that lie outside the groups forming the normal operation modes. [11]

Figure 3.1. Supervised and unsupervised learning. Figure first published in Der Radiologe under Creative Commons license. [12]

3.3 Support vector machine

SVM is a machine learning model that can be used for both classification and regression problems. With the use of kernel functions, the model can also be applied to cases that are not linearly separable, and it can be considered one of the best traditional machine learning models that preceded neural networks. The idea of the SVM is to find a decision boundary separating two classes by maximizing the margin between the so-called support vectors, which are simply the data points that lie closest to the boundary. A simple linear example can be seen in figure 3.2. Usually the margin to be maximized is calculated with the so-called soft-margin method, which allows some outliers in order to obtain a wider margin. The function L to be minimized for the soft-margin SVM is defined mathematically as

$$L = \frac{1}{2} w^T w + C(n), \tag{3.1}$$

where $w$ defines the orientation of the decision boundary and $C$ is a tunable parameter that sets the tradeoff between the margin width and the number of mistakes, denoted by $n$.

In practice, classes are usually not linearly separable and a boundary cannot be drawn directly. SVM uses the so-called kernel trick to circumvent this problem by mapping the input features into a different, high-dimensional feature space before drawing the boundary. This means that the classifier not only finds the decision boundary by choosing the support vectors, but also uses kernels that transform the data into a feature space where the classes can be linearly separated.

Figure 3.2. Decision boundary and support vectors for linearly separable data [13].

An example of the transformation can be seen in figure 3.3. As can be seen, a simple linear decision boundary cannot be drawn for the dataset consisting of variables $x_1$, $x_2$. However, after the variables $x_1$, $x_2$ are transformed into $x_1'$ and $x_2'$ with the kernel function, a linear decision boundary separating the two classes exists.

Figure 3.3. Kernel trick for making the classes linearly separable.

One of the most popular kernel functions is the Gaussian kernel, defined mathematically as

$$k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right), \tag{3.2}$$

where σ is a tuning parameter that determines how local or global the effect of each training example is. An example of the SVM decision boundary with the Gaussian kernel can be seen in figure 3.4, where data that is not linearly separable is first mapped into another feature space, then a linear boundary is fitted, and finally the data is mapped back into the original feature space.

Figure 3.4. Decision boundary and support vectors for non-linearly separable data after kernel trick [13].
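The role of σ in the Gaussian kernel can be illustrated with a few lines of Python. The following is a minimal sketch, assuming the usual form of equation 3.2 with a negative exponent; the function name `gaussian_kernel` and the example points are hypothetical and chosen only for illustration.

```python
import math

def gaussian_kernel(x_i, x_j, sigma):
    """Gaussian kernel of equation 3.2 for two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

# Identical points give the maximum similarity; the value decays with distance.
same = gaussian_kernel([1.0, 2.0], [1.0, 2.0], sigma=1.0)
near = gaussian_kernel([0.0, 0.0], [1.0, 0.0], sigma=1.0)
far = gaussian_kernel([0.0, 0.0], [3.0, 0.0], sigma=1.0)

# A larger sigma makes the influence of each training example more global.
far_wide = gaussian_kernel([0.0, 0.0], [3.0, 0.0], sigma=5.0)
```

With a small σ the distant point has almost no similarity to the origin, whereas with a large σ the same point still contributes noticeably, which is the local versus global behaviour described above.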

The support vector machine model usually serves as the final stage of the classifier, and the input is not necessarily the raw image or time series data but instead some handcrafted features or statistical measures. For example, bearing fault detection from acceleration data could be performed by first transforming the time series into the frequency domain with the Fourier transform and then extracting the frequencies and amplitudes of the 10 strongest bands. Training the SVM with these frequencies and amplitudes instead of the raw sensor data would most likely result in much better performance, since the Fourier transform converts the temporal dependencies between the input variables into a simpler form.


3.4 Neural networks

Neural networks are a family of machine learning methods that operate by propagating a set of input features through a network of neurons to the output. Neural networks, and especially deep neural networks, are a very rapidly developing area of research and can be considered the state-of-the-art method for machine learning. According to Google Scholar, neural networks acquired more citations than any other area of research in 2020 [14].

Figure 3.5. Biological neuron and corresponding mathematical model. [15]

The principle behind the artificial neural network is the neuron model, which is loosely inspired by the biological neural cell. Connections between the neural network nodes correspond to the synapses. Dendrites, the locations where the signals enter the cell, are modeled with input weights. Input signals are transferred through the node to the output, which represents the axon of a biological neuron. The biological neuron and the corresponding mathematical neuron model are shown in figure 3.5. It is worth noting that even though all artificial neural network architectures share a somewhat similar concept of neurons, the area of research has branched into multiple drastically different approaches. [11]

The key difference, and probably the main advantage, of neural networks compared to traditional methods such as the SVM is the learned feature extraction. Whereas traditional methods often build on top of handcrafted features that are used as input for the model, neural networks are often fed with the raw input data. Handcrafting functional and robust features for images or speech is an especially difficult task if the model has to operate in an uncontrolled environment. However, in areas such as factory automation, where the model can operate in a highly controlled environment, handcrafted features are still widely used and relevant.

3.4.1 Fully connected neural network

The original neural network structure, containing only fully connected layers, can be seen as a layered network of nodes where all the nodes of the previous layer are connected to all the nodes of the next layer. The input is connected to the first layer of nodes and the output is on the last layer. An example of a fully connected neural network with two hidden layers and two output nodes can be seen in figure 3.6. Each output node could, for example, represent the probability of a processed input image belonging to a certain class.

Figure 3.6. Example of the nodes and connections of fully connected neural network.

At its simplest, the output y of each neuron is calculated as a vector multiplication with the equation

$$y = \sigma(W x + b), \tag{3.3}$$

where $x$ is the vector of input values from the previous layer, $W$ is the vector of weights applied to those inputs, $b$ is a bias term added for each neuron and $\sigma$ is the activation function. In other words, the output of the neuron is formed as a non-linearly transformed weighted sum of the inputs. The network is trained by slightly altering the weight and bias terms using gradient-based optimization. The error of the output is calculated against the ground truth after each pass or in small batches, after which the gradient for each weight is calculated from the output towards the input, one layer at a time, using the chain rule. This is critical in order to know the direction in which the error decreases for each of the parameters being adjusted.
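The forward pass of equation 3.3 can be sketched in plain Python as follows. This is only an illustration: the helper names `sigmoid` and `dense_layer`, the layer size and the weight values are hypothetical and not taken from the models used in this thesis.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(x, weights, biases, activation=sigmoid):
    """Apply equation 3.3 to every neuron of one fully connected layer.

    weights[k] holds the input weights of neuron k and biases[k] its bias.
    """
    outputs = []
    for w_k, b_k in zip(weights, biases):
        weighted_sum = sum(w * v for w, v in zip(w_k, x)) + b_k
        outputs.append(activation(weighted_sum))
    return outputs

# Hypothetical layer with 3 inputs and 2 neurons, arbitrary weights.
x = [1.0, -2.0, 0.5]
weights = [[0.2, 0.1, -0.4], [0.0, 0.0, 0.0]]
biases = [0.0, 0.0]
y = dense_layer(x, weights, biases)
```

The second neuron has all-zero weights, so its weighted sum is zero and the sigmoid maps it to 0.5, while the first neuron produces a value below 0.5 because its weighted sum is negative.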

Figure 3.7. Logistic sigmoid activation function.

In practice, vanishing and exploding gradients cause many problems as the network gets deep. In order to avoid these problems, many different activation functions and modifications to the network structure have been developed. The most popular activation functions are the logistic sigmoid shown in figure 3.7 and the Rectified Linear Unit (ReLU) shown in figure 3.8.

Figure 3.8. Rectified linear unit activation.

The capacity of a neural network to represent functions depends on the number of neurons in the hidden layers and the depth of the network. A higher number of neurons allows the network to learn more complex relationships but makes the network harder to train. The tendency of a neural network to learn features that are specific to the training examples but not present in other samples is called overfitting. Generally, the more complex the model is, the higher the risk of overfitting. The effect of the number of hidden units is illustrated in figure 3.9.

Figure 3.9. Simple classification problem example using a fully connected neural network with different numbers of hidden units. A higher number of neurons allows the model to learn a more complex decision boundary. [15]

3.4.2 Convolutional layers

Even though neural networks with only fully connected layers are able to outperform traditional machine learning methods, such as the SVM, in many applications, the massive number of learnable parameters is an issue. Convolutional layers drastically reduce the number of parameters to be learned, which in turn allows the use of deeper and more complex models. Additionally, a multilayer convolutional structure naturally increases the receptive field layer by layer as the data passes through the network towards the output classification or regression layer. The famous AlexNet, which clearly outperformed the previous methods in the ImageNet Large Scale Visual Recognition Challenge in 2012 and started the current era of deep learning, was a convolutional neural network. [16]

Figure 3.10. Convolutional kernels of the AlexNet. [17]

Instead of the perceptrons used in fully connected layers, convolutional layers build on convolutional kernels. Each kernel, for example of size 3×3, consists of filter coefficients that are used to calculate a value for the center position of the kernel as the filter slides over the input. The filter operates in the same way as, for example, a typical averaging kernel used in image smoothing or the Laplacian kernel used for sharpening. As the neural network is trained, the coefficients of the kernel are altered.

An example of the AlexNet convolution kernels learned during training can be seen in figure 3.10. If the AlexNet filter kernels are compared with, for example, the edge-detecting Sobel kernel, it can be seen that some of the learned kernels in fact resemble edge-detecting filters, which seems intuitive for object detection.
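The sliding-kernel operation can be sketched with a fixed Sobel kernel in plain Python. This is a minimal illustration, assuming no padding and stride 1; the function name `convolve2d_valid` and the tiny test image are hypothetical. In a convolutional layer the kernel coefficients would be learned instead of fixed.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image without padding, with stride 1.

    As is common in neural network libraries, the kernel is not flipped,
    so strictly speaking this is cross-correlation.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(k_h) for j in range(k_w)))
        output.append(row)
    return output

# Horizontal-gradient Sobel kernel, which responds to vertical edges.
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# 4x5 test image: dark left part (0) and bright right part (1).
image = [[0, 0, 0, 1, 1]] * 4

response = convolve2d_valid(image, sobel_x)
```

The response is zero where the kernel covers only the flat dark area and positive where it overlaps the transition from dark to bright, which is the behaviour that some of the learned kernels in figure 3.10 resemble.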

Convolutional layers are usually paired with maxpooling, which is basically an operation for downsampling the spatial resolution of the feature maps. For example, if the first convolutional layer of the neural network has 20 convolutional kernels, the output from the first layer is 20 images, each filtered with a different convolutional kernel. The spatial size of the image is first reduced if the stride of the kernels is greater than 1, after which the maxpooling further squeezes the spatial size of the feature maps. How much the spatial resolution is reduced by the maxpooling and the stride of the convolution varies between architectures. Figure 3.11 shows how the 227×227×3 input RGB image is squeezed into a 6×6×256 feature map as it passes through the convolutional feature extraction layers of the AlexNet.

Figure 3.11. AlexNet convolutional and fully connected layers. Each convolutional layer is followed by maxpooling. Output (layer 8) has 1000 nodes, one for each class of objects in the ImageNet dataset. [18]

3.4.3 Recurrent neural networks

The difference between an RNN and feed-forward networks such as the AlexNet [16] is that the RNN has feedback connections, which effectively give it memory between input time steps. The nodes are connected with one-way connections just like in feed-forward networks; however, some of the connections loop back from the output. The input for the next time step consists of the loop back, which is usually called the hidden state, and the new input. The idea of the loop back and how the information propagates over multiple time steps is illustrated in figure 3.12. The problem with the RNN is that the "memory mechanism" requires the information to be passed through the network on every time step, which makes the gradient vanish relatively quickly and renders the memory mechanism effective for just a few time steps.

Since an output is available for each time step, as seen in figure 3.12, RNNs can be used for two general tasks. The simplest use case is sequence-to-one, in which a sequence of inputs is fed in and only the output of the last RNN cell is used as the output. The second general use case is the sequence-to-sequence transformation, in which the outputs from each of the time steps are used.


Figure 3.12. RNN idea visualized. Input X, hidden state h and output o for each time step. U, V, W are weights of the simple RNN network. [19]
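The loop back and the two use cases can be sketched with a scalar RNN in Python. The update rule used here, h_t = tanh(U x_t + W h_{t-1}) and o_t = V h_t, is the common textbook form and is an assumption, since the text above names the weights U, V and W but does not give the formula. The function name `rnn_forward` and the weight values are hypothetical.

```python
import math

def rnn_forward(inputs, U, W, V, h0=0.0):
    """Run a scalar simple RNN over a sequence.

    Assumed update rule: h_t = tanh(U * x_t + W * h_{t-1}), o_t = V * h_t.
    """
    h = h0
    outputs = []
    for x_t in inputs:
        h = math.tanh(U * x_t + W * h)  # new input combined with the loop back
        outputs.append(V * h)
    return outputs, h

# An impulse at the first time step followed by zeros.
sequence = [1.0, 0.0, 0.0, 0.0]
outputs, last_hidden = rnn_forward(sequence, U=1.0, W=0.5, V=2.0)

sequence_to_sequence = outputs   # one output per time step
sequence_to_one = outputs[-1]    # only the output of the last cell
```

The effect of the first input fades at every step because it has to pass through the loop back each time, which gives a small-scale picture of why the memory of a simple RNN is short.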

3.4.4 Long short term memory

Long Short Term Memory (LSTM) is a type of RNN that is well suited for processing time series data such as speech. The idea of the LSTM is to improve the RNN performance by adding a cell state mechanism that is controlled with a forget gate. While the loop back still provides the short term memory just as in the traditional RNN, the cell state is an additional structure that allows the node to store information without the loop back, providing a longer term memory. Effectively, this means that the input for the next time step now has two different types of "loopbacks", the hidden state and the cell state, whereas the traditional RNN only loops back the hidden state. The difference in the structure of the nodes can be seen in figure 3.13.

Figure 3.13. Structure of an LSTM cell. Two outputs for the next state, hidden state h and cell state c. The cell consists of the forget gate F, input gate I and output gate O. [20]

The input cell state $C_{t-1}$ of each LSTM cell is modified in two stages, of which the first is element-wise multiplication with the forget signal. The forget signal $F_t$ is formed by passing the previous hidden state $h_{t-1}$ and the current input $x_t$ through the sigmoid activation $\sigma$. It can be expressed mathematically as

$$F_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f), \tag{3.4}$$

where $w_f$ and $b_f$ are the learnable weights of the forget gate.

After the previous cell state $C_{t-1}$ has been scaled with the forget signal, the next cell state is formed by adding information with the signal $\tilde{C}_t$, which is the element-wise product of two versions of the input signal, $I_{sig}$ and $I_{tanh}$. It is defined mathematically as

$$\tilde{C}_t = \sigma(w_{Isig} \cdot [h_{t-1}, x_t] + b_{Isig}) \odot \tanh(w_{Itanh} \cdot [h_{t-1}, x_t] + b_{tanh}), \tag{3.5}$$

where $w_{Isig}$, $b_{Isig}$, $w_{Itanh}$ and $b_{tanh}$ are the weights. Note that the intermediate signal $\tilde{C}$ and the input signals are named slightly differently in figure 3.13. Finally, the next cell state $C_t$ is formed with the intermediate signals as

$$C_t = C_{t-1} \odot F_t + \tilde{C}_t. \tag{3.6}$$

The output $o_t$ of the current cell is essentially the same as the hidden state $h_t$, which is calculated as

$$h_t = \sigma(w_o \cdot [h_{t-1}, x_t] + b_o) \odot \tanh(C_t), \tag{3.7}$$

where $w_o$ and $b_o$ are the weights.
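One time step following equations 3.4 to 3.7 can be sketched in Python with scalar states. The products between gate signals are treated as element-wise multiplication, which for scalars is ordinary multiplication. The function name `lstm_step`, the parameter dictionary and all weight values are hypothetical; they are chosen so that the forget gate is almost fully open and the input branch almost closed.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step with scalar states, following equations 3.4-3.7.

    Each weight is a pair (weight for h_prev, weight for x_t), i.e. the
    dot product with the concatenation [h_prev, x_t].
    """
    def affine(name):
        w_h, w_x = params["w_" + name]
        return w_h * h_prev + w_x * x_t + params["b_" + name]

    f_t = sigmoid(affine("f"))                                       # (3.4)
    c_tilde = sigmoid(affine("isig")) * math.tanh(affine("itanh"))   # (3.5)
    c_t = c_prev * f_t + c_tilde                                     # (3.6)
    h_t = sigmoid(affine("o")) * math.tanh(c_t)                      # (3.7)
    return h_t, c_t

# Hypothetical parameters: forget gate nearly 1, input gate nearly 0.
params = {
    "w_f": (0.0, 0.0), "b_f": 10.0,
    "w_isig": (0.0, 0.0), "b_isig": -10.0,
    "w_itanh": (0.0, 1.0), "b_itanh": 0.0,
    "w_o": (0.0, 0.0), "b_o": 0.0,
}

h, c = 0.0, 1.0
for x_t in [0.3, -0.8, 0.5]:
    h, c = lstm_step(x_t, h, c, params)
```

With these settings the cell state stays close to its initial value of 1 over all three steps regardless of the inputs, which illustrates how the cell state can carry information forward without passing it through the hidden state loop back.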

3.5 Training neural networks

The training process of a neural network can be summarized as a navigation problem in the loss space, using the gradient of the loss function as a guide. The loss space is spanned by the tunable parameters, each combination of parameter values representing a location in the multidimensional space. The loss function is used to determine the current error, and the task is to minimize this error.

The choice, or even custom definition, of the loss function is probably one of the most important decisions in the training process. Probably one of the most common loss functions is the simple L2-loss, which is defined as

$$\mathrm{L2Loss} = \sum_i \left(Y_{\mathrm{True},i} - Y_{\mathrm{Predicted},i}\right)^2, \tag{3.8}$$

where $Y_{\mathrm{True}}$ is the ground truth and $Y_{\mathrm{Predicted}}$ is the output from the neural network. This definition works relatively well for simple regression problems. However, in many image enhancement or audio processing tasks it does not work well, since the loss is much harder to define. For example, when comparing the similarity of two images, a simple difference between pixels does not correlate very well with a visual evaluation of the quality.

3.5.1 Stochastic gradient descent

As previously mentioned, the principle of gradient descent is to minimize the loss function by moving in the direction in which the value of the loss function decreases the most. Mathematically, the state transition of the weights is formulated as

$$w_{n+1} = w_n - \lambda \cdot \nabla \mathrm{Loss}(w_n), \tag{3.9}$$

where $w_n$ is the current state of the weights and $w_{n+1}$ is the next state of the weights after the update. The parameter $\lambda$ is the so-called learning factor, which essentially defines the size of the weight updates. Too small a $\lambda$ makes the training process very slow, and too large a value makes the algorithm diverge. An example of the descent process is shown in figure 3.14.

The problem with "normal" gradient descent is that calculating the gradients for all of the samples in the training set before each step quickly becomes computationally too expensive. Stochastic gradient descent uses the statistical concept of a sample: it picks only a part of the training set that statistically represents the training data and calculates the gradient based on this sample instead of the whole training set. The most used variation of stochastic gradient descent calculates the gradient for a randomly selected batch.

A very common modification that makes the method more robust is the addition of momentum. This is achieved by adding a term that represents the direction of the previous weight update. With Stochastic Gradient Descent with Momentum (SGDM), the state transition of the weights $w_n$ is mathematically formed as

$$w_{n+1} = w_n - \lambda \cdot \nabla \mathrm{Loss}(w_n) + \alpha \Delta w_n, \tag{3.10}$$

where $\Delta w_n$ represents the weight update made on the previous step and $\alpha$ is the momentum parameter.

Figure 3.14. Example of gradient descent process.
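The update rules of equations 3.9 and 3.10 can be tried out on a one-dimensional toy problem. The sketch below is not stochastic, since it uses the exact gradient of an illustrative quadratic loss instead of a gradient estimated from a random batch; the function name `sgdm_minimize`, the loss and all parameter values are hypothetical.

```python
def sgdm_minimize(grad, w0, learning_rate, momentum, steps):
    """Gradient descent with momentum, following equation 3.10.

    With momentum=0 this reduces to the plain update of equation 3.9.
    """
    w = w0
    previous_update = 0.0  # the weight update made on the previous step
    for _ in range(steps):
        update = -learning_rate * grad(w) + momentum * previous_update
        w = w + update
        previous_update = update
    return w

def grad(w):
    """Gradient of the illustrative loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

w_plain = sgdm_minimize(grad, w0=0.0, learning_rate=0.1, momentum=0.0, steps=100)
w_momentum = sgdm_minimize(grad, w0=0.0, learning_rate=0.1, momentum=0.9, steps=300)

# Too large a learning factor: every step overshoots further and w diverges.
w_diverged = sgdm_minimize(grad, w0=0.0, learning_rate=1.5, momentum=0.0, steps=20)
```

Both moderate settings end up very close to the minimum at 3, whereas the run with the large learning factor moves further away on every step, which matches the divergence described for a too large λ.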

3.5.2 L2-regularization

Figure 3.15. L2-regularization effect with different λ values. [15]

A challenge of the training process is that the loss space is extremely complicated and contains a large number of local minima where the training process easily gets stuck. Probably the most common regularization method is L2-regularization. The principle of the method is to modify the L2-loss function by adding a penalty for extreme weights. The regularized L2-loss is defined as

$$\mathrm{L2Loss} = \frac{1}{2} \sum_i \left(Y_{\mathrm{True},i} - Y_{\mathrm{Predicted},i}\right)^2 + \frac{\lambda}{2} \sum_i w_i^2, \tag{3.11}$$

where $\lambda$ is the regularization factor and $w_i$ is the value of each weight. [11] The effect of the regularization can be seen in figure 3.15. A higher level of regularization results in a smoother decision boundary.

3.5.3 Dropout

Another classical method for avoiding overfitting is dropout [21]. Regularization of the weights is achieved by randomly zeroing out some of the nodes on each iteration. As there is a certain probability that each node gets ignored during the training phase, the network is effectively prevented from learning to make decisions based on just a couple of neurons. The principle of the method is shown in figure 3.16. The effects of dropout regularization with various datasets, such as ImageNet, CIFAR-10 and CIFAR-100, can be found in [21].

Figure 3.16. Principle of the dropout. Image from the original paper Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting", JMLR 2014. [21]

3.5.4 Batch normalization

One of the challenges with deep neural networks is that even though the inputs of the network are normalized in preprocessing, the magnitude of the outputs between layers might vary greatly during training. The phenomenon is explained by the fact that the mini-batches used to calculate the gradient in stochastic gradient descent are not evenly distributed, even though the dataset as a whole is. Experiments published in [22] show that normalizing the outputs between each layer of a Convolutional Neural Network (CNN) might have a great effect on the convergence of the network.

Batch normalization BN is formally defined as

$$BN(x) = \gamma \odot \frac{x - \hat{\mu}_B}{\hat{\sigma}_B} + \beta, \tag{3.12}$$

for each of the samples $x$ in the batch, where $\hat{\mu}_B$ is the mean and $\hat{\sigma}_B$ is the standard deviation of the batch. The parameters $\gamma$ and $\beta$ are scaling and shift parameters that are learned during training. [11]

When batch normalization is used with convolutional layers, the normalization is applied after the convolution and before the non-linear activation function. Normalization is performed separately for each of the output channels. In other words, normalization for a convolutional layer with n channels requires n different mean and variance values, each calculated for one channel over all pixel locations and over all the images in the batch. [11]
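The per-channel statistics can be made concrete with a small pure Python sketch of equation 3.12. The function name `batch_norm_channels`, the data layout and the example values are hypothetical, and the small constant `eps` is a commonly used numerical safeguard that is not part of equation 3.12.

```python
import math

def batch_norm_channels(batch, gamma, beta, eps=1e-5):
    """Per-channel batch normalization for data laid out as
    batch[image][channel][row][col].

    eps is an assumed small constant for numerical stability.
    """
    n_channels = len(batch[0])
    out = [[[list(row) for row in channel] for channel in image] for image in batch]
    for ch in range(n_channels):
        # One mean and variance per channel, over all images and pixel locations.
        values = [v for image in batch for row in image[ch] for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        std = math.sqrt(var + eps)
        for i, image in enumerate(batch):
            for r, row in enumerate(image[ch]):
                for c, v in enumerate(row):
                    out[i][ch][r][c] = gamma[ch] * (v - mean) / std + beta[ch]
    return out

# Two 1x2 "images" with two channels of very different magnitude.
batch = [
    [[[1.0, 3.0]], [[100.0, 300.0]]],
    [[[5.0, 7.0]], [[500.0, 700.0]]],
]
normalized = batch_norm_channels(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

Although the second channel is a hundred times larger than the first, both channels end up with nearly identical normalized values, because each channel is scaled with its own mean and standard deviation.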


4 PROPOSED METHODS

In this work, SVM and CNN-LSTM models were developed for error level estimation and error correction in a multiradar tracker system. For estimating the current error level, both SVM and CNN-LSTM models were used. For predicting the signed error correction term, a CNN-LSTM model was developed. The CNN-LSTM method predicts the signed altitude correction term and the general altitude error level for the latest track update, whereas the SVM method estimates the average error level for a short window of track updates. The CNN-LSTM model was inspired by recent methods proposed for anomaly detection and household electrical load prediction in publications [23] and [24], and it was selected as the basis for the deep learning method due to its suitability for processing time series data.

The motivation behind the methods is not to create a more accurate radar measurement model or object kinematic model for the tracker, but instead to operate on a higher level and learn dependencies between the models and the tracking algorithm. The aircraft orientation in relation to the radars, i.e. the aircraft aspect angle, has a high impact on the SNR values of the measurements, and it would be beneficial if the model could use this information to determine object flight states.

It is also worth noting that the method was developed for a static experiment setup specific to a certain geographical location and radar measurement system. It is very likely that the method would not generalize to other imaging geometries as it is. However, the method could be used as a way to adapt to a specific measurement setup by improving altitude accuracy with a relatively small amount of data collected over just a few days.

4.1 Dataset

The dataset used in this thesis contained altitude information from an experimental multiradar system and a sensor fusion setup. Raw radar observations from a certain geographical area had been extracted and tracked offline with a conventional tracking algorithm. After the tracking, altitude values from the raw radar observations, the tracker altitude estimates and an actual altitude signal from aircraft flight recordings had been included in the dataset. Additionally, the dataset contained timestamps for the radar observations and track updates, SNR values for the radar observations and index values that identified which radar sensor had made each measurement. The exact models or types of the radars corresponding to the source IDs were not included in the dataset. However, the dataset contained observations from different types of radars at locations that were fixed during the data collection, making the imaging geometry constant over the whole dataset. A block diagram of the fields can be seen in figure 4.1.

Figure 4.1. Dataset included synchronized tracker output, associated radar observations and flight recording.

4.2 Training and testing samples

Figure 4.2. Samples were generated by sliding a window on the time vectors. Each step of the time vector contained new radar observation and a track update.

The data described in section 4.1 was formulated as a machine learning dataset by splitting the information from tracks, observations and flight recordings into short, fixed size samples that were then used for training and testing. The splitting was performed by sliding a window over the track along the timestamps of the observations so that for every timestamp a sample was created that contained data from the current timestamp and a certain number of previous measurements, as seen in figure 4.2. Window lengths of 15 for the SVM and 80 for the CNN-LSTM seemed to give good results, although a complete hyperparameter search was not conducted. It is worth noting that the time steps are the time values of the radar observations, which naturally occur whenever one of the radar beams happens to sweep over the target, meaning that they are not equally spaced.
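The sample generation can be sketched as a simple sliding window in Python. This is only an illustration under stated assumptions: the function name `sliding_window_samples` and the toy track are hypothetical, and time steps at the start of a track that do not yet have a full window of history are assumed to be skipped, which the description above does not specify.

```python
def sliding_window_samples(track, window_length):
    """Return one sample per time step that has enough history.

    track is a time-ordered list of per-step records; each sample holds
    the current step and the window_length - 1 previous steps.
    """
    samples = []
    for end in range(window_length, len(track) + 1):
        samples.append(track[end - window_length:end])
    return samples

# Hypothetical track of 100 time steps; each record is just its index here.
track = list(range(100))

svm_samples = sliding_window_samples(track, window_length=15)
cnn_lstm_samples = sliding_window_samples(track, window_length=80)
```

Because the window advances one time step at a time, consecutive samples overlap heavily, and a longer window yields fewer samples from the same track.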

The dataset contained tracks from 5 different days, and for each day there was data from a morning and an afternoon session with a few hours in between. The split into testing and training data was based on these sessions: the data from the first morning was used for testing and the model was trained with the tracks from the 9 other sessions. In other words, the closest training examples were recorded on the same day a few hours later and the rest of the data was recorded on the following days.

4.3 Preprocessing

Figure 4.3. Bias problem that was compensated with the different processing chains. High RMSE due to the long term bias even though the network has estimated the short term error very well.

Time and altitude vectors for each sample were normalized in order to avoid overfitting. Without normalization, the model would likely learn to identify the training examples by the exact altitude and make predictions based on that. Each sample was "shifted" from the actual flight altitude to zero altitude by subtracting the mean altitude of that specific sample, and the time vectors were normalized by converting each time step into an offset from the first observation of the sample.

Even though the radar observations overall were correctly compensated for the bending of radio waves in the atmosphere and the mean altitude error was close to zero when calculated over all of the tracks, there was altitude bias on the individual tracks, so that the altitude error calculated against the flight recording was not zero mean. This may have been caused by some local, relatively steady weather phenomenon that caused the observations of a certain track to be constantly too high or too low. This caused problems when training the model, as the error signal was highly negative or positive for all samples generated from tracks with significant bias, no matter how good the short term temporal estimate was. An example can be seen in figure 4.3.

Figure 4.4. Preprocessing steps for generating the training and testing samples from the tracks. The training data pipeline used the flight recording to bias correct the input tracks, whereas the testing pipeline used the flight recording only to calculate the ground truth.

The model performed better when the normalization was performed slightly differently for the training and testing data. For the training data, the altitude information from the flight recording was also used. Whereas the testing samples were shifted to zero altitude by subtracting the mean altitude of the sample, the training samples were, in addition to the mean subtraction, shifted from zero by the amount of bias between the track and the flight recording. A block diagram of the pipelines can be seen in figure 4.4.

The normalized sample features were used to predict the error estimate $\hat{y}$. The prediction error $E$ of the model was then calculated simply as

$$E = y - \hat{y}, \tag{4.1}$$

using the true error signal $y$ calculated with the flight recording during the preprocessing.

The difference between the training and testing preprocessing was simply that the long term bias errors were removed from the training set with the additional bias correction step. As a result, the samples used for training did not have zero altitude mean but were instead offset by the amount of bias calculated against the ground truth. Obviously this could not be done for the testing samples, since it would require using information from the flight recording used as ground truth. Basically, this allowed the model to learn short term error characteristics effectively from samples with significant bias, even though the long term bias itself could not be predicted from the 80 observations in the sample window.
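The two normalization pipelines can be sketched as one small Python function. This is a simplified illustration, not the implementation used in this work: the function name `normalize_sample` is hypothetical, a single altitude channel is shown, and both the way the bias value is computed and its sign convention are assumptions left to the caller.

```python
def normalize_sample(times, altitudes, bias=None):
    """Normalize one sample window.

    times:     observation timestamps of the window
    altitudes: altitude values of the window
    bias:      long term bias of the track against the flight recording;
               given only in the training pipeline (sign convention assumed)
    """
    mean_altitude = sum(altitudes) / len(altitudes)
    shifted = [a - mean_altitude for a in altitudes]   # shift to zero altitude
    if bias is not None:
        shifted = [a + bias for a in shifted]          # training only: offset by the bias
    time_offsets = [t - times[0] for t in times]       # offsets from the first observation
    return time_offsets, shifted

# Hypothetical three-step window with unevenly spaced observations.
times = [100.0, 104.0, 109.0]
altitudes = [9990.0, 10000.0, 10010.0]

test_times, test_altitudes = normalize_sample(times, altitudes)
train_times, train_altitudes = normalize_sample(times, altitudes, bias=25.0)
```

The testing sample has zero mean altitude, while the training sample has a mean equal to the supplied bias, so the absolute flight altitude is hidden from the model in both cases.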

4.4 CNN-LSTM model

The same CNN-LSTM model was trained for both error level and error correction term estimation by simply defining the learning target differently. The learning target for the error level estimation was the absolute altitude error between the track and the flight recording, whereas for the correction term estimation the direction of the error was also included. Both variations predicted the value for the most recent time step of the sample. The model consists of two convolutional layers paired with batch normalization and ReLU layers, which are followed by the LSTM, a fully connected layer and the regression output. Both convolutions were performed without padding, with stride 2 and kernel size 10. The structure of the network can be seen in figure 4.5.
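The effect of these convolution settings on the sequence length can be checked with standard convolution arithmetic. The sketch below assumes that the convolutions run along the time axis of an 80 step sample and that no other layer changes the length before the LSTM; the function name `conv_output_length` is hypothetical.

```python
def conv_output_length(input_length, kernel_size, stride, padding=0):
    """Number of output steps of a 1-D convolution."""
    return (input_length + 2 * padding - kernel_size) // stride + 1

length = 80  # window length used for the CNN-LSTM samples
lengths = [length]
for _ in range(2):  # two convolutional layers, kernel size 10, stride 2, no padding
    length = conv_output_length(length, kernel_size=10, stride=2)
    lengths.append(length)
```

Under these assumptions the sequence is shortened from 80 to 36 and then to 14 steps, so the LSTM would process a considerably shorter sequence than the raw input window.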

The model was trained for 300 epochs with SGDM by evaluating the gradients in batches of 128. The training data was shuffled and divided into batches again in every epoch. Gradients exceeding 1 were clipped to 1. L2 regularization of 0.00001 was used.

Grid search was used for tuning hyperparameters such as the convolution kernel size and stride, the number of hidden units in the LSTM and the number of convolutional kernels. Experiments were conducted with both the Adam optimizer and SGDM. The samples used for validation during the hyperparameter search were separated from the training set before training, and they were not part of the testing samples. Validation data was preprocessed with the same method as the training data.

Input features for the CNN-LSTM were just the raw observation and track update time series that were normalized as described in section 4.3. In other words, input sequences were of size [4×80]. The input channels can be seen in figure 4.6. Some efforts were also made to include the time of each measurement, since the time between measurements varied, but the attempts to encode this in a meaningful way for the network were not successful.

Figure 4.5. Structure of the used CNN-LSTM hybrid network.

Figure 4.6. Input channels for the CNN-LSTM model.


4.5 SVM Model

The SVM regressor was trained to predict the average absolute altitude error over a given window. In other words, the inputs of the regressor were features calculated from the last n measurements, and it predicted the average absolute altitude error against the ground truth trajectory for that window. It is worth noting that even though the error definitions for the CNN-LSTM absolute error prediction and the SVM model are close to each other, predicting the absolute error for a window is an easier problem than predicting the absolute error of the latest track update.

Figure 4.7. SVM input features for the error level prediction.

A few experiments with different windows sizes were made, however a thorough parameter search was not conducted. Validation data used for the hyper parameter search and feature selection was separated from the training set just as with the CNN-LSTM method.

The final window size used when evaluating the model was 15. Since the SVM did not function well with raw time series data as input, a set of features was handcrafted to represent the window of measurements. The handcrafted features were evaluated with the validation data, and a subset was chosen as the final input features. The features used were the standard deviation between the radar measurements and track updates, the average SNR of the radar observations, the total altitude change during the sample window, and the average altitude change rate. A couple of other features, such as the maximum altitude change rate and the signed mean difference between the measurements and track updates, were tested with the validation data, but they did not seem to improve the performance. During the hyperparameter search, the standard deviation feature appeared to be the most important of the used features by a significant margin. The input features are shown in Figure 4.7. The SVM model used the Gaussian kernel function without input feature standardization. Automatic kernel scaling was used.
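The four selected features could be computed for one window roughly as follows. This is a sketch under stated assumptions, not the thesis implementation: the function and argument names are hypothetical, the standard deviation feature is assumed to be the population standard deviation of the measurement-minus-track altitude differences, the total altitude change is taken from the track altitude, and the average change rate is assumed to be the total change divided by the window duration.

```python
from statistics import fmean, pstdev

WINDOW = 15  # final window size reported in the text


def window_features(meas_alt, track_alt, snr, times):
    """Return [residual std, mean SNR, total altitude change, mean change rate]."""
    lengths = {len(meas_alt), len(track_alt), len(snr), len(times)}
    if lengths != {WINDOW}:
        raise ValueError(f"all inputs must contain exactly {WINDOW} samples")

    # Assumption: spread of the differences between measurements and track updates.
    residuals = [m - t for m, t in zip(meas_alt, track_alt)]
    residual_std = pstdev(residuals)

    mean_snr = fmean(snr)

    # Assumption: change of the track altitude from the first to the last sample.
    total_change = track_alt[-1] - track_alt[0]

    # Assumption: net change divided by the elapsed time of the window.
    duration = times[-1] - times[0]
    mean_rate = total_change / duration if duration > 0 else 0.0

    return [residual_std, mean_snr, total_change, mean_rate]


# Hypothetical window: steady climb of 2 m per second, constant 3 m measurement offset.
times = [float(i) for i in range(WINDOW)]
track_alt = [1000.0 + 2.0 * t for t in times]
meas_alt = [a + 3.0 for a in track_alt]
snr = [10.0] * WINDOW
print(window_features(meas_alt, track_alt, snr, times))
```

Feature vectors of this form would then be passed unstandardized to a Gaussian-kernel SVM regressor. The thesis does not name the library used; in scikit-learn, for example, `SVR(kernel="rbf", gamma="scale")` would be a comparable, though not necessarily identical, configuration.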


5 RESULTS

It is important to note that the results of this thesis were obtained for a specific tracking algorithm whose type and parameterization were not known. In addition, no information was available about the parameters of the underlying radar system, such as the operating modes and waveforms used. As the performance of a tracking algorithm depends heavily on the quality of the radar measurement models and target kinematic models, results achieved with the same algorithm can vary substantially. It is also worth noting that the performance of the developed machine learning methods depends on how good the tracker altitude estimate is to begin with and on what kind of errors it makes. Based on the information available in this study, it is impossible to say whether the performance gain was merely due to poor parameterization or whether the method was actually able to improve a well-parameterized tracker. In practice, the training process would have to be rerun for each system and parameterization to see what kind of results the methods achieve. Because the actual error levels are highly system and parameterization specific, all results are presented normalized by the error mean.

5.1 Altitude correction term estimation

The model was evaluated with a test set that the model had not seen before. The input was a fixed-size sample with a length of 80 time steps, and the LSTM state was reset between predictions. It is worth noting that the CNN-LSTM model is not limited to processing fixed-size samples; the choice was made in order to make the result analysis easier and more informative. In practice, it would be beneficial to use such a model without resetting the LSTM state between predictions. In other words, without the reset, the model would most likely perform better for tracks longer than 80 measurements and worse for tracks with fewer than 80 measurements.
