
Petteri Palojärvi

Exploring a large dataset: typical behavior of UHF signal propagation

Master’s Thesis in Information Technology 2020-12-04

University of Jyväskylä Faculty of Information Technology


Author: Petteri Palojärvi

Contact information: petteri.palojarvi@jyu.fi

Supervisors: Sami Äyrämö and Jussi Hakanen

Title: Exploring a large dataset: typical behavior of UHF signal propagation

Työn nimi: Sukellus laajaan aineistoon: UHF signaalin tyypillinen eteneminen

Project: Master's Thesis

Study line: Information Technology

Page count: 44+0

Abstract: The design and operation of radio networks requires a good understanding of radio propagation. This study explores a dataset containing measured instantaneous power values from a nationwide UHF network. Using spectral analysis, the measured power was found to have periodic variations at frequencies of once and twice per day. Faster variation between 0.1 mHz and 1.4 mHz was also detected in 34% of the connections. Hierarchical clustering was used to determine typical value distributions of the measurements. Clusters of symmetric, positively skewed and negatively skewed value distributions of several widths were found.

Keywords: UHF, radio propagation, spectral analysis, hierarchical clustering, time series

Suomenkielinen tiivistelmä: Radioverkon suunnittelua ja käyttöä varten täytyy radioaaltojen eteneminen ymmärtää hyvin. Tässä tutkimuksessa tutustutaan laajaan mittausaineistoon hetkellisiä tehoja maanlaajuisesta UHF verkosta. Spektrianalyysillä todettiin mitatussa tehossa olevan jaksollista vaihtelua taajuuksilla kerran ja kahdesti päivässä. Myös nopeampaa vaihtelua välillä 0.1 mHz ja 1.4 mHz todettiin 34% yhteyksistä. Hierarkisella ryhmittelyllä etsittiin tyypilliset mittausten arvojakaumat. Saaduissa arvojakaumien ryhmissä oli eri levyisiä vasemmalle tai oikealle vinoja tai symmetrisiä jakaumia.

Avainsanat: UHF, radioaaltojen eteneminen, spektrianalyysi, hierarkinen ryhmittely, aikasarja


List of Figures

Figure 1. Map of stations and connections in the study . . . 9

Figure 2. An example of artifacts in the data. . . 11

Figure 3. Periodogram obtained with Welch's method . . . 21

Figure 4. Spectrogram of the measured signal power with a 682 MHz signal. . . 22

Figure 5. Spectrogram of the measured signal power with a 690 MHz signal. . . 23

Figure 6. Spectrogram of the measured signal power with intermittent daily variation. . . 24

Figure 7. Spectrogram of the measured signal power with a slowly up-chirping peak. . . 25

Figure 8. Spectrogram of the measured signal power with two peaks. . . 26

Figure 9. Spectrogram of the measured signal power with three different peaks. . . 27

Figure 10. Dendrogram using Hellinger distance . . . 28

Figure 11. Dendrogram using total variation distance . . . 29

Figure 12. Dendrogram using Kolmogorov distance . . . 30

Figure 13. Dendrogram using Wasserstein distance with p=1 . . . 31

Figure 14. Dendrogram using Wasserstein distance with p=2 . . . 32

Figure 15. Overlaid plots of clusters, from clusterings with Hellinger, total variation and Kolmogorov distances . . . 33

Figure 16. Overlaid plots of clusters, from clusterings with Wasserstein(p=1) and Wasserstein(p=2) distances . . . 34

List of Tables

Table 1. Five statistical distances used to cluster value distributions . . . 7


Contents

1 INTRODUCTION . . . 1

1.1 Research questions . . . 1

1.2 Structure . . . 1

2 THEORETICAL BACKGROUND . . . 2

2.1 UHF signal propagation . . . 2

2.2 Spectral analysis . . . 3

2.3 Hierarchical clustering . . . 4

2.4 Statistical distances . . . 6

2.5 Splines . . . 7

3 DATA AND METHODS . . . 8

3.1 Measurements . . . 8

3.1.1 Details . . . 8

3.1.2 Artifacts . . . 10

3.2 Analysis methods . . . 12

3.2.1 Detecting periodic variations . . . 12

3.2.2 Clustering value distributions . . . 16

3.2.3 Visualizing clustering results . . . 19

4 RESULTS AND DISCUSSION . . . 20

4.1 Periodogram of the median series . . . 20

4.2 Spectrograms . . . 20

4.3 Dendrograms and clusters . . . 26

4.4 Discussion . . . 31

5 CONCLUSION . . . 37

BIBLIOGRAPHY . . . 38


1 Introduction

Large networks of radio broadcast stations are in common use for communication and other purposes. The fundamental principles of signal propagation are well known and verified, but the practical problem remains complicated, and exact answers are usually impossible. Radio signals are nevertheless of critical importance to the workings of modern civilization. It is a relatively simple addition to have a station measure properties of the signals it receives alongside its normal operation. An entire radio network can thus be equipped with measurement devices to produce datasets with large geographic and temporal extent.

1.1 Research questions

This study explores one such dataset, containing measurements of received signal power, to answer two questions with only limited input from the physics of signal propagation.

First, are there periodic variations in the measured values, either in the dataset as a whole or in individual signals? Second, what is the typical behavior of the received signal power in this radio network?

1.2 Structure

Chapter 1 of the thesis introduces the research problem and its relevance, states the research questions and explains the structure of the thesis. Chapter 2 explains the necessary theoretical background: spectral analysis, hierarchical clustering, statistical distances and splines. The chapter also briefly discusses theoretical and empirical research of radio signal propagation as it applies to UHF signals. Chapter 3 describes the dataset in detail and explains the analysis methods used in the thesis. Chapter 4 reports the results and the last chapter answers the research questions based on the results. The chapter also discusses the limitations of the data and the methods and provides a conclusion and an overview of the study.


2 Theoretical background

This chapter is divided into five sections. Section 2.1 explains how and why radio wave propagation is usually modelled with a probability distribution. Section 2.2 introduces spectral analysis of a discrete time series. Section 2.3 discusses clustering, hierarchical clustering in particular. Section 2.4 lists five statistical distances and, finally, Section 2.5 explains splines and related terminology.

2.1 UHF signal propagation

The study of radio wave propagation is a well-established field in physics and engineering. The Radiocommunication Sector of the International Telecommunication Union, in recommendation ITU-R 2017, lists a number of effects that change propagation on a terrestrial path compared to propagation in vacuum. The effects are of two kinds: first, attenuation due to material along the path, such as air, precipitation, dust or sand; second, fading, which is caused by the wave arriving at the receiver along multiple paths. For example, part of the wave can reflect from the surface while another part propagates along a straight line. Refraction caused by atmospheric inhomogeneity and diffraction around an obstruction can also enable multiple paths.

Radiation from multiple paths can arrive in different phases because the lengths of the paths can differ. The signal frequencies in this study correspond to wavelengths of about half a meter and the connections are tens of kilometers long, so paths only slightly different from the line-of-sight path can result in large differences in the phases. Small changes in the paths make the signal strength vary when different components go in and out of phase with the line-of-sight signal.

Propagation is often modelled using various probability distributions. For example, the large number of similarly sized components arriving from different paths can be modelled as a sum of a number of 2D vectors with comparable size and uniformly random directions. The magnitude of the sum then follows a Rayleigh distribution (ITU-R 2019). If one adds the line-of-sight component as a constant vector, the magnitude of the sum instead follows the Nakagami-Rice distribution (ITU-R 2019).
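The vector-sum argument is easy to check numerically. The following sketch (Python with NumPy, not part of the thesis) sums equal-sized phasors with uniformly random directions and, optionally, a constant line-of-sight component; the resulting magnitudes approximately follow the Rayleigh and Nakagami-Rice distributions respectively. The component count, amplitudes and sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def multipath_magnitude(n_components, los_amplitude=0.0, n_samples=100_000):
    """Sum equal-sized 2D vectors (phasors) with uniformly random directions,
    optionally add a constant line-of-sight phasor, and return the magnitudes."""
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, n_components))
    # Amplitude 1/sqrt(n) per component keeps the total multipath power fixed.
    phasors = np.exp(1j * phases) / np.sqrt(n_components)
    return np.abs(phasors.sum(axis=1) + los_amplitude)

# Without a line-of-sight component the magnitudes are approximately Rayleigh
# distributed; with one they are approximately Nakagami-Rice distributed.
rayleigh = multipath_magnitude(50)
rice = multipath_magnitude(50, los_amplitude=2.0)
print(rayleigh.mean(), rice.mean())
```

With these parameters the Rayleigh case has unit mean power, so its mean magnitude should be near sqrt(pi)/2 ≈ 0.886.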

In practice, when the signal is not obstructed by buildings or the ground, the measured power has been found to be close to a lognormal distribution, or equivalently, a normal distribution when the power is measured on a logarithmic scale (Meno 1977). Like the data in this study, Witvliet et al. 2011 measure TV broadcast signals with similar paths over land and sea. Their measurements from a 56 km connection agree well with a lognormal distribution.

2.2 Spectral analysis

A basic tool of spectral analysis is the discrete Fourier transform, which transforms an N-dimensional complex vector x into another N-dimensional complex vector X. Using the standard basis¹, if x_n is the nth component of x, then the nth component of X, X_n, is defined as:

X_n = Σ_{j=0}^{N−1} x_j exp(−2πi n j / N).    (2.1)

The values X_n are inner products of x with a set of vectors whose components vary periodically. The components of the vector used in calculating X_n contain n periods. This set of vectors forms an orthogonal basis, and X_n can be understood as the components of x in it. For an introduction to the transform, its motivation and properties, see Oppenheim and Schafer 1999, for example.

A periodogram plots the squared magnitudes of X_n. If the series contains only real values, X_n is the complex conjugate of X_{N−n}. Complex conjugation does not change the magnitude, so plotting the periodogram for n = 0, …, ⌈N/2⌉ is sufficient in that case. Periodograms should be understood as representing how much of the variation in the data is variation at a given frequency. Finding a peak in the periodogram therefore means an affirmative answer to the first research question. Finding instead no significant peaks means that if there was a periodic component in the variation, it was outside the range of tested frequencies or too weak to be visible.
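As a minimal illustration of Equation (2.1) and the conjugate symmetry of real-valued series, the following sketch (Python with NumPy, not from the thesis; np.fft.fft uses the same sign convention as Equation (2.1)) computes the periodogram of a noisy sinusoid and locates its peak. The test frequency of 0.125 cycles per sample is chosen so that it falls exactly on a transform bin.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
t = np.arange(n)
# A 0.125 cycles-per-sample sinusoid buried in noise; 0.125 * 256 = 32,
# so the signal falls exactly on transform bin 32.
x = np.sin(2 * np.pi * 0.125 * t) + 0.3 * rng.standard_normal(n)

X = np.fft.fft(x)  # the transform of Equation (2.1)

# For a real series X[n] is the complex conjugate of X[N - n] ...
assert np.allclose(X[10], np.conj(X[n - 10]))

# ... so the periodogram up to ceil(N/2) carries all the information.
half = int(np.ceil(n / 2))
periodogram = np.abs(X[: half + 1]) ** 2

peak_bin = int(np.argmax(periodogram[1:])) + 1  # bin 0 holds the mean
print(peak_bin / n)  # → 0.125
```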

In many methods a long time series is often split into shorter pieces before the pieces are transformed separately. This process is known as windowing in spectral analysis. Simply cutting the series and keeping the values unaltered can be viewed as multiplying the values of the series with weights of either 0 or 1. This sequence of weights is known as the window function.

1. For C^N, the standard basis is the set {(1, 0, …, 0), (0, 1, …, 0), …, (0, 0, …, 1)}.

Windowing changes the results of the transform, so it is necessary to understand its effects.

Transforming the windowed sequence results in a circular convolution of the actual transform with the transform of the window function. To leave the actual transform as unchanged as possible, the transform of the window function should have a large, narrow peak at the start with the rest of the values in the transform small in relation to the peak. In practice, the peak will have sidelobes, lower peaks at regular intervals from the first peak. The choice of the window function is then a tradeoff between a narrow peak with prominent sidelobes and a wider peak with lower sidelobes (Oppenheim and Schafer 1999; Harris 1978).

Harris 1978 and Nuttall 1981 investigated the effects of a large number of popular window functions and provide advice on what properties are desirable in different situations. One family of window functions is the generalized cosine windows, which are of the form

w(t) = (1/L) Σ_{k=0}^{K} a_k cos(2πkt/L),  t = 0, 1, …, L−1,    (2.2)

where L is the length of the window, t is the index of the sequence, K is the number of terms and the a_k are the parameters of the window that can be set based on the application. Nuttall 1981 provides a number of numerically optimized parameter sets for different use cases.

2.3 Hierarchical clustering

In cluster analysis objects are grouped together according to some similarity measure. The results can be used to find outliers in the data or to generalize typical characteristics from a set of objects, among other things. Jain, Murty, and Flynn 1999 break the clustering process into five steps, two of them optional. The first step is to choose the data that is to be clustered and to preprocess the chosen data if needed. The second step is to define the similarity of the objects in the data. The choice of similarity largely defines what clusters will be produced. Objects similar in one sense may not be similar in some other sense. For example, similarity of photos could be based on the objects in them. Using instead a similarity based on technical aspects like the depth of field and the angle of view would produce an entirely different clustering.

The third step is the assignment of the objects in the data into different clusters. This is done with a clustering algorithm. In some algorithms the similarity or even the data selection and preprocessing depend on the object assignment, so the clustering process becomes a loop, where steps one, two and three, or two and three, alternate until some terminating condition is reached.

Jain, Murty, and Flynn 1999 calls the fourth, optional, step data abstraction, which means finding a description of the clusters that is simpler than just listing their members. This step can be done to enable or simplify further automatic processing. It can also be done from a human point of view, to make the clustering result easier to comprehend. In this study this step is used to answer one of the research questions. The similarity of the objects is based on the similarity of the measured value distributions, so a description of the typical value distributions can hopefully be inferred from the clusters produced.

The final step is assessing the validity of the clustering result. Clustering algorithms al- ways produce some assignment of objects into clusters, regardless of the data. It is therefore incorrect to claim that there are clusters in the data based only on the clustering result. Ad- ditionally, some clustering algorithms use random values in their operation, or the data itself may contain randomness. It is therefore necessary to determine whether the results could have been caused by random chance.

Validation is also relevant because there were choices made when deciding which clustering algorithm to use and some clustering algorithms further require their user to set parameters which control their operation. These choices affect the results, so the purpose of validation is to make sure that the results are not simply due to these choices, but instead reflect an inherent property of the data. Such validation could be done manually by inspecting the results to see how plausible they are. Manual validation is of course subjective and it may depend on the quality of visualizations available for the researcher. For some data, useful visualization may even be impossible.

Alternatively, algorithmic solutions have been proposed to measure how stable the object assignment is over repeated runs of the same algorithm (Lange et al. 2004; Luxburg 2010).

If the clustering algorithm is deterministic, the runs need to use (partially) different data. Gionis, Mannila, and Tsaparas 2007 instead suggest aggregating clustering assignments from several different algorithms to avoid suffering from the possible weaknesses of a specific algorithm. Automating some of the validation should make the process less subjective and the results more reproducible, but there is no escape from subjectivity, because now there is a subjective choice of which algorithms or statistics support the validation.

A famous example of a clustering algorithm that requires a parameter is k-means, which has been proposed several times in different contexts (Jain 2010). The current name was already used in MacQueen 1967. The parameter k-means needs is the number of clusters to produce from the data (Jain 2010; Jain, Murty, and Flynn 1999). This parameter was carried over to the many algorithms deriving from k-means (Jain 2010).

Hierarchical clustering produces a tree of clusters. The leaf nodes of the tree are clusters with just a single object and each non-leaf node is a cluster that is the union of its two children.

All objects end up in the same cluster at the root node of the tree (Jain, Murty, and Flynn 1999).

Most hierarchical clustering algorithms work by starting from the leaf nodes and iteratively merging clusters (Jain, Murty, and Flynn 1999). The clustering algorithm used in this study is complete-linkage clustering (King 1967). Complete-linkage iteratively merges the two clusters that have the least distance. The distance between clusters is the maximum distance between their members.
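Complete-linkage merging can be sketched in a few lines. The following naive Python implementation is illustrative only (the thesis used Matlab); it operates on a precomputed distance matrix and records each merge together with the complete-linkage distance at which it happens.

```python
import numpy as np

def complete_linkage(dist):
    """Naive complete-linkage clustering on a precomputed distance matrix.
    Returns the merges as (cluster_a, cluster_b, distance), where clusters
    are frozensets of original object indices."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: the distance between clusters is the
                # maximum distance between their members.
                d = max(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Three points close together and one far away: the outlier joins last.
points = np.array([0.0, 0.1, 0.2, 5.0])
dist = np.abs(points[:, None] - points[None, :])
merges = complete_linkage(dist)
print(merges[-1][2])  # the final merge spans the full range, 5.0
```

The quadratic pairwise search makes this O(n^3) overall; practical implementations cache the inter-cluster distances.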

2.4 Statistical distances

Statistical distances measure the similarity of two distributions. Many such distances have been defined for various purposes. The goal here is not to study the distances themselves, so convenience was the most important factor when deciding which distances to implement. The distances used are well known, simple and easy to interpret. They are at least to some degree sensitive to different features of the distributions. The five distances chosen are listed in Table 1.


Hellinger            (1/2) ∫_R (√f − √g)² dµ
Kolmogorov           sup_{x∈R} |F(x) − G(x)|
Total variation      (1/2) ∫_R |f − g| dµ
Wasserstein, p = 1   ∫₀¹ |F⁻¹ − G⁻¹| dµ
Wasserstein, p = 2   ( ∫₀¹ (F⁻¹ − G⁻¹)² dµ )^(1/2)

Table 1: The dendrograms were made using these five statistical distances. The formulas have been simplified for the sample space being R. F and G are the cumulative distribution functions, F⁻¹ and G⁻¹ their inverses, also known as quantile functions, f and g are the corresponding probability densities and dµ is the Lebesgue measure.
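The simplified formulas of Table 1 can be evaluated directly for discrete approximations of two distributions. The sketch below (Python with NumPy, not the thesis code; the grid, the example distributions and the 100-point quantile grid are arbitrary illustration choices) replaces the integrals with sums and uses interpolated quantile functions for the Wasserstein distances.

```python
import numpy as np

def distances(p, q, grid):
    """Five statistical distances between two discrete distributions given
    as probability masses p and q on a common, evenly spaced grid. Sums
    replace the integrals of Table 1."""
    F, G = np.cumsum(p), np.cumsum(q)
    hellinger = 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)
    kolmogorov = np.max(np.abs(F - G))
    total_variation = 0.5 * np.sum(np.abs(p - q))
    # Wasserstein distances via quantile functions evaluated on a grid of u.
    u = np.linspace(0.005, 0.995, 100)
    Finv, Ginv = np.interp(u, F, grid), np.interp(u, G, grid)
    w1 = np.mean(np.abs(Finv - Ginv))
    w2 = np.sqrt(np.mean((Finv - Ginv) ** 2))
    return hellinger, kolmogorov, total_variation, w1, w2

grid = np.linspace(-5.0, 5.0, 201)
p = np.exp(-0.5 * grid ** 2); p /= p.sum()            # a standard-normal shape
q = np.exp(-0.5 * (grid - 1.0) ** 2); q /= q.sum()    # the same shape shifted by 1
print(distances(p, q, grid))
```

Because q is p shifted by 1, both Wasserstein distances come out close to the shift itself, which illustrates why these distances respond strongly to location differences.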

2.5 Splines

In this study splines are simply piecewise polynomial functions with the pieces having the same degree. Some references (Villiers 2012; Süli and Mayers 2003) on the subject additionally require smoothness conditions on the function. The following definitions otherwise agree with them. The degree of a spline is the degree of its polynomial pieces. Splines used for interpolation are created using a set of points x0 < x1 < … < xn, called knots, and the associated function values f0, f1, …, fn at those points. A polynomial is chosen for each interval [xi, xi+1) so that the value of the spline at xi is fi. This condition can be satisfied with a piecewise constant function f(x) = fi if x ∈ [xi, xi+1). If the degree of the polynomial pieces is higher, more free parameters are available to put additional constraints on the spline. For example, with linear pieces, the value of the piece can be set at each end of the interval, so the spline can be made continuous.

A common choice is to demand that the spline and all its derivatives up to some order k are continuous. Each such continuity criterion requires one free parameter. Another common condition is to fix a function value and a derivative at each knot, which requires two free parameters.

In this study two kinds of splines are used: continuous linear splines and the cubic splines returned by the Matlab function pchip. The function returns a continuously differentiable spline, with the slope at each point set so that the spline is monotonic whenever the data is (Moler 2004).
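Matlab's pchip has a close counterpart in SciPy, which can be used to illustrate the monotonicity-preserving property. This is a sketch assuming scipy.interpolate.PchipInterpolator, not the thesis code; the step-like data is an invented example resembling a cumulative distribution function.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Monotone data with a sharp step, like a cumulative distribution function.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.05, 0.1, 0.9, 1.0])

spline = PchipInterpolator(x, y)  # C1 cubic, monotone where the data is

xx = np.linspace(0.0, 4.0, 401)
yy = spline(xx)
print(bool(np.all(np.diff(yy) >= -1e-12)))  # → True, no overshoot between knots
```

An ordinary interpolating cubic spline would overshoot near the jump from 0.1 to 0.9, which matters when the spline represents a distribution function.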


3 Data and methods

Section 3.1 of this chapter explains how, where and when the measurements were taken and shows a few example lines of data. One example is shown of the artifacts in the data.

Section 3.2 describes the three methods that were chosen to answer the research questions.

Two related methods apply spectral analysis and the third applies hierarchical clustering with statistical distances.

3.1 Measurements

The present data was collected by Finnish radio stations relaying TV broadcast signals. The stations were operated by Digita Oy. The stations measure the instantaneous power of received signals at regular intervals. A map of the radio stations is shown in Figure 1. Relaying stations are used in places where a significant population is not served adequately by any of the main stations. The geographic distribution of the relaying stations therefore follows the demand for supplemental over-the-air TV broadcasting in Finland. The data comes from only a subset of the relaying stations that were in use, because not all of them are equipped to measure received signals.

Most connections in the map represent several signals. Usually a relaying station is connected to a single main station, but there are some exceptions where a station relays signals from two main stations. The data contains measurements of the instantaneous power of a signal the relaying station receives. The resulting time series are labeled by the pair of relaying station and main station and by the signal that was relayed.

3.1.1 Details

There are 101 relaying stations and 23 main stations. In total, there are 106 connections between a main station and a relaying station, with connection distances along the surface ranging from 14 km to 88 km. The signal frequencies range from 482 MHz to 690 MHz and all connections are line-of-sight connections. Because the stations usually relay more than one signal, the data contains 321 time series in total.


Figure 1: Stations and connections


The measurements are in units of dB relative to a power of 1 mW and with a resolution of 0.1 dB. Individual measurements were taken with a nominal 300 second interval. The measurements were collected between 2016-09-22 and 2017-03-15.

The measurements span 174 days, but the coverage is not continuous. There was an error in data recording between 2017-01-25 and 2017-03-09, which resulted in a total loss of measurements between those dates. In addition, there are smaller gaps in individual time series, ranging from a single missing measurement to a few days’ worth of data.

The following shows an example from the beginning of the data, which is a simple text file with fields separated by commas. For the sake of brevity, 7 fields that are not used in the study have been omitted, but the lines are otherwise unaltered.

Node Name, Time Stamp (ms), Metric Value

"sodielec-tx-hapk-dvb1.hapk.digita.fi", 1474527783656, "-32.3"

"tx-kata-dvb3.dcn.yloj.digita.fi", 1474527788225, "-48.2"

"tx-ikaa-dcvb3.m2m.digita.fi", 1474527790209, "-48.5"

"rsnl-korp-dvb1.m2m.digita.fi", 1474527791700, "-36.4"

Each line is a single value from some time series. Each series has a unique "Node Name", and the stations and the signal can be parsed from that field. The strings "hapk", "kata", "ikaa" and "korp" are unique identifiers of the relaying stations, and the numbers 1 and 3 indicate which signal is in question. With some details about the radio network, the main station can then be inferred from the relaying station and the signal. As can be seen from this example, there is some minor variation in the format of the node name, but the number of exceptions was small enough that a simple rule can interpret all of them correctly. The field "Time Stamp (ms)" contains the measurement times in milliseconds in the Unix time format. The field "Metric Value" contains the corresponding values of the time series.
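A parsing rule of the kind described could look like the sketch below (Python, not the thesis code). The regular expression is a guess that merely happens to cover the four example lines, including the "dcvb" spelling variant; the real rule may differ.

```python
import re

lines = [
    '"sodielec-tx-hapk-dvb1.hapk.digita.fi", 1474527783656, "-32.3"',
    '"tx-kata-dvb3.dcn.yloj.digita.fi", 1474527788225, "-48.2"',
    '"tx-ikaa-dcvb3.m2m.digita.fi", 1474527790209, "-48.5"',
    '"rsnl-korp-dvb1.m2m.digita.fi", 1474527791700, "-36.4"',
]

def parse(line):
    """Split one data line into (station, signal, time in s, power in dBm).
    The node-name pattern is a hypothetical rule covering these examples."""
    name, stamp, value = [field.strip().strip('"') for field in line.split(",")]
    m = re.search(r"-(\w+)-dc?vb(\d)", name)
    return m.group(1), int(m.group(2)), int(stamp) / 1000.0, float(value)

records = [parse(line) for line in lines]
print(records[0])  # → ('hapk', 1, 1474527783.656, -32.3)
```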

3.1.2 Artifacts

Plotting individual series reveals that both the median value and the interquartile range vary in time. This is to be expected when doing measurements outdoors and over several months, but the step-like character is quite unnatural and is probably related to operating the measurement and broadcasting equipment. Figure 2 is an example plot of step-like artifacts occurring almost simultaneously in two time series from the same connection. First, the two series follow each other, until the measured value in one drops by 53.0 dB on 2017-01-02, between 8:30 and 8:35. The value then increases by 57.0 dB with the next measurement and becomes much less stable for the remainder of the plot. The other series increases by 7.4 dB between 8:55 and 9:00 on the same day, but does not show any obvious change otherwise.

Figure 2: An example plot of artifacts in the data. The plot shows the measured power from a 538 MHz (green, higher) and a 634 MHz (blue, lower) signal. The changes look simultaneous, but they happen 30 minutes apart. The smaller plot is a detail from the morning of 2017-01-02, when the changes happen.

In some cases the timing of the step was indeed found to match the timing of maintenance work, but no log of changes in the equipment and its settings was precise enough to consider repairing the data. Without a comprehensive log of maintenance work, even plotting the entire time series and investigating each suspicious-looking jump in the graph leaves some ambiguous cases. Instead, it was decided to concentrate on methods that could be used regardless of the artifacts.

3.2 Analysis methods

The analysis methods need to be insensitive to the equipment and settings that were used to broadcast the signal and measure it. Otherwise, interesting physical phenomena could be hidden by more prominent effects that are not caused by differences in signal propagation.

The methods should also be suitable for data that has missing and extreme values and tolerate the artifacts mentioned in Section 3.1.2.

3.2.1 Detecting periodic variations

It is expected that a large part of the data is not periodic, but the data may contain periodic variations that are not obvious when the time series is plotted. The sample rate is rather uniform throughout the data, but there are missing values and the measurement times are not entirely regular. An analysis method which assumes uniform sampling is therefore not suitable on its own. Instead, interpolation is used to make the measurement intervals uniform.

The measurement closest to a regular grid of timestamps is used. See Babu and Stoica 2010 for a review of more sophisticated methods for spectral analysis of nonuniform data.

Periodograms were applied in two ways to detect periodic variations in the time series, if such variations are sufficiently prominent. First, all time series were investigated together and over the whole time span of the data. Second, the time series were kept separate and some time information was also preserved.

In both cases, the time series was windowed before transforming it. The window function used was chosen from Nuttall 1981. It is a four-term generalized cosine window, with coefficients 0.338946, -0.481973, 0.161054 and -0.018027. This window was chosen because its sidelobes decay asymptotically at least as fast as 1/f^5 and it has the lowest maximum sidelobe among the four-term windows with this property.
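Using the quoted coefficients, the window can be generated directly from Equation (2.2). The sketch below (Python with NumPy, not the thesis code) omits the 1/L normalization of Equation (2.2), a choice made here so that the window peaks at one; the coefficients are exactly those listed above.

```python
import numpy as np

# The four coefficients quoted from Nuttall 1981 for the four-term window
# with 1/f^5 asymptotic sidelobe decay and lowest maximum sidelobe.
a = np.array([0.338946, -0.481973, 0.161054, -0.018027])

def cosine_window(L, coeffs):
    """Generalized cosine window of Equation (2.2), without the 1/L factor
    so that the window peaks at 1 in the middle."""
    t = np.arange(L)
    k = np.arange(len(coeffs))
    return (coeffs[:, None] * np.cos(2 * np.pi * np.outer(k, t) / L)).sum(axis=0)

w = cosine_window(512, a)
print(w[0], w[256])  # tapers to (numerically) zero at the ends, peaks at 1
```

The coefficients sum to zero and their alternating sum is one, which is why the window vanishes at the ends and reaches exactly one in the middle.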

The logarithmic scale of the values does affect the transform, so values were converted to watts before transforming. The large arithmetic mean of the measurements did in practice tend to overshadow the lowest frequency components, so each piece was centered before applying the window function or computing the transform.

To represent all series together a time series of median values was used. The process used to create the median series was the following:

• Create uniform_timestamps starting from the first timestamp in the data with 300 s intervals, which is the nominal measurement interval. The last uniform timestamp is the one that is closest to the last timestamp in the data.

• For each series in the data:

  • Create an interpolated_series with the uniform_timestamps and the measurements in the series that are closest to those timestamps. This interpolation is called nearest neighbor interpolation. When the closest value is not within 300 s, use NaN as the interpolated value instead of a far away value.

• For each timestamp in uniform_timestamps:

  • Fetch the values from each of the interpolated_series at timestamp.

  • If the values contain less than 10% NaNs, store their median in median. Otherwise, store NaN in median.

  • Save median in median_series at timestamp.

• Convert the values in median_series to watts.

The 10% limit was used to avoid interpolating over a month-long period of missing data that occurs in all time series simultaneously. This gap is the only place where NaNs get stored in the median series. The gap is visible in the spectrograms of individual series, Figure 6 for example.
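The median-series construction can be sketched as follows (Python with NumPy, not the thesis code, which used Matlab; the helper names and the toy data are invented for illustration). Nearest neighbor interpolation picks the measurement closest to each uniform timestamp, a point with no measurement within the nominal interval becomes NaN, and grid points where too many series are missing get NaN medians.

```python
import numpy as np

STEP = 300.0  # nominal measurement interval in seconds

def to_uniform(times, values, grid):
    """Nearest-neighbor interpolation onto a uniform grid: take the
    measurement closest to each grid point, or NaN if none is within STEP."""
    idx = np.clip(np.searchsorted(times, grid), 1, len(times) - 1)
    left, right = times[idx - 1], times[idx]
    nearest = np.where(grid - left <= right - grid, idx - 1, idx)
    out = values[nearest]
    out[np.abs(times[nearest] - grid) > STEP] = np.nan
    return out

def median_series(series_list, grid, max_nan_frac=0.1):
    """Median over all interpolated series at each grid point; NaN when
    at least max_nan_frac of the series are missing there."""
    stack = np.vstack([to_uniform(t, v, grid) for t, v in series_list])
    med = np.nanmedian(stack, axis=0)
    med[np.isnan(stack).mean(axis=0) >= max_nan_frac] = np.nan
    return med

# Toy data: two series on slightly jittered 300 s grids, one with a gap.
grid = np.arange(0.0, 3000.0, STEP)
t1 = grid + 5.0;  v1 = np.full_like(t1, -40.0)
t2 = np.delete(grid + 10.0, [4, 5]); v2 = np.full_like(t2, -50.0)
print(median_series([(t1, v1), (t2, v2)], grid))
```

With only two toy series the 10% threshold means that any missing value makes the median NaN; with the 321 series of the data the threshold behaves as described in the text.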

When the series contains randomness, the variance of the periodogram does not decrease with the number of samples (Welch 1967). Welch 1967 instead suggests splitting the series into pieces with 50% overlap and then averaging the periodograms of the pieces. Welch shows that the expected value of this estimate is the true periodogram and the variance of the estimate is inversely proportional to the number of pieces, but does not decrease with the number of samples. This process was used to calculate the periodogram from the median series:


• Split the median_series into 16-day-long pieces, with 50% overlap. Pad the last piece with NaNs, if necessary.

• For each piece in pieces:

  • If the piece contains more than 10% NaNs, skip it.

  • Replace NaNs in the piece with the nearest value.

  • Center the piece by subtracting its arithmetic mean.

  • Multiply elementwise the piece and the window function.

  • Double the length of the piece by appending a sequence of zeros.

  • Calculate the transform, defined in Equation (2.1), of the piece.

  • Store the squared magnitudes of the transform in periodograms.

• Use the arithmetic mean to calculate the mean periodogram from the periodograms.

The length of the piece is a tradeoff between longer or more numerous pieces. One cycle of slow variation needs to fit inside the piece for it to be visible, but there needs to be enough pieces for averaging to provide any benefit. Several piece lengths were tried and the choice of 16 days already produces just 14 periodograms. Longer pieces were not worth the reduction in the number of periodograms.
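The averaging procedure can be condensed into a short function. The sketch below (Python with NumPy, not the thesis code) uses a Hann window instead of the four-term Nuttall window for brevity, and a toy signal whose period divides the padded transform length, so the peak lands exactly on a bin.

```python
import numpy as np

def welch_periodogram(x, piece_len, window):
    """Averaged periodogram in the style of Welch 1967: split x into
    half-overlapping pieces, center and window each piece, zero-pad it to
    double length, transform, and average the squared magnitudes."""
    step = piece_len // 2
    periodograms = []
    for start in range(0, len(x) - piece_len + 1, step):
        piece = x[start:start + piece_len] - x[start:start + piece_len].mean()
        piece = piece * window
        piece = np.concatenate([piece, np.zeros(piece_len)])  # zero-pad
        periodograms.append(np.abs(np.fft.fft(piece)) ** 2)
    return np.mean(periodograms, axis=0)

rng = np.random.default_rng(2)
n, L = 4096, 512
t = np.arange(n)
x = np.sin(2 * np.pi * t / 64) + rng.standard_normal(n)  # 64-sample period

spec = welch_periodogram(x, L, np.hanning(L))
peak = int(np.argmax(spec[:L]))  # padded bin spacing is 1 / (2 * L)
print(peak / (2 * L))  # → 0.015625, i.e. one cycle per 64 samples
```

Averaging over the fifteen half-overlapping pieces flattens the noise floor while leaving the sinusoid's peak in place, which is exactly the variance reduction Welch's method provides.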

The other approach for detecting periodic variation is to make a spectrogram from each of the time series. Each horizontal line in a spectrogram is a periodogram from a piece of the series. The start time of a piece is represented by the vertical axis, so that the spectrogram shows how the periodogram changes with time. The spectrograms were visually inspected for common and exceptional features. Eleven time series contained too few measurements for this approach.

The nearest neighbor interpolation was again used to make each series uniformly sampled, but this time it is not necessary to avoid the long gap in the data, so no exception is made even if the nearest measurement is far away. The long gap appears in the spectrogram as the transform of the interpolating function, which is a piecewise constant function in this case. The transform of a constant function is zero, so the long gap results in periodograms with zero output at all frequencies. In practice, floating point arithmetic makes the results differ slightly from zero. The center of a long gap is visible in the spectrogram, because the interpolating value changes there. The periodogram there shows the transform of a rectangular function. Short gaps only change the results slightly, because the interpolated function is still close to the function with no missing values.

Like gaps in data, outliers are also visible in the spectrograms. Exceptionally large and small values behave approximately like the Dirac delta function, whose transform has constant magnitude in all frequencies. A brief period of intense variation in the series could look similar.

Each time series was split into equal sized pieces with 50% overlap and a periodogram was calculated for each of the pieces. This process was used to create a spectrogram from a time series:

• Convert the measurements in the series to Watts.

• Create uniform_timestamps starting from the first timestamp in the series with 300 s intervals, which is the nominal measurement interval. The last uniform timestamp is the one that is closest to the last timestamp of the series.

• Interpolate the series using nearest neighbor interpolation and uniform_timestamps. Store the result in uniform_series.

• Split uniform_series into 512 sample long pieces, with 50% overlap. Discard the samples at the end of the series that do not fit into the last full piece.

• For each piece in pieces:

  • Center the piece by subtracting its arithmetic mean.

  • Multiply elementwise the piece and the window function.

  • Calculate the transform, defined in Equation (2.1), of the piece.

  • Store the squared magnitudes of transform in spectrogram.
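As a concrete illustration, the steps above can be sketched in Python with NumPy (the thesis itself used Matlab; the function name `spectrogram` and the choice of a Hann window are assumptions of this sketch, since the text does not fix the window at this point):

```python
import numpy as np

def spectrogram(timestamps, values_dbm, interval=300.0, piece_len=512):
    """Sketch of the listed steps: dBm -> Watts, nearest-neighbor
    interpolation onto a uniform 300 s grid, 50%-overlapping pieces,
    centering, windowing and squared FFT magnitudes."""
    watts = 1e-3 * 10.0 ** (np.asarray(values_dbm, dtype=float) / 10.0)
    t = np.asarray(timestamps, dtype=float)

    # Uniform timestamps from the first measurement at the nominal interval.
    n = int(round((t[-1] - t[0]) / interval)) + 1
    uniform_t = t[0] + interval * np.arange(n)

    # Nearest-neighbor interpolation: pick the closest measurement in time.
    idx = np.clip(np.searchsorted(t, uniform_t), 1, len(t) - 1)
    idx -= (uniform_t - t[idx - 1]) < (t[idx] - uniform_t)
    uniform = watts[idx]

    # A Hann window is assumed here for illustration.
    window = np.hanning(piece_len)

    hop = piece_len // 2  # 50% overlap between consecutive pieces
    rows = []
    for start in range(0, len(uniform) - piece_len + 1, hop):
        piece = uniform[start:start + piece_len]
        piece = piece - piece.mean()          # center the piece
        rows.append(np.abs(np.fft.rfft(piece * window)) ** 2)
    return np.array(rows)
```

Each returned row is one periodogram; stacking the rows with the piece start times on the vertical axis gives the spectrogram described above.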

As before, the length of the piece determines the frequency resolution. The time resolution also depends on the piece length, because consecutive periodograms are separated in time by the samples between their start times. To choose a piece length, several powers of two were tried. The features in the spectrograms were easiest to discern with 512 sample long pieces.


3.2.2 Clustering value distributions

The time series were clustered using approximations of their value distributions. To calculate the distances of the value distributions of the time series, cumulative distribution functions, quantile functions and probability density functions need to be approximated based on the data. Approximate values of these functions at each multiple of the measurement resolution were found by counting the number of times each value appears in the series. For the probability density function, finite differences of the counts were used. At points other than multiples of the resolution, the functions were interpolated using splines.

All of the distances would be dominated by the different locations of the value distributions, but in this case the location depends mostly on the signal travel distance and on the equipment and settings used. Studying these effects was not the goal of this study, so all splines were shifted to a common location. Shifting allows the statistical distances to be sensitive to other properties of the distribution.

The sample median is an obvious candidate for a location parameter to use for shifting, but because the measurements are rounded to the measurement resolution, the sample median is not accurate enough. Instead, the shift is chosen so that the value of the spline approximating the cumulative distribution is 0.5 at 0. This shift was used for all three splines created for a time series.

As was noted earlier in Section 3.1.2, in some time series the location of the distribution changes abruptly one or several times. For these series a single shift is insufficient to counteract the dominance of the location parameter. Depending on the time spans and the magnitudes of these changes, the spline can be far from its proper shape and location.

In the cases where the distortion is severe, it is to be expected that the distance measures will classify these time series as outliers. The resulting dendrograms should not otherwise be adversely affected by these artifacts and their presence in the data can be tolerated.

If the distortion is only slight, then it is conceivable that such a time series will get misclassified. For example, two narrow peaks resulting from half of the measured values being shifted up could, after rounding, appear very similar to a distribution with a naturally wider peak. In fact, these cases are impossible to separate using only the cumulative value counts. This process was used to create the splines approximating the cumulative distribution functions:

• For each time_series in data:

  • Store in values a regular grid of values separated by the measurement resolution, starting from the largest value not in time_series and ending in the maximum of time_series.

  • For every value in values store in cumulative_proportions the proportion of data points in time_series less than or equal to the value.

  • Use the Matlab method pchip with values and cumulative_proportions to create a monotonic cubic spline.

  • Solve spline(x0) = 0.5 with the Matlab method fzero and shift spline's knots by −x0.

  • Append a constant 0 segment from −max_double to min(values) to spline.

  • Append a constant 1 segment from max(values) to max_double to spline.
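A minimal sketch of this construction, with SciPy's `PchipInterpolator` standing in for Matlab's `pchip` and `brentq` for `fzero` (the function name `cdf_spline` is illustrative, and the constant tails to ±max_double are omitted for brevity):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator
from scipy.optimize import brentq

def cdf_spline(series, resolution=1.0):
    """Sketch of the CDF spline: cumulative proportions on a regular
    grid, a monotone cubic interpolant, and a knot shift so that the
    spline takes the value 0.5 at zero."""
    s = np.sort(np.asarray(series, dtype=float))
    # Regular grid at the measurement resolution; one extra point below
    # the minimum makes the proportion start from zero.
    values = np.arange(s[0] - resolution, s[-1] + resolution / 2, resolution)
    # Proportion of data points less than or equal to each grid value.
    cum = np.searchsorted(s, values, side="right") / len(s)
    spline = PchipInterpolator(values, cum)     # like Matlab's pchip
    # Find x0 with spline(x0) = 0.5 and shift the knots by -x0.
    x0 = brentq(lambda x: spline(x) - 0.5, values[0], values[-1])
    return PchipInterpolator(values - x0, cum)
```

Shifting only moves the knots, so the shape of the interpolant is preserved while its median is aligned to zero.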

A time series need not contain all the values between the minimum and the maximum value, so the calculated proportions are constant in some ranges. It was therefore decided to allow jump discontinuities in the spline approximating the quantile function. For each strictly monotonic range, a monotonic cubic spline was fitted, and these splines were then patched together. This is the process in pseudocode:

• For each time_series in data:

  • Get the cumulative_proportions and the shifted values for time_series.

  • Initialize spline as empty.

  • Break cumulative_proportions into maximally long, strictly increasing sequences.

  • For each sequence in sequences:

    • Create a monotonic cubic spline_piece by calling pchip with the sequence and the corresponding values.

    • Append the spline_piece to spline.
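The run-splitting step can be sketched as follows (a simplified illustration; `quantile_pieces` is a hypothetical name and the pieces are returned as a list rather than patched into a single spline object):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def quantile_pieces(values, cumulative_proportions):
    """Sketch of the piecewise quantile spline: one monotone cubic
    piece per maximal strictly increasing run of the cumulative
    proportions; jumps between pieces are allowed, as in the text."""
    cum = np.asarray(cumulative_proportions, dtype=float)
    vals = np.asarray(values, dtype=float)
    pieces, start = [], 0
    for i in range(1, len(cum) + 1):
        # A run ends at the end of the data or when the strict increase stops.
        if i == len(cum) or cum[i] <= cum[i - 1]:
            if i - start >= 2:  # a spline piece needs at least two knots
                pieces.append(PchipInterpolator(cum[start:i], vals[start:i]))
            start = i
    return pieces
```

Note the axes are swapped relative to the CDF spline: each piece maps proportions back to values, which is what a quantile function does.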

It is important that the spline representing the probability density function is non-negative.

Actual densities are non-negative, and negative values would also complicate implementing the Hellinger distance. The three finite differences used therefore share a useful property: they are non-negative for non-decreasing data. Even with non-negative values, a carelessly chosen spline could overshoot and become negative. Unlike in the other cases, where monotonic cubic splines were used, linear splines were chosen here, because the derivatives calculated from some time series with a wide range of values or a small number of measurements have large and numerous oscillations. This was the process used to create splines representing probability density functions:

• For each time_series in data:

  • Get the shifted values and cumulative_proportions that were calculated for time_series.

  • Use the central difference¹ to approximate the rates_of_change. The central difference cannot be used with the first and last points, so use the forward² and backward³ differences at those points.

  • Create a linear spline with values and rates_of_change.

  • Append a constant 0 segment from −max_double to min(values) to spline.

  • Append a constant 0 segment from max(values) to max_double to spline.
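The finite-difference step could look like the following sketch in Python with NumPy (`density_spline` is an illustrative name; `np.interp` plays the role of the linear spline, with the constant zero tails expressed through its `left` and `right` arguments):

```python
import numpy as np

def density_spline(values, cumulative_proportions, h):
    """Sketch of the density approximation: finite differences of the
    cumulative proportions joined by a linear spline, with constant
    zero outside the value range."""
    cum = np.asarray(cumulative_proportions, dtype=float)
    rates = np.empty_like(cum)
    rates[1:-1] = (cum[2:] - cum[:-2]) / (2 * h)  # central differences
    rates[0] = (cum[1] - cum[0]) / h              # forward at the start
    rates[-1] = (cum[-1] - cum[-2]) / h           # backward at the end
    vals = np.asarray(values, dtype=float)
    return lambda x: np.interp(x, vals, rates, left=0.0, right=0.0)
```

Because the cumulative proportions are non-decreasing, all three differences are non-negative, and linear interpolation between non-negative knot values cannot overshoot below zero.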

Kolmogorov distances, defined in Table 1, were calculated using the splines representing cumulative distribution functions. The spline F − G was calculated by subtracting polynomial coefficients while keeping track of the interval ends. Its maximum absolute value was found analytically. The maximum cannot occur at points where the absolute value |F − G| fails to be differentiable, because the function's value is zero at those points. The maximum must therefore occur at a point where the derivative of F − G is zero. The derivative spline was calculated from F − G and the roots of the polynomial pieces were found using Matlab's method roots, which solves polynomial equations numerically. The value of |F − G| was checked at each real root found in the appropriate interval.
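A sketch of the same root-based search in Python, with SciPy spline objects in place of the Matlab splines and `np.roots` in place of Matlab's `roots` (assumes both splines share a common domain; the name `kolmogorov_distance` is illustrative):

```python
import math
import numpy as np

def kolmogorov_distance(F, G):
    """Sketch of max|F - G| for two piecewise cubic CDF splines
    (scipy PPoly objects, e.g. PchipInterpolator instances). Candidate
    maxima are the breakpoints and the real roots of (F - G)' inside
    each interval, as described in the text."""
    breaks = np.union1d(F.x, G.x)
    dF = [F.derivative(k) for k in (1, 2, 3)]
    dG = [G.derivative(k) for k in (1, 2, 3)]
    candidates = list(breaks)
    for a, b in zip(breaks[:-1], breaks[1:]):
        # Taylor coefficients of H = F - G about a: h_k = H^(k)(a) / k!
        h1, h2, h3 = ((dF[k](a) - dG[k](a)) / math.factorial(k + 1)
                      for k in range(3))
        # H'(a + s) = 3*h3*s^2 + 2*h2*s + h1; keep roots inside (0, b - a).
        for r in np.roots([3 * h3, 2 * h2, h1]):
            if np.isreal(r) and 0 < r.real < b - a:
                candidates.append(a + r.real)
    c = np.asarray(candidates, dtype=float)
    return float(np.max(np.abs(F(c) - G(c))))
```

Expanding each cubic about its interval's left end via derivatives sidesteps the coefficient bookkeeping that subtracting two splines with different breakpoints would otherwise require.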

The other distances were implemented by numerically integrating the function in question, with splines substituted for the functions f, g, F, G, F⁻¹ and G⁻¹. The integrals were calculated using Matlab's numerical integration method integral. In all cases the function to be integrated is non-zero on a finite interval, so it is only necessary to use a finite integration interval. The union of the splines' knots was given as hints for the integrator to at least split the integration interval at those points. Outside of those points the function is usually smooth. With some distances and on some intervals the integrator even yields theoretically exact results, because the integrated function is a low degree polynomial. Outside of the interval breaks, the only non-smooth points occur with the p = 1 Wasserstein distance, where F⁻¹ − G⁻¹ changes sign.

1. f′(x) ≈ (f(x + h) − f(x − h))/(2h)
2. f′(x) ≈ (f(x + h) − f(x))/h
3. f′(x) ≈ (f(x) − f(x − h))/h, with h being the measurement resolution
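For one of the integration-based distances, the approach can be sketched with SciPy's `quad` standing in for Matlab's `integral`; the knot union is passed through the `points` argument. The standard convention H² = ½∫(√f − √g)² dx is assumed here, since Table 1 with the exact definitions is in an earlier chapter:

```python
import numpy as np
from scipy.integrate import quad

def hellinger_distance(f, g, knots):
    """Sketch: Hellinger distance between two densities f and g by
    numerical integration, with the splines' knots given as
    breakpoints so the integrator splits the interval there."""
    pts = np.unique(np.asarray(knots, dtype=float))
    integrand = lambda x: (np.sqrt(f(x)) - np.sqrt(g(x))) ** 2
    total, _ = quad(integrand, pts[0], pts[-1], points=pts,
                    limit=10 * len(pts))
    return np.sqrt(total / 2)
```

The other integral distances differ only in the integrand and in whether densities, CDFs or quantile functions are substituted in.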

3.2.3 Visualizing clustering results

Hierarchical clustering produces a tree and graphs of these trees are called dendrograms. In this case the number of nodes in the tree is large enough that a typical tree graph showing the entire tree would be difficult to interpret. Instead, the trees are shown in a condensed manner. See Figure 10 for an example. Each rectangular area of uniform color represents a cluster. The left edge of the figure has 321 small rectangles representing the 321 time series in separate clusters. Moving right, these tiny clusters are merged into larger clusters. The horizontal location of the merge corresponds to the distance the two merged clusters had. The color of the more numerous cluster is carried on to the parent. The right edge corresponds to the distance which the two last clusters had when they were merged into a single cluster, and is incidentally the maximum distance any two objects had. In Figure 10 for example, this distance was about 0.76 and the larger of the siblings was created by merging two clusters whose distance was about 0.58.

Some clusters from each dendrogram were chosen and the splines representing their value distributions were plotted to see what value distributions are present in the data. The plots of the splines are overlaid with transparency so that they can be compared with each other.

Fainter or stronger lines were used depending on the number of splines, so the intensity of the color is comparable only between regions of the same image. Clusters were heuristically chosen from the dendrograms based on their prominence and size. To get a comprehensive picture, clusters were also chosen so that, for each measure, most of the time series belong to at least one of the plotted clusters.


4 Results and discussion

The first three sections of this chapter display results obtained with the three methods explained in Section 3.2. It was impractical to include all spectrograms in the chapter because so many were created. Instead, the examples in Section 4.2 represent the whole set and highlight findings. The chapter concludes with Section 4.4, which discusses the implications of the results.

4.1 Periodogram of the median series

The first method used for detecting periodic variation produces a single periodogram that is the arithmetic mean of several periodograms. Several window lengths were tried, and all of them result in graphs where the lowest frequency component is the most dominant one. Using as long windows as possible would therefore be desirable to resolve the dominant low frequency content. It is still necessary to maintain a reasonable number of periodograms to have some benefit from averaging them. The graph resulting from this compromise, which uses 14 windows each containing 16 days, is shown in Figure 3.

The short span of measurements available after the error in data recording is only a few days long and could not be used with this window length.

The overall shape of the periodogram is roughly linear in the log-log plot, so a simple power law describes it approximately. Somewhat prominent peaks for variations once per day and twice per day can be seen in the periodogram. There is also a less convincing peak at a bit slower than once per hour.

4.2 Spectrograms

Showing spectrograms of all the 321 time series is not practical. Instead, examples are chosen to explain and support the findings. Note that unlike in Figure 3, the frequency axis is linear instead of logarithmic. Again, the variation is generally spread across all frequencies, with some emphasis on the low frequencies. There are however some features overlaid on the wideband noise. There are periods of higher or lower variation in the signal strength which appear as brighter or dimmer horizontal bands, but there is no obvious pattern to these bands. The bands often have sharp edges, so the transitions fit inside one piece, which is about 43 hours in length.

Figure 3: Periodogram of the median time series. The periodogram is the mean of 14 different periodograms, each calculated from a 16 day long segment of the median time series. Note that both axes are logarithmic. Since each window was zero-padded to twice its length, the lowest plotted frequency is half of the actual frequency resolution. The maximum plotted frequency is the highest possible, half of the sampling frequency.

The spectrograms from different multiplexes on the same connection are very similar. Details visible in one spectrogram are almost always detectable in the others, although they may be somewhat less prominent. See Figures 4 and 5 for a comparison. No relationship between the frequency of the multiplex and the prominence of the features can be inferred.


Figure 4: Spectrogram of the instantaneous signal power measured. The frequency of the signal was 570 MHz and the link distance was 66 km.

With the piece length used, the variation once or twice per day seen in Section 4.1 corresponds roughly to the lowest¹ frequency available in the spectrograms. The spectrograms do offer some support for that result, since a peak at the right location is intermittently visible in some spectrograms, but the peak rarely persists for the whole measured duration. The spectrograms therefore suggest that the daily variation is not present all the time and in all connections.

1. Once per day is about 0.01 mHz


Figure 5: Spectrogram of the measured power in the same connection as Figure 4 with a 682 MHz signal instead.

The spectrograms can be categorized into two or three groups based on detectable periodic variations other than the possible daily component. The boundaries of the groups are not strict and are of course subjective. The most numerous group is simply the case where no features other than those already discussed stand out. Roughly two thirds of the time series and connections fall into this category. Figure 6 is an example from this group.


Figure 6: Spectrogram of the instantaneous signal power measured. Intermittent daily variation is visible (approximately the second bin from the left). The frequency of the signal was 546 MHz and the link distance was 28 km.

Roughly a third of the time series have additional peaks in their spectrograms. The peak's location varies with time as well as between different connections. The peak is found between 0.1 mHz and 1.4 mHz, which is close to the maximum frequency detectable with the sampling rate in the data. Depending on the connection, the peak can appear in a fixed location, have a slow increase or decrease in frequency, or its location can meander without an obvious rule. The peak may also disappear for some time. See Figures 4, 5 and 7 for examples.


Figure 7: Spectrogram of the instantaneous signal power measured. A slowly up-chirping peak is visible. Frequency of the signal was 594 MHz and the link distance was 65 km.

Most of the connections with a peak show just one peak, but about a third of them have another, less prominent peak at twice the frequency. Figure 8 is an example from this group. The second peak seems to follow the movements of the first peak. There is one, or possibly a few, connections with three peaks, the third one located at three times the frequency of the lowest frequency peak. The clearest example is shown in Figure 9.


Figure 8: Spectrogram of the instantaneous signal power measured. Two peaks are seen meandering together. Frequency of the signal was 530 MHz and the link distance was 31 km.

4.3 Dendrograms and clusters

The dendrograms created using the five different statistical distances in Table 1 are in Figures 10, 11, 12, 13 and 14. Some clusters are marked in each dendrogram with numbers and the splines corresponding to the marked clusters are shown in Figures 15 and 16. The plotted functions are the splines that were used by the distance measures in clustering: the probability density for the Hellinger and total variation distances, the cumulative distribution function for the Kolmogorov distance and the quantile function for the Wasserstein distances.


Figure 9: Spectrogram of the instantaneous signal power measured. This is the spectrogram with the clearest three peaks. Frequency of the signal was 498 MHz and the link distance was 55 km.

Some of the shown clusters do not look like well defined collections of similar distributions.

For clusters 1, 17 and 21 a better defined subset was found from the dendrogram. These subsets are clusters 2, 18 and 22 respectively. In the interest of brevity, only these three best examples are shown. The dendrograms do not suggest a better defined subset for all such unclear clusters.


Figure 10: Dendrogram using Hellinger distance, clusters 1-6 marked.

Overall, all five distance measures produce a cluster or clusters that contain narrow symmetric distributions. All distances except the Kolmogorov distance produce one such cluster. They are clusters 3, 8, 19 and 25. The Kolmogorov distance produces two such clusters, 13 and 16. The distributions in those clusters look similar and their distance is not that high. However, cluster 16 gets merged with a cluster that is far from 13. After this, complete linkage keeps these clusters separate for most of the dendrogram. In addition to a cluster of narrow, symmetric distributions, cluster 7 from the Hellinger distance contains wider, but still symmetric distributions.

Another surprising clustering result is that even though the peaks of the distributions in cluster 2 are a bit wider, the distributions in cluster 6 look rather similar. However, these clusters stay separate throughout the dendrogram. A merge of a cluster of two outliers into cluster 2 and complete linkage kept these clusters separate.

Except for the symmetric looking narrow distributions, all other clusters have some skewness. Even the narrow distributions may have a slight skew that is not visible, because the peak is roughly the same width as the measurement resolution. There are clusters with positively skewed distributions and clusters with negatively skewed distributions, but positive skew seems to be more common. The clusters with positively skewed distributions are also better defined and have more members. Positive skew is found in clusters 1, 2, 6, 9, 12, 15, 17, 21, 22 and 24. Clusters 11, 18 and 20 show only slight positive skew.

Figure 11: Dendrogram using total variation distance, clusters 7-11 marked.

Since the lognormal distribution is also positively skewed, one could suggest that the positive skew is caused by the logarithmic units, and that the same plots would look like normal distributions, and therefore symmetric, when using Watts instead. In fact the situation is reversed: if a distribution plotted in logarithmic units looks like a normal distribution, then that distribution plotted in non-logarithmic units looks lognormal. A lognormal looking plot with logarithmic units, like here, would look normal with doubly logarithmic units.

Figure 12: Dendrogram using Kolmogorov distance, clusters 12-16 marked.
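A compact restatement of this unit argument (the notation is added here for illustration: Y is the measured value in dB(1 mW) and X the power in watts):

```latex
Y = 10\log_{10}\!\frac{X}{1\,\mathrm{mW}}
\qquad\Longrightarrow\qquad
Y \sim \mathcal{N}(\mu,\sigma^{2})
\;\Longleftrightarrow\;
X \ \text{is lognormal.}
```

Hence, if the histogram of Y itself looks lognormal, it is log Y that is normal, i.e. the plot would only look normal after applying a second logarithm to the already logarithmic units of X.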

Negative skew is found in clusters 4, 5, 10, 14 and 23. There is no plotted cluster with generally negatively skewed distributions for the p = 1 Wasserstein distance, while such clusters were found for all other distances. The dendrogram in Figure 13 has no candidate cluster that is large enough to contain all the negatively skewed distributions.

Clusters with negative skew have more oscillations in their distributions than equally wide clusters with positive skew. Comparing clusters 9 and 10, for example, members of 10 have more oscillations on their skewed side, even though the distribution is a bit wider in cluster 9. Such oscillations can suggest that artifacts are distorting the cluster. However, some members of cluster 10 do not show any signs of containing artifacts and also do not have any oscillations in their probability density functions. While some of the members might cluster differently with better data, which could reduce the oscillations, the cluster shape in general appears genuine.

Figure 13: Dendrogram using Wasserstein distance with p = 1, clusters 17-20 marked.

4.4 Discussion

The first research question was whether there is periodic variation in the measured signal power. Based on the periodogram of the median series and the spectrograms of the time series the answer is that, in general, there is weak periodic variation. This result is in conflict with the commonly used model (ITU-R 2019; Witvliet et al. 2011) where the measured power follows some probability distribution. To be fair, this non-random component is a minor part of the variation.


Figure 14: Dendrogram using Wasserstein distance with p=2, clusters 21-25 marked.

A large enough fraction of the time series must show daily variation to make the median value have daily variation, but such variation cannot be inferred to be present in all connections. In fact, the spectrograms suggest that the variation is present only in some of the connections and only some of the time.

It is already known that many properties of the ground and the atmosphere change how UHF signals propagate, so the daily variation that was detected is likely caused by diurnal changes in these properties. It is worth noting that the Sun will not be in the same phase in different connections or even at different points of a single connection. Also, the location of the Sun is a more complicated function of time than a one cycle per day model.

The faster variation found between 0.1 mHz and 1.4 mHz in about a third of the time series does not have such an obvious candidate cause. The peak in some spectrograms persists throughout the data, but it does not necessarily maintain its location, so the cause should not have a constant frequency either. The peaks in the same spectrogram move together, which suggests a common origin. In the spectrograms with two peaks, one peak is at twice the frequency of the lowest frequency peak, and in the one spectrogram with three peaks, the highest frequency peak is at thrice that frequency. These peaks could therefore be part of a series of peaks located at each integer multiple of the frequency of the most prominent peak and diminishing in amplitude. All but the first few peaks could be too weak to be visible in the spectrograms.

Figure 15: Overlaid plots of cluster members (panels: Hellinger clusters 1-6, total variation clusters 7-11, Kolmogorov clusters 12-16). Clusters 1-6 are from the dendrogram in Figure 10, clusters 7-11 from Figure 11 and 12-16 from Figure 12. Clusterings were done using the logarithmic values in the data, so the horizontal unit is dB (1 mW). The vertical unit is dB⁻¹ in plots with probability densities and dimensionless with cumulative distributions.

Figure 16: Overlaid plots of the quantile functions of the cluster members (panels: Wasserstein p = 1 clusters 17-20, Wasserstein p = 2 clusters 21-25). Clusters 17-20 are from the dendrogram in Figure 13 and clusters 21-25 from Figure 14. Clusterings and the plots use the logarithmic values in the data, so the vertical unit is dB.


Since the lowest frequency is also the most prominent, measurements that cover a longer time period could produce interesting results that were not visible with this data. Such extensions are costly, because doubling the measurement time only halves the lowest available frequency. For example, the graph in Figure 3 would extend by one point to the left. Studying those lowest frequencies requires measurements from a significantly longer time period, but it is not necessary for the measurements to be as frequent. Artifacts in the data also become a bigger problem with longer time scales: if they occur equally often, there would be fewer pieces of the time series without artifacts.

The second research question was what the typical behavior of the received signal power in this radio network is. The clusterings reveal diverse measurement distributions. If a single theoretical family of distributions is to be used to describe them, that family needs to include distributions with different locations, widths and skewness (both negative and positive).

Using complete linkage (King 1967) with this data turned out to be a liability. Initially, there was hope that a time series which contains artifacts would be classified as an outlier by the clustering algorithm, but such series did get included in clusters and later kept similar looking (to the human eye) clusters separate. Some distance measures assigned higher distances to the outliers and suffered less than others, but no measure was immune. A different clustering algorithm could be tried as an alternative. Since splines can be averaged, the average of a cluster's splines could serve as the cluster's centroid for centroid clustering.

Despite the artifacts, which could change individual distributions and also confuse the clustering algorithm, the clustering results are not useless. Even if some cluster members show evidence of containing artifacts, the overall shape of the distributions in the cluster is not compromised and can be seen from the plots. If the clustering algorithm had been confused by artifacts, this could also be seen from the plots of the clusters. Also, the five clusterings were affected by artifacts in different ways, so the effect can be detected by comparing each clustering to the four others.

This study has demonstrated that one can extract meaningful information from data that has been collected by a large number of relaying stations in addition to their normal operation.

The measured signals are from stations whose primary purpose was not research. The artifacts in the data are a natural consequence of this, but it was still possible to obtain results despite them complicating the analysis.


5 Conclusion

A dataset of measured UHF signal powers was analyzed in this thesis. The dataset had several challenging aspects. Most of the time series it contains are long and there is a large number of them, but the most significant difficulty was the artifacts in the data.

Despite the difficulties, the two research questions were answered. The first question was whether there are periodic variations in the propagation of UHF signals. Weak periodic components were indeed detected with spectral analysis. Variation once per day was the most prominent of these. The second question was what the typical behavior of the received signal power is. With hierarchical clustering it was determined that a variety of unimodal value distributions is typical in the data. When plotted using logarithmic units, the distributions can be narrow or wide, and have a positive or negative skew.

Further research using the differences of consecutive values could alleviate some of the problems with artifacts, but differencing would complicate the analysis. The results would no longer be about a distribution of values, but instead about the distribution of value differences. More generally, this study did not comprehensively explore the dataset and only two research questions were chosen. For example, the timing of events in the data was largely ignored. The data could also be combined with other data to investigate, for example, effects of geography and the weather.


Bibliography

Babu, Prabhu, and Petre Stoica. 2010. "Spectral analysis of nonuniformly sampled data – a review". Digital Signal Processing 20 (2): 359–378. ISSN: 1051-2004. https://doi.org/10.1016/j.dsp.2009.06.019.

Gionis, Aristides, Heikki Mannila, and Panayiotis Tsaparas. 2007. "Clustering Aggregation". ACM Trans. Knowl. Discov. Data (New York, NY, USA) 1 (1). ISSN: 1556-4681. https://doi.org/10.1145/1217299.1217303.

Harris, F. J. 1978. "On the use of windows for harmonic analysis with the discrete Fourier transform". Proceedings of the IEEE 66 (1): 51–83. ISSN: 0018-9219. https://doi.org/10.1109/PROC.1978.10837.

ITU-R. 2017. Recommendation ITU-R P.530-17: Propagation data and prediction methods required for the design of terrestrial line-of-sight systems.

ITU-R. 2019. Recommendation ITU-R P.1057-6: Probability distributions relevant to radiowave propagation modelling.

Jain, A. K., M. N. Murty, and P. J. Flynn. 1999. "Data Clustering: A Review". ACM Computing Surveys (New York, NY, USA) 31 (3): 264–323. ISSN: 0360-0300. https://doi.org/10.1145/331499.331504.

Jain, Anil K. 2010. "Data clustering: 50 years beyond K-means". Pattern Recognition Letters 31 (8): 651–666. ISSN: 0167-8655. https://doi.org/10.1016/j.patrec.2009.09.011.

King, Benjamin. 1967. "Step-Wise Clustering Procedures". Journal of the American Statistical Association 62 (317): 86–101. https://doi.org/10.1080/01621459.1967.10482890.

Lange, Tilman, Volker Roth, Mikio L. Braun, and Joachim M. Buhmann. 2004. "Stability-Based Validation of Clustering Solutions". Neural Computation 16 (6): 1299–1323. https://doi.org/10.1162/089976604773717621.

Luxburg, Ulrike von. 2010. "Clustering Stability: An Overview". Foundations and Trends in Machine Learning (Hanover, MA, USA) 2 (3): 235–274. ISSN: 1935-8237. https://doi.org/10.1561/2200000008.

MacQueen, J. 1967. "Some methods for classification and analysis of multivariate observations". In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, 281–297. Berkeley, Calif.: University of California Press. https://projecteuclid.org/euclid.bsmsp/1200512992.

Meno, F. I. 1977. "Mobile radio fading in Scandinavian terrain". IEEE Transactions on Vehicular Technology 26 (4): 335–340. https://doi.org/10.1109/T-VT.1977.23704.

Moler, Cleve B. 2004. Numerical Computing with Matlab. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9780898717952.

Nuttall, A. 1981. "Some windows with very good sidelobe behavior". IEEE Transactions on Acoustics, Speech, and Signal Processing 29 (1): 84–91. ISSN: 0096-3518. https://doi.org/10.1109/TASSP.1981.1163506.

Oppenheim, Alan, and Ronald Schafer. 1999. Discrete-time signal processing. Upper Saddle River, N.J.: Prentice Hall. ISBN: 0-13-083443-2.

Süli, Endre, and David F. Mayers. 2003. An Introduction to Numerical Analysis. Cambridge University Press. https://doi.org/10.1017/CBO9780511801181.

Villiers, Johan de. 2012. Mathematics of Approximation. Mathematics Textbooks for Science and Engineering. Paris: Atlantis Press. http://dx.doi.org/10.2991/978-94-91216-50-3.

Welch, P. 1967. "The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms". IEEE Transactions on Audio and Electroacoustics 15 (2): 70–73. ISSN: 1558-2582. https://doi.org/10.1109/TAU.1967.1161901.

Witvliet, B. A., P. W. Wijninga, E. van Maanen, B. Smith, M. J. Bentum, R. Schiphorst, and C. H. Slump. 2011. "Mixed-path trans-horizon UHF measurements for P.1546 propagation model verification". In 2011 IEEE-APS Topical Conference on Antennas and Propagation in Wireless Communications, 303–306. https://doi.org/10.1109/APWC.2011.6046762.
