Individual tree detection - Horvitz-Thompson-like estimators based on stochastic geometry for f

ITD, also sometimes called individual tree crown (ITC) detection or delineation in the airborne remote sensing context, is a methodology that tries to find individual tree objects from remote sensing data and estimate tree-level attributes for the detected trees. Several methods have been developed both for ALS and TLS data.

In ALS-based ITD the tree objects are usually tree crowns, whereas in TLS-based ITD the tree objects are usually tree stems. It should be noted that in the literature ITD usually refers to the ALS-based methods, and the term is not used in conjunction with TLS. It is the choice of the author to unite the task of finding tree objects from different sources of remote sensing data under one umbrella, ITD, as the term clearly characterizes the methodology and its independence of the data source. Let us first concentrate on the ALS case.

One fairly simple method based on image analysis, presented in [21] and [22], takes a canopy height model formed by interpolating the laser data (more specifically, the first and only return heights), applies low-pass filtering, and finds the local height maxima. These local maxima are interpreted as locations of trees. After this, the canopy height model is segmented into tree crown objects around the local maxima via watershed segmentation. An estimate of crown diameter is obtained by taking the maximum of the diameters of the corresponding segment in four cardinal directions passing through the crown location. This method has been used in II for benchmarking.
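The local-maximum step described above can be illustrated with a small sketch. This is not the implementation of [21] or [22]: the 3x3 mean filter, the strict 8-neighbour maximum rule, the `min_height` threshold and the cone-shaped toy canopy are all assumptions made for illustration, and the watershed segmentation step is left out.

```python
def smooth3x3(chm):
    """Low-pass filter: mean over each cell's 3x3 neighbourhood (clipped at grid edges)."""
    rows, cols = len(chm), len(chm[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            vals = [chm[a][b]
                    for a in range(max(0, i - 1), min(rows, i + 2))
                    for b in range(max(0, j - 1), min(cols, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out


def local_maxima(chm, min_height=2.0):
    """Return (row, col) of cells strictly higher than all of their neighbours."""
    rows, cols = len(chm), len(chm[0])
    peaks = []
    for i in range(rows):
        for j in range(cols):
            if chm[i][j] < min_height:
                continue  # hypothetical threshold: skip ground and low vegetation
            neighbours = [chm[a][b]
                          for a in range(max(0, i - 1), min(rows, i + 2))
                          for b in range(max(0, j - 1), min(cols, j + 2))
                          if (a, b) != (i, j)]
            if all(chm[i][j] > v for v in neighbours):
                peaks.append((i, j))
    return peaks


def synthetic_chm(size, trees, slope=3.0):
    """Toy canopy: each tree (row, col, height) is a cone; the CHM is their upper envelope."""
    return [[max([0.0] + [h - slope * ((i - r) ** 2 + (j - c) ** 2) ** 0.5
                          for r, c, h in trees])
             for j in range(size)] for i in range(size)]


# Two well-separated synthetic trees on a 28 x 28 grid.
chm = synthetic_chm(28, [(7, 7, 15.0), (18, 19, 20.0)])
tree_cells = local_maxima(smooth3x3(chm))
```

On real canopy height models the filter size and the height threshold would need tuning, since too little smoothing splits crowns and too much merges neighbouring trees.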

A quite different approach is presented by Lähivaara et al. [23]; this is the method used in I and II for ITD. This method models a tree crown as a parametric, rotationally symmetric geometrical object. The objects are parametrized by the horizontal coordinates of the tree crown centre point, crown radius, crown height, the lower limit of the living crown, and the crown shape. It is assumed that the ALS data is produced by a generative model that consists of a collection of these geometrical objects and error terms related to the porosity of the canopy and the related penetration of the laser pulses, and to the fact that tree crowns are not rotationally symmetric, smooth geometrical objects. Based on this model, the problem can be studied in the framework of Bayesian inverse problems [24]. Field measurements and the allometric models of [25] can be used to form a prior distribution for the model parameters. The final product is the maximum a posteriori (MAP) estimate of the model parameters, which represents a collection of geometrical objects representing the tree crowns that can be detected from the ALS data. An example of ITD with this method is shown in Figure 3.1.

Another 3D method is presented in [26], where the laser returns from a tree crown are modelled directly as a probability distribution and the returns from an area as a mixture of the tree-specific distributions within the area. Other methods, mostly working in a similar vein to [21], search for local height maxima but additionally use allometric knowledge [27], a priori knowledge of expected crown size or tree density [28,29], or a multilayered approach [30] to improve the detection of trees.

Recently, ITD methods based on deep learning have also emerged, see e.g. [31]. It should be noted that ITD algorithms are quite an active area of research, and there are many algorithms in addition to the examples given here. For example, Zhen et al. [32] reviewed 212 articles of research literature related to ITD from 1990 to 2015, 92 of which concentrated on algorithm development.

For TLS, most of the ITD methods utilize geometrical models. For example, Lovell et al. [33] first take a slice of the TLS data at a certain height and classify measurements as belonging to tree stems if their apparent reflectance exceeds a threshold value. Angular point intensities are used to infer stem locations. Circular stem models relating reflectance values to the stem diameter are then applied.

Liang et al. [34] first classify TLS data points as either belonging to tree stems or not, based on the properties of local covariance matrices of point locations. Then, the stem points are divided into clusters, each cluster containing points from one stem, based on distances between the points. Cylindrical models are fitted to the point clusters. Raumonen et al. [35] use what they call quantitative structure models, which represent trees as hierarchical collections of cylinders or other shapes. In addition to stem modelling, these models are also able to represent the branches.

Ravaglia et al. [36] use the Hough transform and growing open active contours, also called snakes, to model trees as tubular shapes consisting of a series of 3D circles.

In III, which considers estimation in the TLS case, no actual TLS data was used, and hence none of the aforementioned ITD methods were used. This study was purely a simulation study, exploring the behaviour of Horvitz–Thompson-like estimation in different spatial patterns and comparing the estimator to two other previously proposed estimators.

4 STOCHASTIC GEOMETRY

This chapter gives a short introduction to the tools of stochastic geometry that are central to the calculation of detection probabilities presented in I–III. First, point processes, models of random locations and randomly located objects, are considered. Second, mathematical morphology, a methodology of transforming geometrical structures to obtain more information on those structures, is presented.

4.1 POINT PROCESSES

This section is based on [12] and [37]. Point processes are models for irregular point patterns. Very general definitions could be formulated, but we restrict ourselves to point processes defined in a compact subset of the plane, $S \subset \mathbb{R}^2$. A point process is a random variable $X$ that takes values in $N(S)$, the set of all finite discrete subsets of $S$. One realisation of $X$ takes the form $x = \{x_1, x_2, \ldots, x_N\}$, where the $x_i$ are points in $S$, $x_i \neq x_j$ for all $i \neq j$, and the number of points $N$ can be random. Some processes can be characterized via a probability density function describing the probabilities of different point configurations. However, for some point processes the density function cannot be given in analytical form, and in some cases it is more convenient to describe the process via some other properties.

A point process is called stationary if the properties of the process do not depend on the location of the window $W \subset S$ where the process is observed. Expressed differently, the distribution of the process is translation-invariant. An example of a nonstationary process would be a process where the number of points increases as we move from one side of $S$ to the other. A point process is called isotropic if it is rotation-invariant; in other words, the window $W$ can be rotated by an arbitrary amount, and the observed properties of the process in the rotated and unrotated window should be the same. In this thesis it is always assumed that the underlying forest process is stationary and isotropic.

An important point process model is the (homogeneous) Poisson process. Patterns generated by the Poisson process are sometimes said to exhibit complete spatial randomness (CSR). The Poisson process is characterized by two properties: first, the number of points located in a bounded Borel set $B \subset S$ follows the Poisson distribution with expected value $\lambda |B|$, where $|\cdot|$ is the area operator and $\lambda > 0$ is called the intensity parameter of the process, and second, the numbers of points in $k$ disjoint Borel sets are independent of each other. These properties lead to the points of the process being independently located in $S$, so that there are no interactions between them. The Poisson process is especially important as a null model when testing if a pattern exhibits interactions between the points.

A special case of a point process is the single random point. For a random point $x$ that is uniformly distributed in $S$, the probability that it hits a subset $A \subset S$ is $P(x \in A) = |A| / |S|$. In other words, the probability is directly proportional to the area of the subset. The single random point has a strong connection to the Poisson process: the Poisson process can be thought of as a collection of independent, uniformly distributed random points.
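The two defining properties and the connection to uniform random points suggest a simple way to simulate a homogeneous Poisson process in a rectangular window: draw a Poisson-distributed number of points and then place them independently and uniformly. The following is a minimal sketch under that assumption; the function names and the rectangular window are illustrative choices, not part of the original text.

```python
import random


def poisson_count(mean, rng):
    """Poisson(mean) draw: number of unit-rate exponential arrivals before time `mean`."""
    n = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        n += 1
        t += rng.expovariate(1.0)
    return n


def simulate_poisson(lam, width, height, rng):
    """Homogeneous Poisson pattern with intensity `lam` in [0, width] x [0, height]."""
    # Property 1: the number of points is Poisson with mean lam * |window|.
    n = poisson_count(lam * width * height, rng)
    # Given the count, the points are independent and uniform in the window.
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]


rng = random.Random(1)
pattern = simulate_poisson(0.5, 20.0, 20.0, rng)  # about 200 points expected
```

Dividing `len(pattern)` by the window area gives a simple estimate of the intensity, which for a single realisation varies around the true value.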

In the literature, $\lambda$ is generally used to denote the intensity, or global point density, of a stationary point process: the expected number of points generated by the process per unit area. Mathematically, this can be expressed as

$$\lambda = \frac{E[N(S)]}{|S|}, \tag{4.1}$$

where $N(S)$ is the number of points in the set $S$.

4.1.1 Characterizing point patterns and processes

Point patterns and processes are usually roughly categorized into three groups based on the interactions they exhibit. The first category is CSR, with no interactions.

The second category is regular patterns. Regular patterns have less variation in the numbers of points located in disjoint subsets of $S$ than CSR, and there are fewer small inter-point distances than in a CSR pattern. An example of a regular process is a hard core process, where all the points are at least distance $r$ apart from each other.

The third category is clustered patterns. These have more variation in the numbers of points located in disjoint subsets of $S$ than CSR, and there are more points with small inter-point distances than in a CSR pattern. An example of a clustered pattern is a pattern generated by first generating parent points via a Poisson process and then generating the actual points around these parent points in some local neighbourhoods, e.g. by limiting the distance from a child point to its parent to be less than or equal to some upper limit $r$.
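To make the two non-CSR categories concrete, the sketch below generates one regular and one clustered pattern. Both constructions are assumptions chosen for simplicity: the hard core pattern uses simple sequential inhibition (one of several possible hard core constructions), and the clustered pattern places a fixed number of children uniformly in a disc around each given parent, whereas the text generates the parents from a Poisson process.

```python
import math
import random


def hard_core_pattern(n, min_dist, width, height, rng, max_tries=10000):
    """Sequential inhibition: keep a uniform proposal only if it is at least
    `min_dist` away from every point accepted so far."""
    points = []
    tries = 0
    while len(points) < n and tries < max_tries:
        tries += 1
        p = (rng.uniform(0.0, width), rng.uniform(0.0, height))
        if all(math.dist(p, q) >= min_dist for q in points):
            points.append(p)
    return points


def cluster_pattern(parents, n_children, radius, rng):
    """Place `n_children` points uniformly in the disc of `radius` around each parent."""
    points = []
    for px, py in parents:
        for _ in range(n_children):
            rho = radius * math.sqrt(rng.random())  # sqrt gives uniform density over the disc
            phi = rng.uniform(0.0, 2.0 * math.pi)
            points.append((px + rho * math.cos(phi), py + rho * math.sin(phi)))
    return points


rng = random.Random(3)
regular = hard_core_pattern(50, 1.0, 20.0, 20.0, rng)
parents = [(rng.uniform(0.0, 20.0), rng.uniform(0.0, 20.0)) for _ in range(5)]
clustered = cluster_pattern(parents, 10, 1.5, rng)
```

Children of parents near the window boundary may fall outside the window; in practice they would be discarded or the parents would be simulated in an enlarged window.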

Point patterns can exhibit different spatial characteristics at different scales. For example, tree centre points will always be some minimum distance apart, determined by stem size, but the trees could still be growing in groups. The point pattern formed by the tree centre points would hence exhibit regularity at short distances and clustering at longer distances. Due to this scale-dependence, point patterns are often characterized by functional summary statistics, that is, statistics that are functions of distance. Two commonly used statistics are (Ripley's) K-function and (Besag's) L-function.

The K-function compares the local point density in the neighbourhood of a typical point of the process to the global point density:

$$K(r) = \frac{E[N(B(o,r) \setminus \{o\})]}{\lambda}, \quad r \geq 0, \tag{4.2}$$

where

$$B(x,r) = \{y \in \mathbb{R}^2 : \|x - y\| \leq r\} \tag{4.3}$$

is the $x$-centred disc of radius $r$. Here it is assumed that the typical point of the point process is located at the origin $o$, which is possible due to stationarity. For the Poisson process

$$K(r) = \pi r^2, \tag{4.4}$$

because from the definition of the Poisson process it follows that

$$E[N(B(o,r) \setminus \{o\})] = \lambda \pi r^2. \tag{4.5}$$

Clustered processes have more points aggregated near a typical point than the Poisson process, which leads to larger local point densities than $\lambda$ and

$$K(r) > \pi r^2. \tag{4.6}$$

On the other hand, regular processes have more isolated points than the Poisson process and

$$K(r) < \pi r^2. \tag{4.7}$$

The K-function can be estimated from an observed point pattern by calculating the mean number of points in the neighbourhoods of all the points, divided by an estimate of $\lambda$. However, the limited size of the observation window $W$ can bias the estimated function, because the full neighbourhoods of points near the window boundary are not observed. Hence, edge-effect corrections are necessary. The size of $W$ also affects the maximum distance $r$ at which the K-function should be estimated; for example, in a circular window $W$ the maximum $r$ commonly used is half of the radius of the window. Estimators of the K-function with several edge-effect corrections and sensible defaults of maximum distances can be found in the package spatstat [38], implemented in R [39].
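The estimation recipe above (mean neighbour count divided by an intensity estimate) can be sketched as follows. This is not the spatstat implementation. As a stand-in for a proper edge-effect correction, the sketch optionally measures distances on a torus (periodic wrapping of a rectangular window), which is an assumption that only makes sense for stationary patterns in rectangular windows.

```python
import math


def k_estimate(points, r, width, height, torus=True):
    """Estimate K(r): mean number of other points within distance r of a point,
    divided by the intensity estimate n / |W|."""
    n = len(points)
    lam_hat = n / (width * height)
    count = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue  # the point itself is excluded, as in (4.2)
            dx, dy = abs(xi - xj), abs(yi - yj)
            if torus:
                # Periodic wrapping: a simple substitute for an edge correction.
                dx, dy = min(dx, width - dx), min(dy, height - dy)
            if math.hypot(dx, dy) <= r:
                count += 1
    return (count / n) / lam_hat


# A perfectly regular unit lattice in a 10 x 10 window (intensity 1).
lattice = [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]
k_short = k_estimate(lattice, 0.9, 10.0, 10.0)  # no neighbours closer than 1
k_unit = k_estimate(lattice, 1.0, 10.0, 10.0)   # four neighbours at distance 1
```

For the lattice, the estimate at r = 0.9 is zero, well below the CSR value of pi * 0.81, which is the signature of regularity in (4.7). With `torus=False` the estimate is biased downwards near the window boundary, which is the edge effect the text warns about.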

The L-function is defined via the K-function as

$$L(r) = \sqrt{\frac{K(r)}{\pi}}, \quad r \geq 0. \tag{4.8}$$

Hence, in the Poisson process case $L(r) = r$, and for clustered processes $L(r) > r$ and for regular processes $L(r) < r$. This normalization makes visual inspection of an estimated function with respect to a theoretical function easier. The L-function is estimated through the estimator of the K-function. Examples of point patterns and their K-functions and L-functions are presented in Figure 4.1.

A measure of deviation from CSR can be formulated via the L-function. First, the estimate $\hat{L}(r)$ is calculated for distances $r$ up to some maximum distance $R$. Then, the distance at which the absolute difference between the estimate and the theoretical function is the largest is located:

$$r^* = \arg\max_{r \in [0,R]} |r - \hat{L}(r)|. \tag{4.9}$$

After this, the signed difference at this distance is used as the deviation measure:

$$r^* - \hat{L}(r^*). \tag{4.10}$$

This measure is close to zero if the pattern exhibits CSR. Positive values indicate stronger signs of regularity than clustering, and negative values indicate stronger signs of clustering than regularity. This measure was used in III to measure deviations of point patterns (and of collections of patterns, via means of estimated L-functions) from CSR.
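The deviation measure can be sketched in a few lines once estimated K-function values are available on a grid of distances. The discrete grid is an assumption (the definition takes the maximum over the whole interval from 0 to R), and the K-functions below are synthetic stand-ins rather than estimates from data.

```python
import math


def l_from_k(k_value):
    """L-function value from a K-function value: L = sqrt(K / pi)."""
    return math.sqrt(k_value / math.pi)


def csr_deviation(r_values, l_hat_values):
    """Return (r_star, r_star - L_hat(r_star)), where r_star maximises
    |r - L_hat(r)| over the supplied grid of distances."""
    r_star, l_star = max(zip(r_values, l_hat_values),
                         key=lambda pair: abs(pair[0] - pair[1]))
    return r_star, r_star - l_star


r_grid = [0.1 * i for i in range(1, 51)]

# Synthetic K-functions: exactly CSR, twice the CSR value, half the CSR value.
dev_csr = csr_deviation(r_grid, [l_from_k(math.pi * r ** 2) for r in r_grid])
dev_clustered = csr_deviation(r_grid, [l_from_k(2.0 * math.pi * r ** 2) for r in r_grid])
dev_regular = csr_deviation(r_grid, [l_from_k(0.5 * math.pi * r ** 2) for r in r_grid])
```

The signs behave as described in the text: the clustered stand-in gives a negative value and the regular stand-in a positive one, while the CSR case is zero up to rounding.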

It should be noted that there exist nonfunctional summary statistics for characterizing spatial point patterns, such as the Clark-Evans index [40]. This index is the ratio between the mean nearest-neighbour distance in the observed pattern and the expected nearest-neighbour distance of a Poisson process with the same intensity as the observed pattern. The Clark-Evans index was used in I and II to measure the deviation of patterns from CSR.
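A minimal sketch of the Clark-Evans index follows, using the fact that the expected nearest-neighbour distance of a planar Poisson process with intensity $\lambda$ is $1 / (2\sqrt{\lambda})$. The sketch applies no edge correction and estimates the intensity as the number of points divided by the window area; both are simplifying assumptions.

```python
import math


def clark_evans(points, area):
    """Ratio of the observed mean nearest-neighbour distance to its expected
    value under a Poisson process of the same intensity (no edge correction)."""
    n = len(points)
    nearest = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)]
    observed = sum(nearest) / n
    expected = 1.0 / (2.0 * math.sqrt(n / area))
    return observed / expected


# Unit lattice in a 10 x 10 window: every nearest-neighbour distance equals 1.
lattice = [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]
ce_lattice = clark_evans(lattice, 100.0)
```

Values near 1 are consistent with CSR, values above 1 indicate regularity (the lattice gives 2), and values below 1 indicate clustering.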

Figure 4.1: Examples of point patterns exhibiting three different characteristics: clustering, complete spatial randomness and regularity, and the K- and L-functions calculated from the point patterns. The number of points and the size of the window are the same in all cases. [Panel columns: Cluster, CSR, Regular. Panel rows: point pattern, K function, L function. Legend: theoretical, estimated.]

4.1.2 Marked point processes

An important extension of point processes are marked point processes with realisations $\{(x_i, m_i)\}_{i=1}^N$. Here, the $x_i$ are locations, just like in point processes, but now there is an additional mark $m_i$, possibly generated by a random variable, connected to each of the point locations. Marked point processes also have concepts of stationarity and isotropy, the distributional invariance covering in this case also the marks. The mark can be any attribute of interest, for example diameter, height, volume, or biomass of a tree. There can be several different marks connected to one point; for example, a point could be marked simultaneously with all of the aforementioned examples of marks. The mark can also be a geometrical object: a commonly considered geometrical mark in the literature is the disc $B(x,r)$. Marked point processes with geometrical marks are commonly called germ-grain models in the literature. Germ-grain models are usually used as models of random sets by taking a union over the geometrical marks. For example, if the $x_i$ are the locations of points as above and every point has as a mark a disc with radius $r_i$, then

$$\bigcup_{i=1}^{N} B(x_i, r_i) \tag{4.11}$$

is the random set generated by that process. One of the most studied germ-grain models is the Boolean model, where the underlying point process is the Poisson process and the marks are independent, identically distributed random compact sets independent of the Poisson process, for example discs with random radii. The Boolean model has been used e.g. to model the spatial pattern of heather [41]; several other applications have been listed in [12].
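As an illustration of a germ-grain model, the sketch below simulates a Boolean model with discs of a fixed radius and estimates the fraction of the window covered by the union (4.11). The fixed radius, the rectangular window, the enlargement of the simulation window by one radius (so that discs centred just outside the window are not missed) and the grid-based area estimate are all assumptions made for this example. For this fixed-radius case the expected covered fraction is known to be $1 - \exp(-\lambda \pi r^2)$, which gives a value to compare against.

```python
import math
import random


def poisson_count(mean, rng):
    """Poisson(mean) draw: number of unit-rate exponential arrivals before time `mean`."""
    n = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        n += 1
        t += rng.expovariate(1.0)
    return n


def simulate_boolean_discs(lam, radius, width, height, rng):
    """Poisson germs in the window enlarged by `radius`, each carrying a disc grain."""
    n = poisson_count(lam * (width + 2 * radius) * (height + 2 * radius), rng)
    return [(rng.uniform(-radius, width + radius),
             rng.uniform(-radius, height + radius),
             radius) for _ in range(n)]


def covered_fraction(discs, width, height, grid=50):
    """Fraction of grid-cell centres in the window that lie in the union of discs."""
    hits = 0
    for i in range(grid):
        for j in range(grid):
            x = (i + 0.5) * width / grid
            y = (j + 0.5) * height / grid
            if any((x - cx) ** 2 + (y - cy) ** 2 <= r * r for cx, cy, r in discs):
                hits += 1
    return hits / grid ** 2


rng = random.Random(7)
fractions = [covered_fraction(simulate_boolean_discs(0.05, 2.0, 30.0, 30.0, rng),
                              30.0, 30.0, grid=30)
             for _ in range(20)]
mean_fraction = sum(fractions) / len(fractions)
theoretical = 1.0 - math.exp(-0.05 * math.pi * 2.0 ** 2)
```

Individual realisations scatter around the theoretical value, which is why the comparison above averages over several simulated patterns.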

Germ-grain models can be characterized by several statistics emerging from their connection to sets. In the planar case, standard measures of sets are for example the surface area covered by the set, the boundary length of the set, and the number of connected components of the set. The germ-grain model produces a random set; hence, these measures are also random. Due to this, the expected values of the set measures are a natural way to characterize the germ-grain models. However, very often in practice only one realisation of the model is observed in an observation window $W \subset S$. In some special cases, such as the Boolean model, formulas connecting expected values of the set measures to model parameters are known. Generally, however, these connections are not known. The two aforementioned points can make the fitting of germ-grain models a challenging task. In this thesis the fitting of germ-grain models is not of interest. The germ-grain models are simply a tool upon which the presented estimators are built. Nevertheless, it is good to keep in mind that deriving information relating to an underlying process from a single realisation has its difficulties. This relates to the concept of ergodicity: if a process is ergodic, a single, large enough sample can be analysed to get statistically meaningful results. A sufficient condition for ergodicity is mixing, which in a sense means that distant parts of a (marked) point process are independent.

In document Horvitz-Thompson-like estimators based on stochastic geometry for forest remote sensing (pages 23-29)