Optical satellite imagery based canopy cover estimation in tropical dryland forest of Senegal

Academic year: 2022

Faculty of Science and Forestry

OPTICAL SATELLITE IMAGERY BASED CANOPY COVER ESTIMATION IN TROPICAL DRYLAND FOREST OF SENEGAL

Valeriya Serbina

Supervisors: Dr. Timo Tokola (School of Forest Sciences, UEF)

Jarno Hämäläinen (Head of Unit, REDD+ and Sustainable Forestry, Oy Arbonaut Ltd.)

MASTER’S THESIS CROSS-BORDER UNIVERSITY

JOENSUU 2018

Serbina, Valeriya 2018. Satellite imagery based canopy cover estimation in tropical dryland forest of Senegal. University of Eastern Finland, Faculty of Science and Forestry, School of Forest Sciences, Master’s thesis in forest science specialization in Cross-Border University.

European Forestry, 43 p.

UNIVERSITY OF EASTERN FINLAND, Faculty of Science and Forestry, Joensuu School of Forest Sciences

Student, Valeriya Serbina: Optical satellite imagery based canopy cover estimation in tropical dryland forest of Senegal

Master’s Thesis, 43 p., 3 appendixes (3 p.)

Supervisors of the Master’s Thesis: Dr. Timo Tokola and Jarno Hämäläinen November 2018

Abstract

The thesis focuses on finding suitable methods for measuring canopy cover in the Niokolo-Koba National Park, an area in Senegal that is part of a REDD+ piloting project. The methods used in the study were the Sparse Bayesian and the Zero-and-One Inflated Beta (ZOINBR) regression models. A set of 10 vegetation indexes was used as test indicators to identify the most efficient method of forest cover measurement using remote sensing techniques. Visual assessment was necessary for producing ground data for modelling. The statistical analysis showed that remote sensing methods using satellite imagery from different dates during the leaf-on season are applicable in tropical arid forest, and that several images are required to produce reliable output. However, the most reliable vegetation indexes differ for each individual case and season. To conclude, both methods demonstrated reliable results when indexes were derived from two images. Confidence intervals for the different forest classes were also derived during the statistical analysis.

Keywords: Remote sensing, canopy cover estimation, forest degradation, REDD+, Senegal forestry, vegetation indexes.

Foreword

This thesis was done at the School of Forest Sciences, University of Eastern Finland during 2017 – 2018.

I want to extend my gratitude to my supervisors Dr. Timo Tokola and Jarno Hämäläinen as well as to Alain Minguet, Katja Gunia, and Petri Latva-Käyrä for their help, and my proofreader Miles Louis Drury.

Table of Contents

1. Introduction

2. Materials and methods

2.1. Study area

2.2. Reference data

2.3. Modelling data

2.4 Statistical approaches and methods

3. Results

3.1 Zero-and-One Inflated Beta Regression (ZOINBR) models

3.2 Sparse Bayesian method

3.3 Classification accuracy

4. Discussion

6. References

Appendices

Appendix 1

Appendix 2

Appendix 3

1. Introduction

With the increase of technological capacities, many forest inventory techniques are supported by remote sensing data. Despite the variety of methods used for land use land cover classification, the lack of specific methods for developing countries is still a concern. Canopy cover mapping methods are an important element in the monitoring activities of the Reducing Emissions from Deforestation and forest Degradation (REDD+) programmes of the United Nations. As a worldwide initiative, REDD+ focuses on environmental issues while also considering social factors; hence, by its nature, many REDD+ programmes focus on developing countries. The enhancement of a country’s existing national forest monitoring systems (NFMS) to obtain more accurate information on its forest resources is a common theme amongst REDD+ supported projects (GOFC-GOLD 2016).

Because it works in the context of sustainable development, the programme has wide political support within such developing countries, and it involves work with local communities to aid public understanding of individual actions in the context of a whole community.

These systems have two functions: monitoring and MRV (monitoring, reporting and verification) (GOFC-GOLD 2016). According to the Global Forest Observation Initiative’s 2016 report, the monitoring function recognises that many domestic instruments for forest resource assessment already exist and can provide the information necessary for REDD+, although specific improvements are often necessary, as are adjustments to the aims of the programme and to country-specific features. The other function, MRV, is related to the implementation of monitoring tools and the data acquisition for reporting (GOFC-GOLD 2016).

Remote sensing data for forest inventory allow the spatial coverage of large-area biomass estimates to be expanded. At the same time, remotely sensed data are necessary to fill spatial gaps when an inventory project is conducted. This type of hybrid approach is particularly required for natural forests where basic inventory data for biomass estimation are lacking. Minimum mapping units depend on the data quality and resolution. Nowadays, remotely sensed data have become a key data source for biomass estimation. Generally, two ways to estimate biomass exist: through a direct relationship between spectral response (or backscatter for SAR) and biomass using statistical methods, or via indirect relationships, in which attributes are derived from leaf area index (LAI), structure (crown closure and height) or shadow fraction (GOFC-GOLD 2016).

A variety of remotely sensed data sources of coarse spatial resolution, such as SPOT VEGETATION, AVHRR, and MODIS, are important and are utilised for biomass mapping.

In order to establish a good linkage between field measurements and coarse resolution remote sensing data (e.g., MODIS, AVHRR, IRS-WiFS), several studies have introduced multiscale imagery with moderate spatial resolution (e.g., Landsat, ASTER) in their methods. The most frequently utilised imagery in these studies is Landsat TM and ETM+ data. Deriving stand attributes from LIDAR data and inserting them into allometric biomass equations has become an additional approach in many studies. Other studies have used multispectral, LIDAR and RADAR data, integrating spectral response, image texture and backscatter as supplementary variables in multivariate regression models (GOFC-GOLD 2016).

Geographical information system (GIS) based methods using ancillary data exclusively (i.e. climate normals, precipitation, topography, and vegetation zones) are applicable in biomass estimation (GOFC-GOLD 2016). Geostatistical approaches, for instance kriging, are utilised in different research works (Simard et al., 1992). Frequently, GIS models are used to combine multiple data sources for biomass estimation (e.g., forest inventory and remotely sensed data). For instance, mapping of Amazonian forest AGB was conducted using MODIS, JERS-1 SAR, QuikSCAT, SRTM, and climate and vegetation data (GOFC-GOLD 2016).

The major question in the use of remotely sensed data for forest biomass mapping is the consistency of results from different sources and methods with respect to both time and space. Additional work is needed to reduce uncertainty in biomass estimation for ground-based methodologies: it is important to learn about sources of uncertainty and to introduce remote sensing methods that are reliable and equally suitable across time and space (GOFC-GOLD 2016). Table 1 summarises the availability of data that can be used for biomass estimation.

Table 1. Current availability of fine scale satellite data sources (GOFC-GOLD 2016)

Columns: Satellite observation system or program | Technical observation challenges solved | Access to quality information worldwide | Continuous observation program for global coverage | Pre-processed global image database generated and accessible | Imagery data available in agencies for land change analysis | Capability to sustainably use map products for developing countries

Optical data
Landsat TM / ETM
ASTER: on demand
SPOT HRV (1 – 5): commercially
CBERS 1 – 3: regionally
IRS (Indian program): regionally
DMS program: probably; commercially

SAR
ALOS / PALSAR + JERS: regionally
ENVISAT ASAR / ERS 1+2: regionally
TerraSAR-X: commercially

IKONOS, GeoEye: probably; commercially
ICESAT / GLAS (LiDAR)

Note: dark blue means it is common or fully existing, light blue means it is partially existing and several examples were found, white – rare or no applications or examples.

The implementation and development of different earth observation techniques is highly important for climate change mitigation within the REDD+ programme. Earth observation from space started in 1972 (Jones and Vaughan, 2011), when the first data about our planet were acquired. The same authors also outline the importance of the development of aerial colour and colour infrared photography in the 1950s, as it provided the first opportunities to assess vegetation cover. In the same decade, Robert Colwell first referenced the introduction of photogrammetry and photointerpretation (Colwell, 1956). In 1972 NASA introduced a technology that was new for the period (later renamed Landsat), which can be seen as the starting point for modern land use land cover classification (Murayama et al., 2015). Some other early publications appeared in the 1970s (for instance, Anderson et al., 1976). Tokola et al. (1999) provide an example of how to monitor deforestation and forest degradation over time using old Landsat data. Some historical events had a big impact on methodological development. The year 2005 is known for the first launch of Google Earth, and in 2008 NASA announced that their data would become freely available (Jones and Vaughan 2011), opening previously unavailable possibilities for scientists across the world.

During recent years the Food and Agriculture Organization (FAO) of the United Nations has been active in developing operational MRV tools. The Open Foris approach for environmental monitoring, based on canopy cover recording and classification using Google Earth, was introduced in 2009 (openforis.org).

Vegetation indexes are a tool to map land use classes and assess vegetation cover. One of the official sources for land use land cover classification is the guidelines from the Intergovernmental Panel on Climate Change (IPCC) (IPCC Guidelines for National Greenhouse Gas Inventories, 2006). These include the following classes for greenhouse gas inventories: (i) Forest land; (ii) Cropland; (iii) Grassland; (iv) Wetlands; (v) Settlements; (vi) Other land (Penman et al., 2012). Having unique ecosystems, countries need local approaches to assess their forest resources. Moreover, forest density differs from area to area, and this has to be taken into account when selecting a method. Because forest degradation and deforestation have a significant effect, countries need accurate information about their forests.

The main objective of this master’s thesis is to study different ways to measure forest degradation using optical satellite imagery in tropical arid forests of Western Africa. The target area is located in Senegal and partly includes the Niokolo-Koba National Park; it is part of the REDD+ piloting project “Monitoring and Non-Wood Forest Product Value Chains to Mitigate Green House Gas Emissions in the Rural Communities of Bandafassi, Senegal”. The project was co-funded by the Nordic Climate Facility (NCF); the facility was financed by the Nordic Development Fund (NDF) and NCF.

The research question is as follows:

What are the methods of canopy cover estimation using optical satellite imagery and how accurate are they?

Which can be split into two specific questions:

1. What are suitable vegetation indexes for canopy cover estimation?

2. How well can forest classes be predicted using modern remote sensing methods of canopy cover estimation and what are applicable thresholds?

The target area is part of a project within the REDD+ programme of the United Nations.

Measuring forest cover, and further improving and adjusting methods to the specific conditions of African countries, is highly important within climate change actions. This gives a strong motivation to focus on the topic of land use land cover classification for the case of Senegal. The country has not only unique natural characteristics and rare inhabitants, but also cultural features linked to the people who live on this land and use this forest.

2. Materials and methods

2.1. Study area

The Republic of Senegal is located in Western Africa, bordered by Mauritania in the north and north-east, by Mali in the east, and by Guinea and Guinea-Bissau in the south. Gambia is located on the southern side of Senegal along the Gambia River. The Atlantic Ocean is on its west side. Senegal is one of the Sahel countries, six countries located in the arid and semi-arid climatic zones. These territories are considered to be among the most fragile as they have a dry climate and irregular precipitation patterns (the humid season can last longer or shorter than average) (Heinrigs and Perret, 2006).

There is a long dry season (from November until May), and the humid season lasts from June to October. Precipitation is highest in the south of the country, where it averages 1400 mm per year; the northern part has an annual average of 381 mm. The average temperature is 23 °C (winter) to 28.3 °C (summer), though temperatures of 50 °C can be reached. The coastline is about 700 kilometres long (OSS 2015).

The Niokolo-Koba National Park is a World Heritage Site located in Senegal and, having been designated in 1954, is one of the oldest parks in Africa (OSS 2015). The study area includes the Bandafassi village and partly covers the Niokolo-Koba National Park; it spreads across 3911 km² (see Image 2.1.1).

Image 2.1.1 Study area boundary South-Eastern Senegal

The closest city to the park is Kédougou. In the Park, the large rivers Gambia, Sereko, Niokolo and Koulountou are important for vegetation diversity (vegetation ranges from dry forests and savannah to dense forests). According to the Atlas of land use produced within the REPSAHEL project in 2015 (OSS 2015), the following vegetation types and forest classes can be found in the study area, with sparse forest the most frequent: dense forest, gallery forest, sparse forest, mangrove, plantation forest, wooded savannah, shrubby savannah, shrubland, and shrub and wooded steppe.

Most of the species found in Senegal inhabit the country’s National Parks (4330 biotope species in total). Mammals are represented by 192 species such as elephants, lions, hippopotamuses, giant eland, and African buffalo. Among Senegalese flora, mostly herbaceous species are found (around 50% of the territory). A reduction of forest cover is defined as one of the main ecological problems of Senegal, in addition to issues such as lack of water resources. Tree species such as baobab (Adansonia digitata), néré (Parkia biglobosa), Senegal mahogany (Khaya senegalensis) and Palmyra palm (Borassus) are typical canopy constituents of forest ecosystems (OSS 2015).

Arid zones cover 18.8% of the area (compared to an African average of 46.1%) and show a high variety of soils, land types, flora and fauna. However, the forest is mostly sparse and tree species richness is relatively low. Arid forests are considered to be very fragile ecosystems, highly dependent on precipitation and with less fertile soil than other areas (Heinrigs and Perret, 2006). Due to this scarcity of water, tree species are adapted to the seasonal rainfall and most lose their leaves during the dry season. Thus, a short humid season can negatively affect ecosystems through desertification and land degradation. The link between soil degradation and tree species richness is complex, but assessing vegetation cover helps to monitor soil degradation and to detect the vulnerability of an area to degradation in arid zones (Murayama et al., 2015).

2.2. Reference data

Since Google Earth contains a large dataset of satellite imagery, it is a useful tool from which to derive the information necessary for modelling while working remotely from the target area. The ArcMap environment allows working with *.kmz files and converting them to shapefiles or .csv documents. Imagery for the study area is provided to Google Earth by the French Space Agency, the European Commission, European Space Agency's Copernicus Earth, and DigitalGlobe, and has a spatial resolution of 50 m. Google Earth has imagery from different time periods, from which canopy cover changes can be observed and estimated. At the first stage, the study area was delineated on Google Earth to see how large an area is covered by satellite images. The date of each image was added to the shapefile attribute table.

The study area with the boundaries of available imagery was prepared for applying a sampling technique. At the end of the preparatory steps on Google Earth, a small gap was found for which no satellite imagery was available.

Image 2.2.1 Imagery cover polygons on Google Earth Pro

The next preparatory phase was to create a network of sample plots within the forest area (a forest mask had been prepared earlier in the project) using random sampling (Marshall, 1996). ArcMap was used to make a grid for the imagery cover area with grid labels spaced at 2500 metres; information from the imagery cover polygons was then added to the grid labels.
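The grid construction can be sketched in a few lines. The thesis performed this step in ArcMap; the Python below is only an illustration, and the function name `make_grid_labels` and the bounding-box coordinates are hypothetical, assuming a projected coordinate system measured in metres.

```python
def make_grid_labels(xmin, ymin, xmax, ymax, spacing=2500.0):
    """Return (x, y) label coordinates on a regular grid, `spacing` metres apart."""
    n_cols = int((xmax - xmin) // spacing) + 1
    n_rows = int((ymax - ymin) // spacing) + 1
    return [
        (xmin + col * spacing, ymin + row * spacing)
        for row in range(n_rows)
        for col in range(n_cols)
    ]


# Hypothetical 10 km x 5 km extent in a projected (metre-based) coordinate system.
labels = make_grid_labels(0.0, 0.0, 10000.0, 5000.0)
print(len(labels), labels[0], labels[-1])
```

Each label would then receive the attributes (year, month, date) of the imagery cover polygon it falls in, which is the join described above.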

The selection was aimed at choosing points with the greatest number of images. A selection was applied to the “Year” column in order to select labels with only 2014 – 2015 imagery available. According to Google Earth data, the images taken in this period mostly cover the Niokolo-Koba National Park.

The prepared table contains the year and month when the image was taken, and the field “Name” has an exact date for each image. This date is important to ascertain whether the image was taken during the dry or the humid season. During the next step, sample plots were selected based on the created imagery cover polygon. The polygon covered almost all of the area except for a small part. At the beginning there were 300 sample plots, of which 250 were selected for visual assessment. The OpenForis (http://openforis.org/) approach was used to collect visual interpretation attributes (Image 2.2.2). Sample plots were selected and canopy cover was recorded in a table for each plot.

Image 2.2.2 Example of sample plot N182 for visual assessment using the Open Foris approach

After all plots were examined, several were excluded due to insufficient imagery quality or the presence of clouds. At the end of the first preparatory phase, 234 sample plots remained for further assessment. The dataset was divided into two halves in order to have training data and reference data.
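The division into two halves can be illustrated as follows. The thesis does not state how plots were assigned to each half, so the seeded random split below is an assumption, and the function name `split_plots` and the plot identifiers 1 to 234 are hypothetical.

```python
import random


def split_plots(plot_ids, seed=42):
    """Randomly split plot IDs into two halves: (training, reference)."""
    ids = list(plot_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return sorted(ids[:half]), sorted(ids[half:])


# 234 plots remained after quality screening; IDs here are placeholders.
training, reference = split_plots(range(1, 235))
print(len(training), len(reference))
```

With 234 plots this yields 117 plots in each half, matching the counts reported in the statistics tables.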

Table 2.2.1 Statistics of reference visually assessed data

Count 117
Minimum 0
Maximum 96
Mean 30.769231
Standard Deviation 28.068105

2.3. Modelling data

Vegetation indexes were used to predict forest cover. Vegetation indexes allow the observation of vegetation and its changes over a period of time (Huete et al., 2002). They are widely utilised to obtain biophysical parameters with which land use change and seasonal vegetation cover modification can be monitored. Each index is a mathematical formula combining satellite bands. Many indexes can be found in different sources; 10 were selected for this exercise and are listed in the following table:

Table 2.3.1 Vegetation indexes and their formulas

Index | Formula | Reference

NDVI (Normalized Difference Vegetation Index) | $\frac{NIR - RED}{NIR + RED}$ | Rouse et al., 1974

SAVI (Soil-Adjusted Vegetation Index) | $\frac{NIR}{NIR + RED + L}(1 + L)$ | Gilabert et al., 2002

TNDVI (Transformed NDVI) | $\sqrt{\frac{NIR - RED}{NIR + RED} + 0.5}$ | Ahmad 2012

ARVI (Atmospherically Resistant Vegetation Index) | $\frac{NIR - RB}{NIR + RB}$ | Wulder and Franklin 2003

MSAVI (Modified SAVI) | $\frac{(NIR - RED)(1 + L)}{NIR + RED + L}$ | Clerici et al., 2017

NBR (Normalized Burn Ratio) | $\frac{NIR - SWIR}{NIR + SWIR}$ | Key and Benson, 2006

NR (Normalized Red) | $\frac{RED}{NIR + RED + GREEN}$ | L. Korhonen et al 2017

RAVI (Rain Adjusted Vegetation Index) | $\frac{GREEN}{RED}$ | Tucker 1979

SR (Simple Ratio) | $\frac{NIR}{RED}$ | Wulder and Franklin 2003

IPVI (Infrared Percentage Vegetation Index) | $\frac{NIR}{NIR + RED}$ | Crippen 1990

NDVI, defined by Rouse et al. (1974), is one of the first vegetation indexes. The Normalised Difference Vegetation Index (NDVI) is applicable in various research cases, such as LAI (leaf area index), biomass from vegetation, and land use land cover classification. The Soil-Adjusted Vegetation Index (SAVI) is calculated using the near infrared and red bands; L is a soil-adjustment parameter which has to be empirically measured but can be set to 0.5 or 1 depending on vegetation cover (Gilabert et al., 2002). In the case of Senegal, L was set to 0.5 because vegetation density in the Niokolo-Koba National Park was considered intermediate. Ahmad (2011) mentioned that TNDVI (Transformed NDVI) was introduced in 1979 by Tucker, and its formula is the square root of NDVI plus 0.5. Due to this change, TNDVI has only positive values.

The Atmospherically Resistant Vegetation Index (ARVI) is an index with a range from -1 to 1. It is used, for instance, alongside NDVI because of its ability to reduce atmospheric effects. RB in the formula equals RED – γ (BLUE – RED); γ is a parameter that depends on aerosol type. Kaufman and Tanre (1992) found that if this parameter is set to 1, the index significantly reduces its response to aerosols. According to Key and Benson (2006), the Normalised Burn Ratio (NBR) index is mostly used for forest fire monitoring. It helps to define burned areas and separate them from unburned landscapes, as well as to detect changes caused by fire. The NR (Normalised Red) vegetation index includes two visible bands and the near infrared band (Korhonen, 2017). RAVI (Rain Adjusted Vegetation Index) is an index based on NDVI but taking rainfall into account. The underlying experiment aimed to observe the change of NDVI and to develop a coefficient (Wessollek et al., 2015). Wulder and Franklin (2003) mentioned in their book that the Simple Ratio is a common index. It is used to measure the amount of vegetation and has a good correlation with leaf area index (Stenberg et al., 2004). IPVI (Infrared Percentage Vegetation Index) is considered to be linearly related to NDVI according to Payero et al. (2004); the authors assumed IPVI to be suitable for measuring grass but inefficient for assessing vegetation height.

To calculate vegetation indexes, the following software tools were utilised:

1. ArcGIS 10.3.1, R 3.4.0;

2. ArboLiDAR tools 3.6.1, a toolbox developed for the ArcGIS environment; and

3. QGIS 2.18.

Raster data are necessary to calculate vegetation indexes. Two images with a resolution of 30 metres (one dated 11 May 2014 and the other 31 October 2015) were downloaded from EarthExplorer (https://earthexplorer.usgs.gov/). Pre-processing included an atmospheric correction performed using the Semi-Automatic Classification plugin in QGIS (Congedo, 2017). Several sample plots that were covered with clouds had to be excluded from this study. Landsat 8 OLI (Operational Land Imager) was developed by NASA and the US Geological Survey; it records visible, near infrared and short wave infrared bands together with a panchromatic band (the last is available at 15 metres resolution) (landsat.gsfc.nasa.gov).

Indexes were calculated using two software packages:

1. ArcGIS to derive values of satellite bands for each plot;

2. R to calculate the indexes using the formulas presented in Table 2.3.1.
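The index calculation for a single plot can be sketched as follows. The thesis carried out this step in R; Python is used here only for illustration. The sketch covers the indexes of Table 2.3.1 whose formulas depend on band values alone, the function name `vegetation_indexes` is hypothetical, the reflectance values are made up, and γ = 1 is assumed for the RB term of ARVI.

```python
import math


def vegetation_indexes(blue, green, red, nir, swir, gamma=1.0):
    """Compute a subset of the Table 2.3.1 indexes from band reflectances."""
    rb = red - gamma * (blue - red)  # RB term used by ARVI
    ndvi = (nir - red) / (nir + red)
    return {
        "NDVI": ndvi,
        # Square root of NDVI plus 0.5; only defined when NDVI >= -0.5.
        "TNDVI": math.sqrt(ndvi + 0.5),
        "ARVI": (nir - rb) / (nir + rb),
        "NBR": (nir - swir) / (nir + swir),
        "NR": red / (nir + red + green),
        "SR": nir / red,
        "IPVI": nir / (nir + red),
    }


# Hypothetical surface reflectance values for one sample plot.
plot = vegetation_indexes(blue=0.04, green=0.06, red=0.05, nir=0.30, swir=0.15)
for name, value in plot.items():
    print(f"{name}: {value:.4f}")
```

For these inputs IPVI equals (NDVI + 1) / 2, which is the linear relation between the two indexes.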

Canopy cover values were utilised to train the model together with vegetation indexes.

Table 2.3.2 Modelling visually assessed data statistics

Count 117
Minimum 0
Maximum 100
Mean 43.794872
Standard Deviation 30.101123

2.4 Statistical approaches and methods

Before focusing on practical aspects, it is important to outline the main terminology.

According to the REDD desk glossary, “crown cover is the percentage of the surface of an ecosystem that is under the tree canopy. Also referred to as ‘canopy cover’ or just ‘tree cover’” (theredddesk.org). It is a highly important forest indicator when remote sensing data are the only source of information. FAO defines forest as an area with crown cover from 10% to 100%, and thresholds between forest and non-forest classes vary from country to country (GOFC-GOLD, 2014). Land use classes can change from forest to non-forest and the other way around: the first change is called deforestation, the reverse process is afforestation (GOFC-GOLD, 2014).

Forest degradation is “a direct, human-induced, long-term loss (persisting for X years or more) of at least Y% of forest carbon stocks [and forest values] since time T and not qualifying as deforestation” (IPCC report, 2003).
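These definitions can be expressed as a small classification rule. The sketch below assumes the FAO threshold of 10% crown cover; since thresholds vary from country to country, it is exposed as a parameter. The function names and class labels are hypothetical.

```python
def classify_cover(canopy_cover_pct, forest_threshold=10.0):
    """Label a plot as forest or non-forest from its canopy cover percentage."""
    if not 0.0 <= canopy_cover_pct <= 100.0:
        raise ValueError("canopy cover must be between 0 and 100 percent")
    return "forest" if canopy_cover_pct >= forest_threshold else "non-forest"


def land_use_change(before_pct, after_pct, forest_threshold=10.0):
    """Name the class change between two canopy cover observations."""
    before = classify_cover(before_pct, forest_threshold)
    after = classify_cover(after_pct, forest_threshold)
    if before == "forest" and after == "non-forest":
        return "deforestation"
    if before == "non-forest" and after == "forest":
        return "afforestation"
    return "no class change"


print(land_use_change(40.0, 5.0))
print(land_use_change(40.0, 30.0))
```

A drop from 40% to 30% stays within the forest class, so it is not deforestation under this rule; a loss of that kind is where the degradation definition becomes relevant.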

Canopy cover estimation plays an important role when any REDD+ programme is implemented. Furthermore, remote sensing techniques are especially relied upon when the assessment of inaccessible areas is required. Many different methodologies exist for measuring forest cover and forest attributes. Technologies can utilise two- and three-dimensional data. LiDAR (Light Detection and Ranging) is a scanner which produces three-dimensional data for remote sensing purposes. It is widely applied in Nordic countries and is suitable for conducting national forest inventories for REDD+ projects. LiDAR is able to give results as good as a field inventory (GOFC-GOLD 2014).

Visual interpretation also exists as a method to monitor canopy cover in countries where, for instance, LiDAR is not available; it has been useful for monitoring changes and does not require sophisticated image processing techniques (GOFC-GOLD 2014).

The source for land use monitoring is satellite imagery stored at the U.S. Geological Survey web portal: Landsat 8 OLI, Landsat 7 ETM+, Landsat 5 TM, Landsat 4 TM, and Landsat 1-5 imagery with 30 metres spatial resolution is downloadable free of charge (GOFC-GOLD 2014). Different applications of these data exist using open source software, such as QGIS (qgis.org). Satellite imagery has been a suitable source for observing, for example, the historical rate of deforestation or land use change within the REDD+ programme (GOFC-GOLD 2014).

Seasonality is the most important factor when using satellite imagery for tropical forests, as the rainy season is the leaf-on period and trees lose their leaves during the dry season (Tokola, 2015).

Canopy cover estimation becomes even more important within the climate change actions, for instance in biomass assessments. Satellite and radar data have recently become available, and many existing techniques require them for different purposes.

Amongst the different existing methodologies, Random Forest (Mascaro et al., 2014) and K-Nearest Neighbor (KNN) (Mehdawi and Baharin B, 2013) are applicable in tropical forests and have produced reliable results. The Sparse Bayesian method has shown results as good as those of KNN (Junttila et al., 2008). Zero-and-One Inflated Beta regression was utilised in Laos and presented reliable accuracy (Korhonen et al., 2015).

The availability of optical data going back many years, combined with different methodologies, allows change detection, which is important when forest degradation and deforestation are monitored. Since LiDAR is aimed at measuring the vertical structure of forest, using it together with satellite imagery already brings good results (Nilsson, 1997).

This experiment applied two methods for canopy cover estimation to the study area in Senegal:

1. Sparse Bayesian regression (Junttila et al., 2008);

2. Zero-and-One Inflated Beta (ZOINBR) regression (Korhonen et al., 2015).

Junttila et al. (2008) described a mathematical model to predict forest stand attributes for the Finnish forest inventory. Sparse Bayesian regression is a non-parametric method in which the set of suitable variables is selected. The selection is conducted using the given input data; it reduces the complexity of the model and prevents over-fitting. Moreover, the model replaces the cross-validation method for variable selection (Junttila et al., 2008).

The approach uses training data for teaching the model – the quality of prediction is directly dependent on training data. Usually, field campaign data is a good basis for the model. Different kinds of datasets are suitable to use within the model, but it has to be true information in order to be able to teach the model. After the model has selected suitable variables from the training data, prediction takes place for the rest of area. In the case of Finnish forestry, training data is usually field measurements and then the prediction is done for whole stands, where sample plots are located. Thus, information for the whole forest area is produced (Junttila et al., 2008).

The same scheme is applied to the Niokolo-Koba National Park case: the training data teach the model, and the model then produces canopy cover values for the rest of the area.

According to the Sparse Bayesian approach described by Junttila et al. (2008), which follows the mathematical description of Tipping (2001), the following equation describes the model:

$$t = \Phi w + \varepsilon \qquad (2.4.1)$$

In the formula, $t$ collects the values $t_{k,p}$, where $k$ is the forest parameter index (each parameter is treated independently and modelled with its own function) and $p = 1, \dots, P$ indexes the plots in the training data. $\Phi$ is the $P \times (M + 1)$ matrix $(1_{P,1}, \phi_1, \dots, \phi_M)$, where $P$ is the number of plots and $M$ the number of variable columns. $w = (w_{k,0}, \dots, w_{k,M})^T$ is the vector of model weights for forest parameter $k$, and $\varepsilon$ is the model error. The probability of a certain value in a certain plot depends on the squared length of $t - \Phi w$ (Junttila et al., 2008).

The following equation shows normally distributed likelihood:

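$$p(t \mid w, \sigma^2) = \prod_{p=1}^{P} N(\phi_p w, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{P/2}} \exp\left(-\frac{\lVert t - \Phi w \rVert^2}{2\sigma^2}\right) \qquad (2.4.2)$$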

where $\sigma^2$ is the unknown variance; the squared length of $t - \Phi w$ plays the key role, since the smaller it is, the higher the likelihood.
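The role of the residual length in equation (2.4.2) can be checked numerically. The sketch below evaluates the logarithm of the Gaussian likelihood for two candidate weight vectors; the function name `gaussian_log_likelihood`, the three-plot design matrix and the weights are hypothetical values chosen for illustration, not data from the thesis.

```python
import math


def gaussian_log_likelihood(t, phi, w, sigma2):
    """Log of (2.4.2): -P/2 * log(2*pi*sigma2) - ||t - phi w||^2 / (2*sigma2)."""
    residual_ss = 0.0
    for t_p, phi_p in zip(t, phi):
        prediction = sum(f * w_m for f, w_m in zip(phi_p, w))
        residual_ss += (t_p - prediction) ** 2
    n_plots = len(t)
    return (-0.5 * n_plots * math.log(2.0 * math.pi * sigma2)
            - residual_ss / (2.0 * sigma2))


# Each row of phi is (1, index value); t is canopy cover in percent.
phi = [[1.0, 0.2], [1.0, 0.5], [1.0, 0.8]]
t = [20.0, 50.0, 80.0]

ll_good = gaussian_log_likelihood(t, phi, w=[0.0, 100.0], sigma2=100.0)
ll_poor = gaussian_log_likelihood(t, phi, w=[50.0, 0.0], sigma2=100.0)
print(ll_good > ll_poor)
```

The first weight vector reproduces the observations almost exactly, so its residual is close to zero and its likelihood is higher than that of the constant prediction.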

$$p(w \mid \alpha) = \prod_{m=1}^{M} N(0, \alpha_m^{-1}) \qquad (2.4.3)$$

The equation above is utilised to avoid unnecessary variables by shrinking their weights towards zero. The variance of each weight is set using a special hyper-parameter $\alpha = (\alpha_1, \dots, \alpha_M)^T$ (Junttila et al., 2008).

Korhonen et al. (2015) described the second method, called Zero-and-One Inflated Beta regression (ZOINBR). It was used for measuring canopy cover with satellite imagery and airborne LiDAR data in tropical forests in the Savannakhet province of Laos. According to the article, Beta regression (Ferrari and Cribari-Neto, 2004) was chosen because linear regression has certain limits. The prediction interval is important, and it is restricted to 0 – 1. Furthermore, as LiDAR data were utilised in that study, a complication arose because Beta regression cannot handle LiDAR plots with exactly 0% or 100% canopy cover. Finally, the method called Zero-and-One Inflated Beta regression was created, which met all the requirements (Korhonen et al., 2015).

The model described by Korhonen et al. (2015) is based on two types of distribution, Beta and Bernoulli. This allows predictions over the whole interval, including exact 0 and 1 values.
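The combination of the two distributions can be illustrated with a mixture density. This is a generic sketch, not the parameterisation used by the GAMLSS package: it assumes point masses `p0` and `p1` at 0 and 1 and a Beta density with mean `mu` and precision `phi` in between. All names and numeric values are hypothetical.

```python
import math


def beta_pdf(y, mu, phi):
    """Beta density on (0, 1) in mean-precision form: a = mu*phi, b = (1-mu)*phi."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def zoib_density(y, p0, p1, mu, phi):
    """Zero-and-one inflated beta: point masses at 0 and 1, Beta density between."""
    if y == 0.0:
        return p0
    if y == 1.0:
        return p1
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie in the closed interval [0, 1]")
    return (1.0 - p0 - p1) * beta_pdf(y, mu, phi)


def zoib_mean(p0, p1, mu):
    """Expected canopy cover fraction under the mixture."""
    return p1 + (1.0 - p0 - p1) * mu


# Hypothetical parameters: 10% of plots fully open, 20% fully closed.
print(zoib_density(0.5, p0=0.1, p1=0.2, mu=0.5, phi=2.0))
print(zoib_mean(p0=0.1, p1=0.2, mu=0.5))
```

With `mu = 0.5` and `phi = 2` the Beta component is uniform, so the density at any interior point is simply the remaining probability mass, 0.7.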

The GAMLSS framework (generalized additive models for location, scale, and shape) extends modelling beyond what is achievable with linear regression (Rigby and Stasinopoulos, 2005). There is an existing package called GAMLSS, which was introduced by Rigby and Stasinopoulos in 2005. The model assumes independent observations yi (where i = 1, 2, …, n) with probability function f(yi|𝜃𝑖), where 𝜃𝑖 = (𝜃1𝑖, 𝜃2𝑖, 𝜃3𝑖, 𝜃4𝑖) = (𝜇𝑖, 𝜎𝑖, 𝜈𝑖, 𝜏𝑖) is a vector of four distribution parameters, each of which can be modelled as a function of explanatory variables. The parameters 𝜇𝑖 and 𝜎𝑖 are related to location and scale, although the model can be used for any population distribution. If yT = (y1, y2, …, yn) is the response vector of length n, then for k = 1, 2, 3, 4 the gk(.) are monotonic functions that connect the distribution parameters and the explanatory variables.

The following equation represents the model:

$$g_k(\theta_k) = \eta_k = X_k\beta_k + \sum_{j=1}^{J_k} Z_{jk}\gamma_{jk} \qquad (2.4.4)$$

where 𝜇, 𝜎, 𝜈, 𝜏 and $\eta_k$ are vectors of length n, and $\beta_k^T = (\beta_{1k}, \beta_{2k}, \ldots, \beta_{J_k k})$ is a parameter vector of length $J_k$.

$X_k$ is a matrix of order $n \times J_k$ and $Z_{jk}$ is a matrix of order $n \times q_{jk}$. The distribution of $\gamma_{jk}$ is $\gamma_{jk} \sim N_{q_{jk}}(0, G_{jk}^{-1})$, and the $\lambda_{jk}$ are hyper-parameters. The model contains sub-models, which are used depending on the purpose (Rigby and Stasinopoulos 2007). The model is run with the GAMLSS package in the R environment. The location and scale parameters and the probability density function of GAMLSS are included in the ZOINBR model.

Validation consisted of an accuracy assessment: a comparison of the visually estimated values with the values predicted by the different techniques.

The following inputs were utilised for the accuracy assessment:

ID – the plot identification number;
CC_visual – percentage of canopy cover visually estimated;
CC_two – percentage of canopy cover predicted using two images;
CC_3110 – percentage of canopy cover predicted using the 31.10.2015 image;
CC_1105 – percentage of canopy cover predicted using the 11.05.2014 image;
forestCC_visual – forest class (true – forest / false – non-forest) for canopy cover values visually estimated;
for_pred_two – forest class (true – forest / false – non-forest) for canopy cover values predicted using two images;
for_pred1105 – forest class (true – forest / false – non-forest) for canopy cover values predicted using the 11.05.2014 image;
for_pred3110 – forest class (true – forest / false – non-forest) for canopy cover values predicted using the 31.10.2015 image;
fourcl_CC_visual – forest class (1, 2, 3) for canopy cover values visually estimated;
fourclCC_pred_two – forest class (1, 2, 3) for canopy cover values predicted using two images;
fourclass1105 – forest class (1, 2, 3) for canopy cover values predicted using the 11.05.2014 image;
fourclass3110 – forest class (1, 2, 3) for canopy cover values predicted using the 31.10.2015 image.

Validation data were divided into two classes: forest and non-forest. A plot was assigned to the non-forest class if its canopy cover was less than 10%, following the FAO definition of forest:

“Land spanning more than 0.5 hectares with trees higher than 5 meters and a canopy cover of more than 10 percent, or trees able to reach these thresholds in situ. It does not include land that is predominantly under agricultural or urban land use.” (FRA 2015).

The further thresholds of 30%, 50% and 70% were selected to compare results. The main idea was to observe how the two approaches predict different forest classes and distinguish forest from non-forest. Values below 10% were set as non-forest in all three types of assessment.

Tables 3.3.2 – 3.3.4 show how the classes were divided.

Table 3.3.2 Thresholds of 30% and 10%

| Class | Canopy cover (CC) |
|---|---|
| 1 | 0–0,1 |
| 2 | 0,1–0,3 |
| 3 | 0,3–1 |

Table 3.3.3 Thresholds of 50% and 10%

| Class | Canopy cover (CC) |
|---|---|
| 1 | 0–0,1 |
| 2 | 0,1–0,5 |
| 3 | 0,5–1 |

Table 3.3.4 Thresholds of 70% and 10%

| Class | Canopy cover (CC) |
|---|---|
| 1 | 0–0,1 |
| 2 | 0,1–0,7 |
| 3 | 0,7–1 |
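The class division in the tables above can be sketched as a small function. The function names are hypothetical, and the handling of values that fall exactly on a threshold (they are assigned to the higher class) is an assumption, since the tables do not state which class a boundary value belongs to.

```python
def canopy_class(cc, upper_threshold, forest_threshold=0.1):
    """Return class 1, 2 or 3 for a canopy cover fraction cc between 0 and 1."""
    if not 0.0 <= cc <= 1.0:
        raise ValueError("canopy cover must be a fraction between 0 and 1")
    if cc < forest_threshold:
        return 1  # non-forest
    if cc < upper_threshold:
        return 2  # between the 10% threshold and the upper threshold
    return 3      # at or above the upper threshold (assumed boundary rule)


def is_forest(cc, forest_threshold=0.1):
    """True for forest, False for non-forest (canopy cover below 10%)."""
    return cc >= forest_threshold


# The same plot changes class depending on the upper threshold.
for upper in (0.3, 0.5, 0.7):
    print(upper, canopy_class(0.45, upper))
```

A plot with 45% canopy cover is in class 3 under the 30% threshold but in class 2 under the 50% and 70% thresholds, which is why the accuracy figures are reported separately per threshold.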

Accuracy assessment was carried out in the same way for both approaches and all thresholds, and included:

1. Confusion matrix (Kohavi and Provost, 1998);

2. Multinomial tests of accuracy (Rossiter, 2004);

3. Scatter Plot (Gregorutti et al., 2013);

4. Box plots (Krzywinski and Altman, 2014);

5. RMSE and BIAS were calculated for each class.

A confusion matrix was produced separately for each case, as it shows how the predicted values deviate from the visually estimated ones (Kohavi and Provost, 1998).

The multinomial tests of accuracy described by Rossiter (2004) are used to describe the data from different accuracy perspectives (they include user's accuracy, producer's accuracy, confidence interval, kappa and so on).

A scatter plot is a graph of paired values that shows the relationship between them (Friendly and Denis, 2005).

Box plots are a good indicator of the range of variables for each class of predicted and visually estimated values (Krzywinski and Altman, 2014).

The analysis of classes and values was conducted using RMSE (root-mean-square error; Barnston, 1992) and BIAS (Walther et al., 2005).
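RMSE and BIAS per class can be computed as in the following minimal sketch. The sign convention (predicted minus visually estimated, so that a positive BIAS means over-prediction) is an assumption, and the six plots are hypothetical values on the 0–1 scale rather than data from the study.

```python
import math


def bias(predicted, observed):
    """Mean of (predicted - observed); positive values indicate over-prediction."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def per_class(metric, predicted, observed, classes):
    """Apply a metric separately to the plots of each visual class."""
    result = {}
    for label in sorted(set(classes)):
        idx = [i for i, c in enumerate(classes) if c == label]
        result[label] = metric([predicted[i] for i in idx], [observed[i] for i in idx])
    return result


# Hypothetical plots: visually estimated CC, predicted CC and visual class.
cc_visual = [0.00, 0.05, 0.20, 0.25, 0.60, 0.90]
cc_pred = [0.10, 0.15, 0.20, 0.35, 0.50, 0.70]
visual_class = [1, 1, 2, 2, 3, 3]

print(bias(cc_pred, cc_visual), rmse(cc_pred, cc_visual))
print(per_class(bias, cc_pred, cc_visual, visual_class))
print(per_class(rmse, cc_pred, cc_visual, visual_class))
```

In this toy example the overall BIAS is close to zero although class one is over-predicted and class three under-predicted, which shows why the per-class values are reported alongside the overall ones.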

3. Results

3.1 Zero-and-One Inflated Beta Regression (ZOINBR) models

At the first stage of the assessment with both selected methods, the priority was to identify suitable variables and then to determine whether those variables are applicable in many cases or only in individual situations.

The Zero-and-One Inflated Beta Regression (ZOINBR) was applied in the R 3.4.1 environment. The input data were the training and validation data described previously. The model used the training data as a “model teacher”. Testing different combinations of vegetation indexes and analysing their ability to produce an accurate prediction made it possible to find a suitable regression model.

Testing was performed using the whole set of indexes and satellite bands for 234 sample plots. The least efficient indexes were removed and testing continued iteratively. The aim was to keep all satellite bands but to remove most of the indexes. This will facilitate the work when land use / land cover classification is performed, because each index needs to be calculated separately, whereas all satellite bands are already included in the datasets.

The final combination of variables selected for the ZOINBR method with the image dated 11.05.2014 is as follows: NDVI, NBR, Band 2, Band 3, Band 4, Band 5, Band 6, Band 7. Multiple R-squared was 0,3230 for the selected combination and the Residual Standard Error was 25,15.

The variables for the image from the 31st of October 2015 differed from the previous example: ARVI, MSAVI, NBR, NR, SR, Band 2, Band 4, Band 5, Band 7. This gave a Multiple R-squared of 0,6100 and a Residual Standard Error of 19,8.

The use of two images produced a new combination: SR (date 31.10.2015), TNDVI (date 31.10.2015), NDVI (date 31.10.2015), SAVI (date 11.05.2014), MSAVI (date 11.05.2014), NBR (date 11.05.2014), NR (date 11.05.2014), Band 5 (date 31.10.2015), Band 5 (date 11.05.2014), Band 6 (date 11.05.2014), Band 7 (date 11.05.2014). For this selection the Residual Standard Error was 18,87 and Multiple R-squared was 0,6202. In this selection some indexes are repeated for different images. It is not possible to see the influence of each variable individually. Thus, the repetitiveness of variables across selections can serve as an indicator.

Figure 3.1.1 Repetitiveness of variables used for prediction with the ZOINBR method

Figure 3.1.1 shows how often the variables were used. Band 5 was selected most often: it was used four times in the three predictions. NBR and Band 7 were used three times, and NDVI, MSAVI, NR, Band 2, Band 4 and Band 6 twice; the other variables were used only once.
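A tally of this kind can be sketched with a counter over the three variable lists given in the text for the ZOINBR selections. The dictionary keys and short variable names are labels chosen for this illustration; the lists follow the text exactly as written, so individual tallies depend on how those lists are read.

```python
from collections import Counter

# Variable selections as listed in the text (B5 appears twice in the
# two-image selection because it was taken from both image dates).
selections = {
    "11.05.2014": ["NDVI", "NBR", "B2", "B3", "B4", "B5", "B6", "B7"],
    "31.10.2015": ["ARVI", "MSAVI", "NBR", "NR", "SR", "B2", "B4", "B5", "B7"],
    "two images": ["SR", "TNDVI", "NDVI", "SAVI", "MSAVI", "NBR", "NR",
                   "B5", "B5", "B6", "B7"],
}

# Count how many times each variable occurs across all selections.
counts = Counter(v for variables in selections.values() for v in variables)

for variable, times in counts.most_common(3):
    print(variable, times)
```

The three most frequent variables in this tally are Band 5 (four times), NBR and Band 7 (three times each), in line with the description of the figure.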

[Figure 3.1.1: bar chart. X-axis: Variables (sr, tndvi, ndvi, savi, msavi, nbr, nr, arvi, ravi, b2, b3, b4, b5, b6, b7). Y-axis: Quantity (times), scale 0 to 4.5. Bar values in axis order: 1, 1, 2, 1, 2, 3, 2, 1, 1, 2, 1, 2, 4, 2, 3.]

Once a good combination had been selected for each image and for the two images together, the ZOINBR method was applied. The code developed by Lauri Korhonen (Korhonen et al., 2015) was adjusted to suit the Senegal dataset for each image.

After the model had been adjusted with all necessary variables and the data inserted, the GAMLSS function was applied to the training data with the family called BEINF. The BEINF family represents a continuous distribution on the range from 0 to 1 (Stasinopoulos et al., 2007). Because of this feature, all canopy cover values in the training data are expressed on the scale 0 – 1 rather than 0 – 100. After the model had been defined and fitted with the GAMLSS function, prediction was conducted for the reference data. The output was a table containing the plot ID, the visually estimated canopy cover, and the canopy cover predicted using the ZOINBR method.

3.2 Sparse Bayesian method

In the case of Senegal, there were two data sets: training and reference data. Canopy cover was modelled using the ArboLidar software built for the ArcGIS environment. The training data were used to teach the model; canopy cover was then predicted for the reference data, which also served for model validation.

The process contained several steps: model teaching, variable selection and prediction. A geodatabase containing both data sets was created, and the most suitable indexes and bands were selected automatically. Each variable has a value that refers to the percentage of plots for which it was used. For instance, a value of 1 for the Simple Ratio index means that the index was used for prediction in 100% of plots and plays a key role in the prediction. In general, the Sparse Bayesian method creates as many models as there are plots in the dataset. In the Senegal case there are 234 sample plots in total. The following tables present the vegetation indexes:

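Table 3.2.1 Suitable variables and their weights selected for the image from the 11th of May 2014

| Variable | B2 | SAVI | TNDVI | ARVI | MSAVI | NBR | NR | RAVI |
|---|---|---|---|---|---|---|---|---|
| Impact in % | 1 | 0 | 85 | 100 | 98 | 0 | 0 | 0 |

| Variable | SR | IPVI | NDVI | B3 | B4 | B5 | B6 | B7 |
|---|---|---|---|---|---|---|---|---|
| Impact in % | 0 | 18 | 18 | 1,7 | 0 | 0,85 | 0 | 0 |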

TNDVI, ARVI, MSAVI and the Blue band had the greatest influence. Some further variables were also selected: NDVI, IPVI, the Green band and the NIR band. IPVI and NDVI cannot be used together in a linear regression in R because of their multicollinearity. Although the Sparse Bayesian method selected them, they were used for only a small share of plots.
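The multicollinearity of IPVI and NDVI follows directly from their formulas. The sketch below assumes the usual definitions NDVI = (NIR − Red) / (NIR + Red) and IPVI = NIR / (NIR + Red), under which IPVI = (NDVI + 1) / 2; the reflectance pairs are hypothetical values used only to check the identity.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index (usual definition, assumed here)."""
    return (nir - red) / (nir + red)


def ipvi(nir, red):
    """Infrared Percentage Vegetation Index (usual definition, assumed here)."""
    return nir / (nir + red)


# Hypothetical (NIR, Red) reflectance pairs.
samples = [(0.45, 0.08), (0.30, 0.12), (0.25, 0.20), (0.50, 0.05)]

for nir, red in samples:
    # IPVI is an exact linear function of NDVI for every pixel.
    print(round(ndvi(nir, red), 4), round(ipvi(nir, red), 4),
          round((ndvi(nir, red) + 1) / 2, 4))
```

Since one index is an exact linear transformation of the other, including both in an ordinary linear regression adds no information and makes the coefficients non-identifiable.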

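Table 3.2.2 Suitable variables and their weights selected for the image from the 31st of October 2015

| Variable | NDVI | SAVI | TNDVI | ARVI |
|---|---|---|---|---|
| Impact in % | 0 | 0 | 0 | 1,7 |

| Variable | SR | IPVI | B2 | B3 |
|---|---|---|---|---|
| Impact in % | 100 | 0 | 0,85 | 0 |

| Variable | MSAVI | NBR | NR | RAVI |
|---|---|---|---|---|
| Impact in % | 1,7 | 3,4 | 6 | 7,7 |

| Variable | B4 | B5 | B6 | B7 |
|---|---|---|---|---|
| Impact in % | 5,1 | 98 | 0 | 3,4 |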

Simple Ratio and the near-infrared band have the most significant impact. Other indexes and bands were also selected by the method (ARVI, MSAVI, NBR, NR, RAVI, Band 2, Band 3 and Band 7), but they were used for only a very small number of plots in the prediction. After both images had been checked and the vegetation indexes selected, it appeared that the combinations of indexes and bands vary between satellite images. Thus, there was a need to examine variables derived from two images.

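Table 3.2.3 Suitable variables and their role in the prediction using two images from years 2015 and 2014

| Variable | NDVI 2015 | SAVI 2015 | TNDVI 2015 | ARVI 2015 | MSAVI 2015 | NBR 2015 | NR 2015 | RAVI 2015 |
|---|---|---|---|---|---|---|---|---|
| Impact in % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

| Variable | NDVI 2014 | SAVI 2014 | TNDVI 2014 | ARVI 2014 | MSAVI 2014 | NBR 2014 | NR 2014 | RAVI 2014 |
|---|---|---|---|---|---|---|---|---|
| Impact in % | 93 | 93 | 6 | 0 | 92 | 0,85 | 1,7 | 0 |

| Variable | SR 2015 | IPVI 2015 | B2 2015 | B3 2015 | B4 2015 | B5 2015 | B6 2015 | B7 2015 |
|---|---|---|---|---|---|---|---|---|
| Impact in % | 100 | 0 | 0 | 0 | 0 | 100 | 0 | 0 |

| Variable | SR 2014 | IPVI 2014 | B2 2014 | B3 2014 | B4 2014 | B5 2014 | B6 2014 | B7 2014 |
|---|---|---|---|---|---|---|---|---|
| Impact in % | 0 | 93 | 0 | 0 | 0 | 1,7 | 100 | 96 |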

As shown in the table, the following indexes and bands were selected from the image of the 11th of May 2014: NDVI, SAVI, MSAVI, IPVI, Band 6, Band 7, TNDVI, NBR, NR and Band 5. From the image of the 31st of October 2015, the selected combination was SR and Band 5. The following figures show the selected vegetation indexes, with values indicating the percentage of plots predicted using each index or band. Summarising all the outputs, the indexes calculated from the 11th of May 2014 image clearly have a more significant impact on the prediction.

Figure 3.2.5 Selected vegetation indexes and their importance for the image from the 11th of May 2014 (naming system for each variable: name_date, 11 in this case)

[Bar chart. X-axis: Variable. Y-axis: Impact, scale 0 to 1.2.]

According to the figure, TNDVI, MSAVI, SAVI, IPVI, Band 7, ARVI and Band 2 were used to predict most of the plots: they all have values from 80% to 100%. Other variables had no significant impact but were still used in some models for some plots.

Figure 3.2.5 Selected vegetation indexes and their importance for the image from the 31st of October 2015 (naming system for each variable: name_date, 31 in this case)

[Bar chart. X-axis: Variables. Y-axis: Impact, scale 0 to 1.2.]

Band 5 and Simple Ratio were used in 98% and 100% of plots, respectively. These two variables recur in the predictions made with one image and with two images, which shows that the selection of indexes is not always different.

Figure 2.4.3 presents the indexes selected for the prediction based on two images.

Figure 2.4.3 Selected vegetation indexes and their importance for two images (naming system for each variable: name_date, 11 or 31)

NDVI, SAVI, MSAVI, SR, Band 5 (selected twice), IPVI, Band 6 and Band 7 were used for the largest number of plots. The other variables were also used but have a small impact.

Consequently, the following indexes and bands should be highlighted for their impact on the prediction with the Sparse Bayesian method: Band 5, Simple Ratio, Transformed NDVI, Modified SAVI, Soil-Adjusted Vegetation Index, Infrared Percentage Vegetation Index, Band 6, Atmospherically Resistant Vegetation Index, Normalized Difference Vegetation Index, Band 7 and Band 2.

One more approach can be used to distinguish the variables that were used more frequently than others. All indexes and bands were taken together and their repetitiveness across the selections was observed. Figure 3.2.8 shows how many times they were repeated.


Figure 3.2.8 Repetitiveness of variables in selections (Sparse Bayesian method)

According to the figure, Band 5 is the most frequently used band across the three selections. The absence of Band 3 means that this variable was not used at all. There are some interesting cases: for instance, Simple Ratio was selected only twice, although whenever it is selected it is used for 100% of plots. In order of frequency, the first variable is Band 5 (used four times), then MSAVI (three times), followed by NDVI, NBR, ARVI, NR, TNDVI, IPVI, SR, Band 2 and Band 7 (present in two selections).

Once the selection is done, the model predicts canopy cover. A configuration model sets the rules for prediction. All vegetation indexes and satellite bands were tested within the model.

After the table of predictions was ready, an accuracy assessment was conducted to see how the results differ depending on which image is used as the basis for prediction.

[Figure 3.2.8: bar chart. X-axis: Variables (b3, ravi, savi, b4, b6, ndvi, nbr, arvi, nr, tndvi, ipvi, sr, b2, b7, msavi, b5). Y-axis: Quantity (times), scale 0 to 4.5. Bar values in axis order: 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 4.]

3.3 Classification accuracy

Tables 3.3.5 – 3.3.7 are examples of the accuracy assessment using the confusion matrix. They include examples from both the Sparse Bayesian regression and the Zero-and-One Inflated Beta regression:

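Table 3.3.5 Confusion matrix, Sparse Bayesian regression, threshold 30%, image 11.05.2014

| Predicted class | Visual assessment class 1 | Visual assessment class 2 | Visual assessment class 3 |
|---|---|---|---|
| 1 | 5 | 0 | 0 |
| 2 | 12 | 3 | 9 |
| 3 | 23 | 23 | 42 |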

According to the matrix, the third class (canopy cover from 30% to 100%) has the most matching predicted and visually assessed values (42). The second (10% – 30%) and the first (0% – 10%) classes had the fewest accurately predicted plots.
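The accuracy measures derived from such a matrix can be sketched as follows, using the counts of Table 3.3.5. The orientation (rows as predicted classes, columns as visual assessment classes) is read from the table layout and should be treated as an assumption; the function name is hypothetical.

```python
# Counts from Table 3.3.5: rows = predicted class, columns = visual class (assumed).
matrix = [
    [5, 0, 0],
    [12, 3, 9],
    [23, 23, 42],
]


def accuracy_summary(m):
    """Overall, user's and producer's accuracy and kappa from a square confusion matrix."""
    k = len(m)
    n = sum(sum(row) for row in m)
    row_totals = [sum(row) for row in m]                               # predicted totals
    col_totals = [sum(m[i][j] for i in range(k)) for j in range(k)]    # visual totals
    diagonal = [m[i][i] for i in range(k)]
    overall = sum(diagonal) / n
    users = [d / r if r else float("nan") for d, r in zip(diagonal, row_totals)]
    producers = [d / c if c else float("nan") for d, c in zip(diagonal, col_totals)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    return {"n": n, "overall": overall, "users": users,
            "producers": producers, "kappa": kappa}


summary = accuracy_summary(matrix)
print(summary)
```

For this matrix, 50 of the 117 plots lie on the diagonal, and the producer's accuracy of class three is 42 out of 51, which reflects the dominance of the third class described above.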

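Table 3.3.6 Confusion matrix, ZOINBR method, threshold 50%, image 31.10.2015

| Predicted class | Visual assessment class 1 | Visual assessment class 2 | Visual assessment class 3 |
|---|---|---|---|
| 1 | 1 | 2 | 1 |
| 2 | 10 | 7 | 13 |
| 3 | 29 | 17 | 37 |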

The matrix shows that the image taken on the 31st of October 2015 produces a better prediction for class two (10% – 50%), but a worse one for class one (0 – 10%), than the previous image. The results for class three (50% – 100%) are similar to those of the previous confusion matrix.

Table 3.3.7 Confusion matrix for Sparse Bayesian regression with the threshold of 70%, prediction done using data from two images

| Predicted class | Visual assessment class 1 | Visual assessment class 2 | Visual assessment class 3 |
|---|---|---|---|
| 1 | 1 | 2 | 1 |
| 2 | 10 | 7 | 13 |
| 3 | 29 | 17 | 37 |

The results are better than in the previous confusion matrix. In the first class (0-10%) 50% of plots were correctly predicted, the second class (10% - 70%) has reached even better accuracy (55 out of 63 were predicted correctly), while the third class (70% - 100%) shows a different picture – there are only four out of 14 plots which have an accurate prediction.

The multinomial test of accuracy is a suitable tool for examining several different indicators, as presented in the following tables.

Table 3.3.8 Multinomial tests of accuracy for the Sparse Bayesian method, 50% threshold

Overall accuracy: 0.6838
95% Confidence Interval: 0.5952 ... 0.7723

| Class | User's accuracy | Producer's accuracy |
|---|---|---|
| 1 Non-forest (0–0,1) | 0.8696 | 0.5000 |
| 2 50% CC threshold (0,1–0,5) | 0.3939 | 0.5000 |
| 3 (0,5–1) | 0.7705 | 0.9216 |

These results show an overall accuracy of more than 60 percent, which means that most plots were assigned to the correct class. The confidence interval of the overall accuracy lies between 0,59 and 0,77, so even its lower bound indicates that more than half of the plots were classified correctly. The non-forest class has a high user's accuracy, which is a good indicator; its producer's accuracy shows that 50% of plots were interpreted accurately (the same as for class two). Class three has high user's and producer's accuracy, which means that most of the visually assessed values of class three correspond to the values predicted using two images.
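The reported interval can be reproduced with a normal approximation to the binomial distribution plus a continuity correction of 1/(2n). This is a minimal sketch under two assumptions that are inferred rather than stated in the text: that the validation set contains 117 plots (the sum of the cells of Table 3.3.5) and that 80 of them were classified correctly (80/117 ≈ 0.6838).

```python
import math


def accuracy_interval(correct, total, z=1.96):
    """Overall accuracy with a normal-approximation interval and 1/(2n) continuity correction."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total) + 1 / (2 * total)
    return p, p - half_width, p + half_width


# Assumed counts: 80 correctly classified plots out of 117 validation plots.
accuracy, lower, upper = accuracy_interval(80, 117)
print(round(accuracy, 4), round(lower, 4), round(upper, 4))
```

Under these assumptions the function returns values that agree with the overall accuracy and the 95% confidence interval in Table 3.3.8 to four decimal places.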


Table 3.3.9 Multinomial tests of accuracy for the ZOINBR method, 70% threshold

Overall accuracy: 0.4359
95% Confidence Interval: 0.3418 ... 0.53

| Class | User's accuracy | Producer's accuracy |
|---|---|---|
| 1 Non-forest (0–0,1) | 0.6000 | 0.0750 |
| 2 70% CC threshold (0,1–0,7) | 0.3333 | 0.3462 |
| 3 (0,7–1) | 0.4588 | 0.7647 |

This example shows an overall accuracy of about 44% with a confidence interval of 0,34 – 0,53. Class one shows the lowest producer's accuracy and class three the highest. The values of class three are the most accurate in this model.

Table 3.3.10 Multinomial tests of accuracy for the Sparse Bayesian method, 30% threshold

Overall accuracy: 0.5726
95% Confidence Interval: 0.4787 ... 0.6666

| Class | User's accuracy | Producer's accuracy |
|---|---|---|
| 1 Non-forest (0–0,1) | 0.7692 | 0.2500 |
| 2 30% CC threshold (0,1–0,3) | 0.2973 | 0.4231 |
| 3 (0,3–1) | 0.6866 | 0.9020 |

Overall accuracy in this case is lower than for the Sparse Bayesian model with the threshold of 50%. The confidence interval becomes 0,48 – 0,66. The best user’s and producer’s accuracy here is for the class three.

Scatter plots were created for the predicted and visually estimated values; they show all values rather than how individual forest classes were predicted.

Graph 3.3.1 Scatter plot for the ZOINBR method, prediction done using variables derived from two images

The plot shows that the points are spread widely across the graph, which is an indicator of low correlation.

Graph 3.3.2 Scatter plot for the Sparse Bayesian method, prediction done using variables derived from two images

This graph shows that there is a correlation. However, some plots lie far from the regression line, which indicates model errors.

Box plots are one way to visualise the accuracy of the methods used.

Figure 3.3.1 Box plots for three forest classes using the ZOINBR method with the 50% threshold (image date: the 11th of May 2014)

According to the figure there are errors for all three classes, because the range of values extends beyond the box for every class. The mean value for each class is shown as a horizontal line inside each box, which should ideally be in the middle of it. In this case the lines are approximately central, which means that most plots are assigned to the correct class. However, some values are classified incorrectly.

Another example, from the Sparse Bayesian method, shows almost the same picture.

Figure 3.3.2 Box plots for three forest classes using the Sparse Bayesian method with the 50% threshold (image date: the 31st of October 2015)

Class one contains more values that do not belong to it according to the visual assessment. A similar situation is seen for the other two classes; the mean lies almost in the middle of each box, and most values are assigned to the correct class.

RMSE and BIAS are good accuracy indicators. These errors were compared using the predicted values and the visual assessment classes. The comparison indicates how well the forest / non-forest classes match and how the predicted values correspond to the visually estimated classes.

Table 3.3.11 BIAS and RMSE for Inflated beta regression with the threshold of 30%, prediction done using 11.05.2014 image data

| Measure | All values (CC predicted and CC visual) | Predicted values, visual class 1 | Predicted values, visual class 2 | Predicted values, visual class 3 |
|---|---|---|---|---|
| BIAS | 0.14031 | 0.4270757 | 0.2075534 | -0.1188852 |
| RMSE | 0.342823 | 0.4515494 | 0.2547496 | 0.2767997 |

According to the table, the overall BIAS is not very high, which indicates a relatively low systematic deviation. The predicted values of the third class have a negative BIAS that differs only slightly from zero. For class one, the BIAS is close to 0,5 and the RMSE is approximately 0,5.

The overall RMSE shows that there are some errors in the model. The RMSE per class is highest for class one (good indicator), almost 0,3 for classes two and three, and 0,34 for the whole dataset, which indicates that the model has some errors.

Table 3.3.12 BIAS and RMSE for Sparse Bayesian regression with the threshold of 70%, prediction done using 31.10.2015 image data

| Measure | All values (CC predicted and CC visual) | Predicted values, visual class 1 | Predicted values, visual class 2 | Predicted values, visual class 3 |
|---|---|---|---|---|
| BIAS | 0.04931624 | 0.14975 | 0.04650794 | -0.225 |
| RMSE | 0.1941209 | 0.1910301 | 0.170401 | 0.2828301 |

The BIAS values in this example show less variation than those for the Zero-and-One Inflated Beta regression with the 30% threshold. The lowest BIAS in this case is again for class three (as in the previous table). The non-forest prediction appears to be more accurate than with the 30% threshold.

RMSE is higher for class three but low for classes one and two.

Table 3.3.13 BIAS and RMSE for Inflated beta regression with the threshold of 50%, prediction done using two images’ data

| Measure | All values (CC predicted and CC visual) | Predicted values, visual class 1 | Predicted values, visual class 2 | Predicted values, visual class 3 |
|---|---|---|---|---|
| BIAS | 0.1397889 | 0.4404369 | 0.1208274 | -0.2705017 |
| RMSE | 0.3751819 | 0.4941017 | 0.2281191 | 0.3897631 |

Classes one and three show a similar tendency to the first example. Class two shows more variation than class two under the Sparse Bayesian method with the 70% threshold, but is similar to the Beta regression with the 30% threshold. RMSE shows the disadvantages of the model, especially for class two. Overall accuracy is approximately similar across the Zero-and-One Inflated Beta regression results.

Overall RMSE is similar for the two ZOINBR examples; the Sparse Bayesian method has a lower RMSE in this case. BIAS for the whole dataset is also closer to zero for the Sparse Bayesian method.

Finally, several points should be highlighted from the accuracy assessment of the two methods with three thresholds. The best overall accuracy (0,68) was achieved by the Sparse Bayesian method using data from two images for prediction. The most difficult class to predict was the non-forest class. The 50% threshold was the easiest to distinguish with both methods; the overall kappa (McHugh, 2012) for this test was 0,3345, which expresses the level of agreement beyond chance.

4. Discussion

Defining activity data is considered to be the most difficult part of implementing a greenhouse gas (GHG) inventory, particularly for areas where a change in land use class has been detected (for instance, deforestation / reforestation areas). Usually, countries combine available ground-based information (e.g., national statistics for agriculture, forestry, wetland and urban areas; vegetation and topographic maps; climate data) with remote sensing data (e.g., aerial photographs, satellite imagery etc.), applying GIS-based methods (GOFC-GOLD 2016). A separate monitoring system is an alternative way to obtain independent information. Good examples are found in Brazil, India and the Congo: the Brazilian system that generates annual deforestation estimates for the Amazon (Morton et al., 2005), the Indian national bi-annual forest cover assessment (FSI, 2013), and a sampling approach applied in the Congo basin (GOFC-GOLD 2016).

The Brazilian National Space Agency (INPE) produces estimates through the annual national monitoring programme PRODES (Wheleer et al, 2014). PRODES has been operating since 1988, with a minimum mapping unit of 6.25 ha. The project is carried out once a year in order to use dry, cloud-free conditions for deriving estimates, with results expected in December. PRODES uses imagery from Landsat TM, DMC satellites and CCD sensors, with a spatial resolution of 20 to 30 meters (Wheleer et al, 2014). In India, the assessment of the XII cycle used imagery from the Indian satellite IRS P6 (sensor LISS-III with 23.5 meters resolution). Only imagery with less than 10% cloud cover was selected for the 313 LISS-III scenes covering India (GOFC-GOLD 2016).

One example of regional research is a project implemented in the Congo. A systematic sampling approach with Landsat imagery was applied to the entire Congo River basin to estimate deforestation. The survey used sample plots of 20 × 20 km systematically distributed every 0.5° over the whole forest area of Central Africa. Then, 547 sample plots were
