Research methods
3.1 Theoretical foundations
∑_i ∑_j w_ij = n (3.3)
A spatial weights matrix can be specified according to various types, which establish the neighbouring structure using diverse methods. For instance, a contiguity matrix defines two spatial units as neighbours if they share a common border of non-zero length, while a distance-based matrix defines two spatial units as neighbours if specific conditions are satisfied given a certain distance between points. Furthermore, different criteria specify the characteristics of the weights matrix of the chosen type. For a contiguity matrix, the queen criterion considers a common edge or vertex, while the rook criterion only accounts for a common edge. For a distance-based matrix, the k-nearest neighbour criterion assigns the same number of closest neighbours to all spatial units, while the inverse distance criterion is based upon a step function that provides neighbours with decreasing weights as distance increases towards a cut-off point, beyond which units are no longer considered neighbours. Nonetheless, as Elhorst (2010) underlined, the spatial weights matrix W cannot be estimated and needs to be specified in advance [12, 17]; hence its specification should be based upon judgements considering the nature of the observations to be studied.
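As an illustration of the k-nearest neighbour criterion, the following is a minimal sketch in Python with NumPy. The function name `knn_weights`, the example coordinates and the choice of row standardisation (each row summing to one) are assumptions made for illustration, not a specification taken from this research.

```python
import numpy as np

def knn_weights(coords, k):
    """Row-standardised k-nearest-neighbour weights matrix (illustrative)."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # Pairwise Euclidean distances between all points.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # A unit is never its own neighbour.
    np.fill_diagonal(dist, np.inf)
    W = np.zeros((n, n))
    for i in range(n):
        nearest = np.argsort(dist[i])[:k]  # indices of the k closest units
        W[i, nearest] = 1.0
    # Row standardisation: divide each row by its number of neighbours.
    return W / W.sum(axis=1, keepdims=True)

# Hypothetical coordinates of five spatial units.
coords = [(0, 0), (1, 0), (0, 1), (3, 3), (4, 3)]
W = knn_weights(coords, k=2)
```

Every unit receives exactly k neighbours, so the resulting matrix is generally not symmetric: unit i can be among the nearest neighbours of unit j without the reverse being true.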
3.1.4 Statistical models
The methodological approach to spatial analysis involves examining data and testing various hypotheses with different statistical models, whose results are evaluated through a model selection process that indicates which model better fits the data. The features of the non-spatial and spatial models considered in this research are outlined here.
Multiple linear regression model (MLR)
Y = αι_n + βX + ϵ (3.4)
The multiple linear regression model defines the dependent variable as a linear function of explanatory variables and an error term. In the equation, Y is an n × 1 vector of the dependent variable, ι_n is an n × 1 vector of ones associated with the constant parameter α, X is an n × k matrix of the independent variables, β is a k × 1 vector of their parameters and ϵ is an n × 1 vector of the error term. The relationship of the dependent variable with each explanatory variable is often estimated with the ordinary least squares (OLS) method, and the validity of the estimates depends on the following fundamental assumptions:
1. Linearity – The dependent variable can be calculated as a linear function of a specific set of explanatory variables plus an error term, as its relationship with each explanatory variable is linear in parameters and the error term enters additively;
2. Independence – The observations are independent and identically distributed (i.i.d.): {x_i, y_i}, i = 1, ..., N, i.i.d.;
3. Exogeneity:
(a) The error term is normally distributed conditionally upon the explanatory variables: ϵ_i | x_i ∼ N(0, σ_i^2);
(b) The error term is independent of the explanatory variables: ϵ_i ⊥ x_i;
(c) The mean of the error term is independent of the explanatory variables: E(ϵ_i | x_i) = 0;
(d) The error term and explanatory variables are uncorrelated: Cov(ϵ_i, x_i) = 0;
4. Homoscedasticity – The error term has the same variance at each set of values of the explanatory variables: Var(ϵ_i | x_i) = σ^2;
5. Multicollinearity – No explanatory variable is an exact linear combination of the others.
The OLS estimators β̂_j, for j = 1, ..., k, are the best linear unbiased estimators (BLUE) of the true parameters β_j in the multiple linear regression model when these conditions are satisfied; otherwise, the validity of the estimates can be questioned.
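The OLS estimation of equation (3.4) can be sketched as follows. The data are simulated, and the sample size, parameter values and noise level are arbitrary assumptions chosen only to show that least squares recovers the parameters when the assumptions above hold.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2

# Simulated data (hypothetical values, for illustration only).
X = rng.normal(size=(n, k))
alpha, beta = 1.0, np.array([2.0, -0.5])
eps = rng.normal(scale=0.1, size=n)  # homoscedastic, independent errors
Y = alpha + X @ beta + eps

# Design matrix: a column of ones (iota_n) followed by the predictors.
Z = np.column_stack([np.ones(n), X])

# Least squares solution of Z @ coef = Y.
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]
```

Using `np.linalg.lstsq` rather than inverting Z'Z explicitly is a numerical design choice; both give the same estimates when no predictor is an exact linear combination of the others.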
Spatial cross-regressive model (SLX)
Y = αι_n + βX + θWX + ϵ (3.5)
The spatial cross-regressive model includes spatial effects of the explanatory variables, defined as the spatial average of neighbouring characteristics [25]. The equation includes the term WX, an n × k matrix of spatially lagged predictors, and the related coefficient θ. When θ = 0, spatial effects of the explanatory variables are absent and the model can be reduced to a linear regression model.
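Because the SLX model only adds the lagged predictors WX as extra regressors, it can be estimated with the same least squares machinery as the linear model. The sketch below assumes a hypothetical ring-shaped neighbourhood (each unit has two neighbours with weight 0.5) and simulated data; the helper `ring_weights` and all parameter values are illustrative.

```python
import numpy as np

def ring_weights(n):
    """Row-standardised weights for units arranged on a ring (illustrative)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

rng = np.random.default_rng(1)
n = 300
W = ring_weights(n)

X = rng.normal(size=(n, 1))
WX = W @ X  # spatial lag: average of each unit's neighbours' X

# Simulated SLX data with alpha = 1, beta = 2, theta = 0.8 (assumed values).
Y = 1.0 + 2.0 * X[:, 0] + 0.8 * WX[:, 0] + rng.normal(scale=0.1, size=n)

# OLS on the augmented design matrix [iota_n, X, WX].
Z = np.column_stack([np.ones(n), X, WX])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
alpha_hat, beta_hat, theta_hat = coef
```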
Spatial autoregressive model (SAR)
Y = ρWY + αι_n + βX + ϵ (3.6)
The spatial autoregressive model involves spatial effects of the dependent variable, hence it adds a spatial autoregressive structure to the linear regression model [25]. The equation includes the term WY, an n × 1 vector of the spatially lagged dependent variable, and the related coefficient ρ. When ρ = 0, spatial effects of the dependent variable are absent and the model can be reduced to a linear regression model.
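Since Y appears on both sides of equation (3.6), the model can be rewritten in reduced form as Y = (I − ρW)⁻¹(αι_n + βX + ϵ), provided that I − ρW is invertible. The sketch below simulates data from this reduced form and checks that the structural equation holds; the ring-shaped W and the parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho = 50, 0.4

# Hypothetical row-standardised ring weights: two neighbours per unit.
W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5

X = rng.normal(size=n)
eps = rng.normal(scale=0.1, size=n)
rhs = 1.0 + 2.0 * X + eps  # alpha * iota_n + beta * X + eps

# Reduced form: solve (I - rho W) Y = rhs instead of forming the inverse.
Y = np.linalg.solve(np.eye(n) - rho * W, rhs)

# The simulated Y satisfies the structural equation Y = rho W Y + rhs.
residual = Y - (rho * W @ Y + rhs)
```

The reduced form also shows why WY is correlated with the error term: each element of Y depends on the errors of all connected units, so regressing Y on WY by OLS does not in general give consistent estimates of ρ.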
Spatial error model (SEM)
Y = αι_n + βX + ϵ, ϵ = λWϵ + µ (3.7)
The spatial error model involves spatial effects of the error term, referred to as disturbances of the model [25]. The equation includes the term Wϵ, an n × 1 vector of the spatially lagged error term, and the related coefficient λ. When λ = 0, spatial effects of the error term are absent and the model can be reduced to a linear regression model.
Spatial Durbin model (SDM)
Y = ρWY + αι_n + βX + θWX + ϵ (3.8)
The spatial Durbin model involves spatial effects of the dependent variable and the independent variables. The equation includes the terms WY and WX, with the related coefficients ρ and θ. When ρ = 0, spatial effects of the dependent variable are absent and the model can be reduced to a SLX model. When θ = 0 for all predictors, spatial effects of the explanatory variables are absent and the model can be reduced to a SAR model. Moreover, under the restriction θ = −ρβ, λ = ρ and the model can also be reduced to a SEM.
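The restriction θ = −ρβ can be derived in a few steps by starting from the SEM and eliminating the error process. The following is a sketch that writes the regression term as Xβ so that the dimensions conform, and assumes a row-standardised W (so that Wι_n = ι_n); the symbols otherwise follow equations (3.7) and (3.8).

```latex
\begin{align*}
Y &= \alpha\iota_n + X\beta + \epsilon, \qquad (I - \lambda W)\,\epsilon = \mu \\
(I - \lambda W)\,Y &= (I - \lambda W)\,\alpha\iota_n + (I - \lambda W)\,X\beta + \mu \\
Y &= \lambda W Y + \alpha(1 - \lambda)\,\iota_n + X\beta + W X(-\lambda\beta) + \mu
\end{align*}
```

The last line has the form of a spatial Durbin model with ρ = λ and θ = −λβ = −ρβ, and a rescaled constant α(1 − λ). Conversely, a spatial Durbin model whose coefficients satisfy θ = −ρβ is equivalent to a SEM with λ = ρ.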
Spatial Durbin error model (SDEM)
Y = αι_n + βX + θWX + ϵ, ϵ = λWϵ + µ (3.9)
The spatial Durbin error model involves spatial effects of the independent variables and the error term. The equation includes the terms WX and Wϵ, with the related coefficients θ and λ. When θ = 0 for each predictor, spatial effects of the independent variables are absent and the model can be reduced to a SEM. When λ = 0, spatial effects of the error term are absent and the model can be reduced to a SLX model.
Spatial autoregressive model with autoregressive disturbances (SARAR)
Y = ρWY + αι_n + βX + ϵ, ϵ = λWϵ + µ (3.10)
The spatial autoregressive model with autoregressive disturbances, originally introduced by Kelejian and Prucha (1998) [23], involves spatial effects of the dependent variable and the error term. The equation includes the terms WY and Wϵ, with the related coefficients ρ and λ. When ρ = 0, spatial effects of the dependent variable are absent and the model can be reduced to a SEM. When λ = 0, spatial effects of the error term are absent and the model can be reduced to a SAR model.
Manski model
Y = ρWY + αι_n + βX + θWX + ϵ, ϵ = λWϵ + µ (3.11)
The Manski model, built upon the work of Manski (1993), accounts for every possible spatial effect: endogenous interactions, when individual decisions are affected by those of the neighbours; exogenous interactions, when individual decisions are influenced by observable features of the neighbours; and correlated effects of unobservable features [28]. The equation includes the terms WY, WX and Wϵ, with the related coefficients ρ, θ and λ. Various researchers suggest starting from a simpler model [12], chosen through model selection methods, because this model includes all three spatial terms and the separate coefficients ρ, θ and λ cannot be reliably estimated at the same time.
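The nesting relationships described in this section can be summarised compactly: each model is obtained from the Manski specification by setting some of ρ, θ and λ to zero. The sketch below encodes this as a lookup; the function name `nested_model` and its boolean flags are hypothetical, introduced only for illustration, and the common factor restriction θ = −ρβ is not covered.

```python
def nested_model(rho: bool, theta: bool, lam: bool) -> str:
    """Name of the model obtained from the Manski specification.

    Each flag states whether the corresponding coefficient is left
    unrestricted (True) or set to zero (False).
    """
    names = {
        (False, False, False): "MLR",
        (False, True, False): "SLX",
        (True, False, False): "SAR",
        (False, False, True): "SEM",
        (True, True, False): "SDM",
        (False, True, True): "SDEM",
        (True, False, True): "SARAR",
        (True, True, True): "Manski",
    }
    return names[(rho, theta, lam)]

# Example: dropping the spatially lagged error term from the Manski model.
model = nested_model(rho=True, theta=True, lam=False)
```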