
Research methods

3.1 Theoretical foundations


wij =n (3.3)

A spatial weights matrix can be specified in various ways, each of which establishes the neighbouring structure with a different method. For instance, a contiguity matrix defines two spatial units as neighbours if they share a common boundary, while a distance-based matrix defines two spatial units as neighbours if a given condition on the distance between them is satisfied. Furthermore, different criteria determine the characteristics of the weights matrix of the chosen type. For a contiguity matrix, the queen criterion counts a common edge or vertex, while the rook criterion only counts a common edge. For a distance-based matrix, the k-nearest neighbour criterion assigns the same number k of closest neighbours to every spatial unit, while the inverse distance criterion gives neighbours weights that decrease as distance increases, up to a cut-off point beyond which units are no longer considered neighbours. Nonetheless, as Elhorst (2010) emphasised, the spatial weights matrix W cannot be estimated and needs to be specified in advance [12, 17]; its specification should therefore be based on judgement about the nature of the observations under study.
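The neighbourhood criteria above can be sketched in Python with NumPy. This is a minimal illustration, not the procedure used in this research: the regular grid (a stand-in for contiguous regions), the point coordinates, the function names and the default inverse-distance power of 1 are all hypothetical, and the final row-standardisation helper reflects a common convention rather than anything stated in this section.

```python
import numpy as np

def grid_contiguity(nrows, ncols, criterion="queen"):
    """Binary contiguity matrix for the cells of a regular nrows x ncols grid.
    rook: neighbours share an edge; queen: they share an edge or a vertex."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is never its own neighbour
                    if criterion == "rook" and dr != 0 and dc != 0:
                        continue  # diagonal cells share only a vertex
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < nrows and 0 <= cc < ncols:
                        W[r * ncols + c, rr * ncols + cc] = 1.0
    return W

def knn_weights(coords, k):
    """Each unit gets exactly k neighbours: its k closest points."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self from the nearest points
    W = np.zeros_like(d)
    nearest = np.argsort(d, axis=1)[:, :k]
    rows = np.arange(len(coords))[:, None]
    W[rows, nearest] = 1.0
    return W

def inverse_distance_weights(coords, cutoff, power=1.0):
    """Weights 1 / d**power that decrease with distance; zero beyond the cut-off."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    W = np.zeros_like(d)
    mask = (d > 0) & (d <= cutoff)
    W[mask] = 1.0 / d[mask] ** power
    return W

def row_standardise(W):
    """Scale each row to sum to one; rows without neighbours stay zero."""
    s = W.sum(axis=1, keepdims=True)
    return np.divide(W, s, out=np.zeros_like(W), where=s > 0)
```

On a 3 × 3 grid the centre cell has four rook neighbours but eight queen neighbours, which is exactly the edge-versus-vertex distinction between the two criteria. Note that a k-nearest neighbour matrix is generally not symmetric, because "j is among i's closest units" does not imply the converse.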

3.1.4 Statistical models

The methodological approach to spatial analysis involves examining the data and testing various hypotheses by means of different statistical models, whose results are compared through a model selection process that indicates which model best fits the data. The features of the non-spatial and spatial models considered in this research are outlined below.

Multiple linear regression model (MLR)

$$ Y = \alpha\iota_n + X\beta + \epsilon \tag{3.4} $$

The multiple linear regression model defines the dependent variable as a linear function of the explanatory variables plus an error term. In the equation, Y is an n × 1 vector of the dependent variable, ι_n is an n × 1 vector of ones associated with the constant parameter α, X is an n × k matrix of the independent variables, β is a k × 1 vector of their parameters and ϵ is an n × 1 vector of error terms. The parameters are commonly estimated with the ordinary least squares (OLS) method, and the validity of the estimates depends on the following fundamental assumptions:

1. Linearity – The dependent variable can be calculated as a linear function of a specific set of explanatory variables plus an error term, as its relationship with each explanatory variable is linear in parameters and the error term enters additively;

2. Independence – The observations are independent and identically distributed: $\{(x_i, y_i)\}_{i=1}^{n}$ i.i.d.;

3. Exogeneity:

(a) The error term is normally distributed conditionally on the explanatory variables: $\epsilon_i \mid x_i \sim N(0, \sigma_i^2)$;

(b) The error term is independent of the explanatory variables: $\epsilon_i \perp x_i$;

(c) The error term has zero mean conditionally on the explanatory variables (mean independence): $E(\epsilon_i \mid x_i) = 0$;

(d) The error term and the explanatory variables are uncorrelated: $\mathrm{Cov}(\epsilon_i, x_i) = 0$;

4. Homoscedasticity – The error term has the same variance at each set of values of the explanatory variables: $\mathrm{Var}(\epsilon_i \mid x_i) = \sigma^2$;

5. No perfect multicollinearity – No explanatory variable is an exact linear combination of the others.

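When these conditions are satisfied, the OLS estimators β̂_j, for j = 1, ..., k, are the best linear unbiased estimators (BLUE) of the true parameters β_j in the multiple linear regression model; otherwise, the validity of the estimates can be questioned. The normality assumption in 3(a) is not required for the BLUE property itself, but it allows exact inference in small samples.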
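A minimal sketch of OLS estimation for equation (3.4), assuming NumPy; the function name `ols` is hypothetical. The constant α is handled by prepending the vector of ones ι_n to X, giving Z = [ι_n, X], and a rank check mirrors assumption 5, since under perfect multicollinearity Z'Z cannot be inverted and the estimates are not unique. `np.linalg.lstsq` solves the same least-squares problem as the normal equations (Z'Z)^{-1}Z'Y, in a numerically more stable way.

```python
import numpy as np

def ols(y, X):
    """OLS for Y = alpha * iota_n + X beta + eps; returns (alpha_hat, beta_hat)."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])  # prepend iota_n for the constant alpha
    if np.linalg.matrix_rank(Z) < Z.shape[1]:
        # Assumption 5 fails: some column is an exact linear combination of others
        raise ValueError("perfect multicollinearity: Z'Z is singular")
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[0], coef[1:]

# Usage on simulated data (hypothetical values): alpha = 1, beta = (2, -1)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 2))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=500)
alpha_hat, beta_hat = ols(y_demo, X_demo)
```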

Spatial cross-regressive model (SLX)

$$ Y = \alpha\iota_n + X\beta + WX\theta + \epsilon \tag{3.5} $$

The spatial cross-regressive model includes spatial effects of the explanatory variables, defined as the spatial average of neighbouring characteristics [25]. The equation includes the term WX, an n × k matrix of spatially lagged predictors, and the related k × 1 coefficient vector θ. When θ = 0, spatial effects of the explanatory variables are absent and the model reduces to the multiple linear regression model.
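Because WX is built only from the exogenous explanatory variables, the SLX model can be estimated by OLS on the augmented regressor matrix [ι_n, X, WX]. The sketch below assumes NumPy and a hypothetical row-standardised "ring" weights matrix (each unit's neighbours are the two adjacent units on a circle), so that each row of WX is the average of the neighbours' values; the function names are illustrative and this is not the specification used in this research.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical toy W (n >= 3): units on a ring, each with its two adjacent
    units as neighbours, row-standardised so each row sums to one."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    return W

def fit_slx(y, X, W):
    """OLS on [iota_n, X, WX]; returns (alpha_hat, beta_hat, theta_hat)."""
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X, W @ X])  # spatially lagged predictors WX
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[0], coef[1:k + 1], coef[k + 1:]
```

A test of θ = 0 on this fit is then an ordinary test of whether the SLX model can be reduced to the linear regression model.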

Spatial autoregressive model (SAR)

$$ Y = \rho WY + \alpha\iota_n + X\beta + \epsilon \tag{3.6} $$

The spatial autoregressive model involves spatial effects of the dependent variable, i.e. it adds a spatial autoregressive structure to the linear regression model [25]. The equation includes the term WY, an n × 1 vector of the spatially lagged dependent variable, and the related scalar coefficient ρ. When ρ = 0, spatial effects of the dependent variable are absent and the model reduces to the multiple linear regression model.
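Unlike WX, the lag WY depends on Y and hence on ϵ, so the SAR model is usually written in its reduced form Y = (I_n − ρW)^{-1}(αι_n + Xβ + ϵ) and is typically estimated by maximum likelihood or instrumental variables/GMM rather than OLS. The sketch below, assuming NumPy and the same kind of hypothetical ring weights matrix, generates Y from equation (3.6) by solving this linear system; with a row-standardised W, I_n − ρW is invertible for |ρ| < 1. Function names are illustrative.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical toy W (n >= 3): two adjacent neighbours on a ring, row-standardised."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    return W

def simulate_sar(X, beta, alpha, rho, W, eps):
    """Generate Y from Y = rho W Y + alpha iota_n + X beta + eps by solving
    (I_n - rho W) Y = alpha iota_n + X beta + eps."""
    n = X.shape[0]
    A = np.eye(n) - rho * W
    return np.linalg.solve(A, alpha * np.ones(n) + X @ beta + eps)
```

Setting `rho=0` makes A the identity, so the simulated Y coincides with the multiple linear regression model, as stated above.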

Spatial error model (SEM)

$$ Y = \alpha\iota_n + X\beta + \epsilon, \qquad \epsilon = \lambda W\epsilon + \mu \tag{3.7} $$

The spatial error model involves spatial effects of the error term, referred to as disturbances of the model [25]. The equation includes the term Wϵ, an n × 1 vector of the spatially lagged error term, and the related coefficient λ, while μ is an n × 1 vector of i.i.d. innovations. When λ = 0, spatial effects of the error term are absent and the model reduces to the multiple linear regression model.
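Solving the error equation gives ϵ = (I_n − λW)^{-1}μ, so if Var(μ) = σ²I_n the disturbances have covariance σ²[(I_n − λW)'(I_n − λW)]^{-1}, which is no longer diagonal when λ ≠ 0. A minimal NumPy sketch computes this matrix for a hypothetical ring weights matrix; the innovation variance σ² and the function names are assumptions of the example.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical toy W (n >= 3): two adjacent neighbours on a ring, row-standardised."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    return W

def sem_error_cov(W, lam, sigma2=1.0):
    """Var(eps) when eps = lam W eps + mu and Var(mu) = sigma2 I_n:
    eps = (I - lam W)^{-1} mu, so Var(eps) = sigma2 [(I - lam W)'(I - lam W)]^{-1}."""
    B = np.eye(W.shape[0]) - lam * W
    return sigma2 * np.linalg.inv(B.T @ B)
```

With λ = 0 the matrix is σ²I_n, the spherical errors of the linear regression model; with λ > 0 neighbouring disturbances are positively correlated.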

Spatial Durbin model (SDM)

$$ Y = \rho WY + \alpha\iota_n + X\beta + WX\theta + \epsilon \tag{3.8} $$

The spatial Durbin model involves spatial effects of both the dependent variable and the independent variables. The equation includes the terms WY and WX, with the related coefficients ρ and θ. When ρ = 0, spatial effects of the dependent variable are absent and the model can be reduced to an SLX model. When θ = 0 for all predictors, spatial effects of the explanatory variables are absent and the model can be reduced to a SAR model.

Moreover, if the common factor restriction θ = −ρβ holds, the SDM can also be reduced to a SEM with λ = ρ.
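The common factor restriction can be checked with a short derivation, assuming W is row-standardised so that Wι_n = ι_n (an assumption not stated in this section). Starting from the SEM (3.7) and pre-multiplying by (I_n − λW):

```latex
\begin{aligned}
Y &= \alpha\iota_n + X\beta + \epsilon, \qquad \epsilon = \lambda W\epsilon + \mu
   \;\Rightarrow\; (I_n - \lambda W)\,\epsilon = \mu \\
(I_n - \lambda W)\,Y &= \alpha\,(I_n - \lambda W)\,\iota_n + (I_n - \lambda W)\,X\beta + \mu \\
Y &= \lambda WY + (1 - \lambda)\,\alpha\,\iota_n + X\beta + WX(-\lambda\beta) + \mu
   \qquad (\text{using } W\iota_n = \iota_n)
\end{aligned}
```

Matching the last line with (3.8) gives ρ = λ and θ = −ρβ, with the intercept rescaled to (1 − ρ)α and the i.i.d. innovations μ playing the role of ϵ in the SDM.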

Spatial Durbin error model (SDEM)

$$ Y = \alpha\iota_n + X\beta + WX\theta + \epsilon, \qquad \epsilon = \lambda W\epsilon + \mu \tag{3.9} $$

The spatial Durbin error model involves spatial effects of the independent variables and the error term. The equation includes the terms WX and Wϵ, with the related coefficients θ and λ. When θ = 0 for each predictor, spatial effects of the independent variables are absent and the model can be reduced to a SEM. When λ = 0, spatial effects of the error term are absent and the model can be reduced to an SLX model.

Spatial autoregressive model with autoregressive disturbances (SARAR)

$$ Y = \rho WY + \alpha\iota_n + X\beta + \epsilon, \qquad \epsilon = \lambda W\epsilon + \mu \tag{3.10} $$

The spatial autoregressive model with autoregressive disturbances, originally introduced by Kelejian and Prucha (1998) [23], involves spatial effects of the dependent variable and the error term. The equation includes the terms WY and Wϵ, with the related coefficients ρ and λ. When ρ = 0, spatial effects of the dependent variable are absent and the model can be reduced to a SEM. When λ = 0, spatial effects of the error term are absent and the model can be reduced to a SAR model.

Manski model

$$ Y = \rho WY + \alpha\iota_n + X\beta + WX\theta + \epsilon, \qquad \epsilon = \lambda W\epsilon + \mu \tag{3.11} $$

The Manski model, based on the work of Manski (1993), accounts for every possible spatial effect: endogenous interactions, when individual decisions are affected by those of the neighbours; exogenous interactions, when individual decisions are influenced by observable features of the neighbours; and correlated effects of unobservable features [28]. The equation includes the terms WY, WX and Wϵ, with the related coefficients ρ, θ and λ. Because this model includes every spatial term, its coefficients ρ, θ and λ cannot be reliably estimated separately at the same time; various researchers therefore suggest starting from a simpler model [12], chosen through model selection methods.
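The nesting relations described in this section can be summarised in code. The NumPy sketch below is an illustration under assumptions rather than an estimation routine: it simulates equation (3.11) and names the special case obtained by setting ρ, θ or λ to zero, following the reductions stated for each model. The function names and the ring weights matrix are hypothetical. Since the general model is hard to estimate reliably, such a simulator is mainly useful for checking what each restriction implies.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical toy W (n >= 3): two adjacent neighbours on a ring, row-standardised."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    return W

def simulate_manski(X, W, alpha, beta, theta, rho, lam, mu):
    """Y = rho W Y + alpha iota_n + X beta + W X theta + eps, eps = lam W eps + mu."""
    n = X.shape[0]
    I = np.eye(n)
    eps = np.linalg.solve(I - lam * W, mu)  # spatially autocorrelated disturbances
    rhs = alpha * np.ones(n) + X @ beta + (W @ X) @ theta + eps
    return np.linalg.solve(I - rho * W, rhs)  # spatial lag of the dependent variable

def nested_model(rho, theta, lam):
    """Name of the special case implied by which spatial terms are non-zero."""
    key = (bool(rho != 0), bool(np.any(np.asarray(theta) != 0)), bool(lam != 0))
    return {
        (False, False, False): "MLR",
        (False, True, False): "SLX",
        (True, False, False): "SAR",
        (False, False, True): "SEM",
        (True, True, False): "SDM",
        (False, True, True): "SDEM",
        (True, False, True): "SARAR",
        (True, True, True): "Manski",
    }[key]
```

For example, `nested_model(0.3, [0.0, 0.0], 0.2)` corresponds to the SARAR model, in line with the reduction of the Manski model when θ = 0.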

In document: Evaluating the incidence of regional patient migration in Italy through the formulation of a theoretical model and the conduction of a spatial econometric analysis (pages 31-35)