Chemometrics
Matti Hotokka
Physical chemistry, Åbo Akademi University
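Linear regression
Mean
• The mean can be understood as the value μ that gives the smallest deviations: it minimizes the sum of squared deviations Σ(xᵢ − μ)².
[Figure: data points x1 and x2 with their deviations x1 − μ and x2 − μ from the mean μ = E(x)]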
Linear regression
Some math: mean
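Setting the derivative of S(μ) = Σ(xᵢ − μ)² to zero gives −2Σ(xᵢ − μ) = 0, so μ = Σxᵢ/n, the arithmetic mean. A small numerical check with a hypothetical four-point sample: scanning trial values of μ, the smallest sum of squared deviations is found at the mean.

```python
def ssd(xs, mu):
    """Sum of squared deviations of the data from a trial value mu."""
    return sum((x - mu) ** 2 for x in xs)

data = [2.0, 3.0, 7.0, 8.0]          # hypothetical sample
mean = sum(data) / len(data)

# Scan trial values 0.0, 0.1, ..., 10.0 and keep the one with the smallest SSD.
trials = [i / 10 for i in range(101)]
best = min(trials, key=lambda mu: ssd(data, mu))

print(mean, best, ssd(data, mean))   # prints: 5.0 5.0 26.0
```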
Linear regression
[Figure: data points (x, y), the fitted line ycalc(x) = a + bx, and the residuals y1 − ycalc and y2 − ycalc]
Linear regression
Formulas
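The least-squares line minimizes S = Σ(yᵢ − a − bxᵢ)². Setting ∂S/∂a = ∂S/∂b = 0 gives the standard closed-form estimates b = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)² and a = ȳ − b·x̄. A minimal Python sketch of these estimates; the data values are hypothetical, chosen only for illustration:

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for ycalc(x) = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)                       # sum of (x - x_mean)^2
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) # sum of cross products
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = fit_line(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")  # prints: a = 2.20, b = 0.60
```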
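Linear regression
Assumptions
• Homogeneous variance: the spread of y about the regression line is the same for each x (constant standard deviation σ).
[Figure: distributions P(y|x) of y at x1, x2 and x3, all with the same standard deviation σ]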
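Linear regression
Assumptions
• The population behaves linearly: the true mean of y at each x lies on the true regression line α + βx.
[Figure: distributions P(y|x) at x1, x2 and x3 centred on the true line α + βx, together with the estimated line a + bx]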
Linear regression
Assumptions
• The random variables y1, y2, y3, ... are statistically independent.
Linear regression
Quality: the basic quantity
• The residual variance is s² = Σ(yᵢ − a − bxᵢ)² / (n − 2), where n is the number of data points.
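A minimal sketch of the residual variance, assuming the usual definition s² = Σ(yᵢ − a − bxᵢ)²/(n − 2) for a straight line with two fitted parameters (a and b). The data are hypothetical:

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

# Least-squares line ycalc = a + b*x.
x_mean, y_mean = sum(xs) / n, sum(ys) / n
b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
     / sum((x - x_mean) ** 2 for x in xs))
a = y_mean - b * x_mean

# Residual variance: two parameters were estimated, so divide by n - 2.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
s2 = sum(r ** 2 for r in residuals) / (n - 2)
print(f"s^2 = {s2:.3f}")  # prints: s^2 = 0.800
```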
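Linear regression
Goodness of the slope
• The estimated standard error of the slope is s_b = s / √Σ(xᵢ − x̄)².
• Confidence interval (95 %): b ± t·s_b, where t is the two-sided 95 % value of Student's t distribution with n − 2 degrees of freedom.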
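A sketch of the slope's standard error s_b = s/√Σ(xᵢ − x̄)² and the 95 % interval b ± t·s_b, using hypothetical data. The t value 3.182 (two-sided 95 %, 3 degrees of freedom) is hard-coded from a t table; this assumption only holds for n = 5 points:

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_mean, y_mean = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
a = y_mean - b * x_mean

# Residual standard deviation s with n - 2 degrees of freedom.
s = math.sqrt(sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2))
se_b = s / math.sqrt(sxx)          # standard error of the slope

t_crit = 3.182                     # assumed: two-sided 95 % t value for 3 degrees of freedom
lower, upper = b - t_crit * se_b, b + t_crit * se_b
print(f"b = {b:.3f}, SE(b) = {se_b:.3f}, 95 % CI = [{lower:.3f}, {upper:.3f}]")
```

With only five points the interval includes zero, so this slope is not significantly different from zero at the 95 % level.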
Linear regression
How to improve b?
A small range of x gives a poorly determined b, because Σ(xᵢ − x̄)² is then small.
A large range gives a smaller SE.
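The effect of the x range follows directly from s_b = s/√Σ(xᵢ − x̄)². This sketch compares two hypothetical three-point designs that share the same assumed residual standard deviation s:

```python
import math

def slope_se(xs, s):
    """Standard error of the slope, s_b = s / sqrt(sum((x - x_mean)^2))."""
    x_mean = sum(xs) / len(xs)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return s / math.sqrt(sxx)

s = 0.5                          # assumed residual standard deviation, same for both designs
narrow = [4.0, 5.0, 6.0]         # hypothetical small x range
wide = [0.0, 5.0, 10.0]          # hypothetical large x range
ratio = slope_se(narrow, s) / slope_se(wide, s)
print(f"SE(narrow) / SE(wide) = {ratio:.2f}")
```

Spreading the three x values five times wider lowers the standard error of the slope by a factor of 5, since Σ(xᵢ − x̄)² grows from 2 to 50.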
Linear regression
How good is ycalc?
ycalc = a + b·x0 is the estimate of μ0 = α + βx0, the true mean of y at x0.
y0 = an individual true value in the population at x0.
• Confidence intervals for μ0 and y0.
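A sketch of the two intervals at a chosen x0, using the standard expressions s·√(1/n + (x0 − x̄)²/Σ(xᵢ − x̄)²) for μ0 and s·√(1 + 1/n + (x0 − x̄)²/Σ(xᵢ − x̄)²) for a single value y0. The data, x0 and the hard-coded t value (3.182 for 3 degrees of freedom) are assumptions for illustration:

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_mean, y_mean = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
a = y_mean - b * x_mean
s = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))

t_crit = 3.182                   # assumed: two-sided 95 % t value, n - 2 = 3 degrees of freedom
x0 = 3.0                         # hypothetical point of interest
y_calc = a + b * x0
se_mu0 = s * math.sqrt(1 / n + (x0 - x_mean) ** 2 / sxx)      # uncertainty of the line at x0
se_y0 = s * math.sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / sxx)   # plus scatter of a single value
print(f"mu0: {y_calc:.2f} +/- {t_crit * se_mu0:.2f}")
print(f"y0:  {y_calc:.2f} +/- {t_crit * se_y0:.2f}")
```

The extra 1 under the square root makes the interval for y0 wider: it includes the scatter of individual values about the true line, not only the uncertainty of the line itself.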
Linear regression
How good is the regression?
[Figure: intervals for μ0 and y0]
Correlation
Regression vs correlation
Slope of regression
Correlation
Relation
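The regression slope and the correlation coefficient are linked by b = r·s_y/s_x, where s_x and s_y are the standard deviations of x and y. A quick numerical check with hypothetical data:

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_mean, y_mean = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
syy = sum((y - y_mean) ** 2 for y in ys)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

b = sxy / sxx                       # regression slope
r = sxy / math.sqrt(sxx * syy)      # correlation coefficient
ratio = math.sqrt(syy / sxx)        # s_y / s_x (the n - 1 factors cancel)
print(f"b = {b:.3f}, r = {r:.3f}, r * s_y / s_x = {r * ratio:.3f}")
```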
Correlation
Different cases
[Figure: scatter plots with r = −1.0, r = −0.8, r = 0.0, r = 0.6, r = 1.0, and a second case with r = 0.0]
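To see how r behaves, this sketch computes the Pearson correlation coefficient for three constructed cases. The parabola is an illustrative assumption: it shows one way r = 0.0 can occur even though y depends exactly on x, because r measures only linear dependence.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(pearson_r(xs, [x for x in xs]))        # increasing straight line
print(pearson_r(xs, [-x for x in xs]))       # decreasing straight line
print(pearson_r(xs, [x * x for x in xs]))    # symmetric parabola
```

This prints 1.0, -1.0 and 0.0.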
Correlation
[Figure: the total SSQ of y split into the explained SSQ and the unexplained SSQ]
Correlation
Coefficient of determination
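The coefficient of determination is the explained share of the total sum of squares, r² = explained SSQ / total SSQ, where total SSQ = explained SSQ + unexplained SSQ for a least-squares line. For a straight line fitted by least squares, r² equals the square of the correlation coefficient r. A sketch with hypothetical data:

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_mean, y_mean = sum(xs) / n, sum(ys) / n
b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
     / sum((x - x_mean) ** 2 for x in xs))
a = y_mean - b * x_mean
y_hat = [a + b * x for x in xs]

total = sum((y - y_mean) ** 2 for y in ys)                       # total SSQ
explained = sum((yh - y_mean) ** 2 for yh in y_hat)              # explained SSQ
unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))     # unexplained (residual) SSQ
r2 = explained / total
print(f"total = {total:.2f}, explained = {explained:.2f}, "
      f"unexplained = {unexplained:.2f}, r^2 = {r2:.2f}")
```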
Multivariate regression
Why?
[Figure: y plotted against x]
A second regressor explains the spread.
[Figure: y plotted against x1, with the points grouped by x2 = 1, 2 and 3]
Multiple regression considers all regressors.
Multivariate regression
What?
[Figure: y as a function of the two regressors x1 and x2]
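Multivariate regression
Indirect effects
[Path diagram: x1 = concentration and x2 = pH both act on y = absorbance; direct effects a1 = 0.059 (x1 → y) and a2 = −0.16 (x2 → y); x1 acts on x2 with a = −0.032]
Indirect effect of x1 via x2 = a · a2 = (−0.032)(−0.16) ≈ 0.005
Total effect = 0.059 + 0.005 = 0.064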
Simple linear regression of y on x1 alone, y = a·x1 + b, gives a = 0.064: the total effect!
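The split into direct and indirect effects can be checked numerically. This sketch uses hypothetical concentration, pH and absorbance values (not the data behind the numbers above), fits the multiple regression y = c + a1·x1 + a2·x2 and the regression of x2 on x1 (slope a), and compares a1 + a2·a with the slope of the simple regression of y on x1:

```python
# Hypothetical data: x1 = concentration, x2 = pH, y = absorbance (illustrative values only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [7.0, 6.8, 6.9, 6.5, 6.6, 6.2]
y = [0.10, 0.17, 0.22, 0.30, 0.35, 0.42]

def centered_sum(u, v):
    """Sum of products of deviations from the means: sum((u - u_mean) * (v - v_mean))."""
    u_mean, v_mean = sum(u) / len(u), sum(v) / len(v)
    return sum((p - u_mean) * (q - v_mean) for p, q in zip(u, v))

s11, s22, s12 = centered_sum(x1, x1), centered_sum(x2, x2), centered_sum(x1, x2)
s1y, s2y = centered_sum(x1, y), centered_sum(x2, y)

# Direct effects from y = c + a1*x1 + a2*x2: 2x2 normal equations solved by Cramer's rule.
det = s11 * s22 - s12 ** 2
a1 = (s1y * s22 - s2y * s12) / det
a2 = (s2y * s11 - s1y * s12) / det

a = s12 / s11            # slope of x2 (pH) regressed on x1 (concentration)
total = s1y / s11        # slope of the simple regression of y on x1 alone

print(f"direct {a1:.4f} + indirect {a2 * a:.4f} = {a1 + a2 * a:.4f}; simple slope {total:.4f}")
```

For least-squares fits the two agree exactly, up to rounding, which is why the simple regression recovers the total effect.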
Multivariate regression
Math
Change notations for simplicity.
Similarly
A linear system of equations for the unknowns a, b and c can be written in matrix form as
Multivariate regression
Math
The factor 1 (unity) is included to show the structure of the equation clearly.
Rename variables
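A sketch of the matrix form, assuming the model y = a·1 + b·x1 + c·x2 (the 1 multiplying a is the constant column of the design matrix X). The normal equations XᵀX·(a, b, c)ᵀ = Xᵀy are built from sums and solved by Gaussian elimination; the data are hypothetical and noise-free, so the known coefficients should be recovered:

```python
def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

# Hypothetical data generated from y = 1 + 2*x1 - 0.5*x2 (no noise).
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 3.0, 2.0, 5.0]
y = [1 + 2 * u - 0.5 * v for u, v in zip(x1, x2)]

# Each row of the design matrix is (1, x1, x2); the 1 multiplies the constant a.
rows = [(1.0, u, v) for u, v in zip(x1, x2)]
XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]

a, b, c = solve(XtX, Xty)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}")  # prints: a = 1.000, b = 2.000, c = -0.500
```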
Nonlinear regression
Make it linear
Take the logarithm.
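A sketch of linearization by logarithms, assuming for illustration an exponential model y = A·e^(kx): then ln y = ln A + kx is a straight line in x, so ordinary least squares on (x, ln y) gives k and ln A. The data are hypothetical and noise-free:

```python
import math

# Hypothetical noise-free data from y = 2 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# ln y = ln A + k*x: fit a straight line to (x, ln y) by least squares.
zs = [math.log(y) for y in ys]
n = len(xs)
x_mean, z_mean = sum(xs) / n, sum(zs) / n
k = (sum((x - x_mean) * (z - z_mean) for x, z in zip(xs, zs))
     / sum((x - x_mean) ** 2 for x in xs))
A = math.exp(z_mean - k * x_mean)   # back-transform the intercept ln A
print(f"A = {A:.3f}, k = {k:.3f}")  # prints: A = 2.000, k = 0.500
```

Fitting ln y minimizes squared deviations in ln y rather than in y, so the points are weighted differently than in a direct fit of y.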
Nonlinear regression
If you cannot make it linear
Methods still exist: just minimize the spread, i.e. the sum of squared deviations between the data and the model.
However, the system of equations is then no longer linear, so an iterative process is needed to solve it.
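One common iterative method is Gauss-Newton: linearize the model around the current parameter guess, solve the resulting linear least-squares problem for a correction, and repeat. A sketch for the assumed model y = A·e^(kx), with hypothetical data and an assumed starting guess A = 2.0, k = 0.5:

```python
import math

def model(A, k, x):
    """Assumed nonlinear model y = A * exp(k * x)."""
    return A * math.exp(k * x)

def ssq(A, k, xs, ys):
    """Sum of squared residuals (the spread to be minimized)."""
    return sum((y - model(A, k, x)) ** 2 for x, y in zip(xs, ys))

def gauss_newton(xs, ys, A, k, iterations=50):
    """Refine (A, k) by repeated linearization; no damping or step control."""
    for _ in range(iterations):
        r = [y - model(A, k, x) for x, y in zip(xs, ys)]
        # Partial derivatives of the model with respect to A and k.
        jA = [math.exp(k * x) for x in xs]
        jk = [A * x * math.exp(k * x) for x in xs]
        # Linear 2x2 normal equations (J^T J) delta = J^T r, solved by Cramer's rule.
        sAA = sum(u * u for u in jA)
        skk = sum(v * v for v in jk)
        sAk = sum(u * v for u, v in zip(jA, jk))
        gA = sum(u * e for u, e in zip(jA, r))
        gk = sum(v * e for v, e in zip(jk, r))
        det = sAA * skk - sAk ** 2
        A += (gA * skk - gk * sAk) / det
        k += (gk * sAA - gA * sAk) / det
    return A, k

xs = [0.0, 1.0, 2.0, 3.0, 4.0]     # hypothetical data
ys = [2.1, 3.2, 5.6, 8.8, 14.9]
A_fit, k_fit = gauss_newton(xs, ys, A=2.0, k=0.5)   # assumed starting guess
print(f"A = {A_fit:.3f}, k = {k_fit:.3f}, SSQ = {ssq(A_fit, k_fit, xs, ys):.4f}")
```

Gauss-Newton needs a reasonable starting guess and can diverge from a poor one; damped variants such as Levenberg-Marquardt are often used for robustness.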