Academic year: 2022

(1)

Chemometrics

Matti Hotokka

Physical chemistry

Åbo Akademi University

(2)

• Consider spectrophotometry as an example
• Beer–Lambert law: A = εlc
• Experiment
– Make three known references with concentrations c1, c2, c3 and measure the absorbances A1, A2, and A3
– Place a straight line through the points: A = a + bc
– Measure the absorbance A of the unknown sample
– Read the concentration from the calibration curve

Linear regression

Experiment
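The experiment above can be sketched in code; this is a minimal illustration, and the reference concentrations and absorbance readings are invented, not from the slides.

```python
import numpy as np

# Three known references (invented concentrations, in mol/L) and their
# measured absorbances (also invented for this illustration).
c_ref = np.array([0.10, 0.20, 0.30])
A_ref = np.array([0.152, 0.298, 0.451])

# Place a straight line through the points: A = a + b*c
b, a = np.polyfit(c_ref, A_ref, 1)    # polyfit returns [slope, intercept]

# Measure the absorbance A of the unknown sample, then read the
# concentration from the calibration line by inverting it.
A_sample = 0.225
c_sample = (A_sample - a) / b
```

Inverting the fitted line plays the role of reading the value off a plotted calibration curve.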

(3)

Linear regression

Calibration curve

[Figure: calibration curve, absorbance A versus concentration c]

Acalc(c) = a + bc

We have good theoretical grounds for saying that the calibration model is linear. The intercept a is determined by additional disturbing components in the sample and can be ignored.

(4)

• One measured value, one regressor
• Equation: ycalc(x) = b0 + b1x
• To be solved: b0, b1
• How to:

Linear regression

How to solve
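The standard way to solve for b0 and b1 is the least-squares normal equations, b = (X^T X)^(-1) X^T y; a minimal sketch with invented data points:

```python
import numpy as np

# Invented observations for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Design matrix: a column of ones (for b0) and the regressor x (for b1).
X = np.column_stack([np.ones_like(x), x])

# Normal equations: (X^T X) b = X^T y
b = np.linalg.solve(X.T @ X, X.T @ y)   # b[0] = b0, b[1] = b1

y_calc = X @ b                          # fitted values y_calc(x) = b0 + b1*x
```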

(5)

• Two components a and b
• Measure at two wavelengths, λ1 and λ2
• Beer–Lambert law, but different for each component and wavelength; additive
– At λ1: A1 = εa1·ca + εb1·cb
– At λ2: A2 = εa2·ca + εb2·cb
– In matrix form

Multiregression

Experiment
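In matrix form the two equations read A = E c, with E holding the molar absorption coefficients. A sketch with invented coefficients and absorbances:

```python
import numpy as np

# Molar absorption coefficients (invented for illustration):
# row i corresponds to wavelength i, columns to components a and b.
E = np.array([[120.0,  30.0],    # [eps_a1, eps_b1] at wavelength 1
              [ 45.0, 200.0]])   # [eps_a2, eps_b2] at wavelength 2

# Measured absorbances A1, A2 of the mixture (invented values).
A = np.array([0.90, 1.25])

# Solve A = E c for the concentrations c = [c_a, c_b].
c = np.linalg.solve(E, A)
```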

(6)

• Obtain the unknown coefficients ε
• In this case, two solutions containing only either a or b in known concentrations
• The molar absorption coefficients are calculated from
– A1′ = εa1·ca at λ1
– A2′ = εa2·ca at λ2
– A1″ = εb1·cb at λ1
– A2″ = εb2·cb at λ2

Multiregression

Calibrate the model
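The calibration step reduces to four divisions; the concentrations and absorbance readings below are invented for illustration.

```python
# Known concentrations of the two pure calibration solutions (invented).
c_a, c_b = 0.010, 0.005

# Absorbances of pure a (A1', A2') and pure b (A1", A2") at the two
# wavelengths; all values invented for this illustration.
A1p, A2p = 1.20, 0.45
A1pp, A2pp = 0.15, 1.00

# Molar absorption coefficients from the single-component measurements.
eps_a1, eps_a2 = A1p / c_a, A2p / c_a
eps_b1, eps_b2 = A1pp / c_b, A2pp / c_b
```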

(7)

• One dimension
• y = b0 + b1x = b0·1 + b1·x
• Generalization of the linear model

Linear regression again

Standard method

(8)

ANOVA

In linear regression

Definitions

The ANOVA table lists, for each sum of squares, the matrix operation, the calculation, and the degrees of freedom (D.f.):

– SST, Total
– SSM, Mean
– SScorr, Corrected for the mean
– SSfact, Factors
– SSR, Residuals
– SSlof, Lack of fit
– SSpe, Pure experimental error

Here n = observations, p = coefficients b, f = replications.

(9)

ANOVA

Example

No.  x  y
1    0  0.3
2    1  2.2
3    2  3
4    2  4

(10)

ANOVA

Sums of squares

(11)

ANOVA

Quality of the fit

Correlation

Mean sum of squares: MSS = SS / D.f.

F-test for goodness of fit: if this F-value exceeds the critical value in the table, the fit is significant.

F-test for lack of fit: this F-value should not exceed the critical value if the model is appropriate.

(12)

ANOVA

The example

Goodness of fit

This exceeds the critical value at 5 % risk, 18.51. The fit is statistically significant.

Lack of fit

This value is below the critical value at 5 % risk, 161. The model is appropriate because the lack of fit is not significant.
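These numbers can be reproduced from the data on slide (9); the script below computes the sums of squares and both F-values.

```python
import numpy as np

# Example data from slide (9).
x = np.array([0.0, 1.0, 2.0, 2.0])
y = np.array([0.3, 2.2, 3.0, 4.0])
n, p = len(y), 2                     # 4 observations, 2 coefficients

# Fit y = b0 + b1*x by least squares.
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b

SScorr = np.sum((y - y.mean())**2)          # corrected for the mean
SSR    = np.sum(resid**2)                   # residuals
SSfact = SScorr - SSR                       # factors
SSpe   = np.sum((y[2:] - y[2:].mean())**2)  # pure error: replicates at x = 2
SSlof  = SSR - SSpe                         # lack of fit

# F-tests: goodness of fit and lack of fit.
F_fit = (SSfact / (p - 1)) / (SSR / (n - p))   # compare with F(0.05; 1, 2) = 18.51
F_lof = (SSlof / 1) / (SSpe / 1)               # compare with F(0.05; 1, 1) = 161
```

F_fit comes out near 24, above 18.51, while F_lof stays far below 161, matching the conclusions above.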

(13)

ANOVA

Confidence intervals

Variance-covariance matrix

For an appropriate fit with a low value of SSlof, MSSR = sR^2 can be used instead of SSpe. The diagonal elements of the variance-covariance matrix are the variances of the factors b.

The confidence limits of a factor b (either b0 or b1) are

The prediction at a given point x0 = (1 x0) is
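A sketch of these quantities for the example data, taking MSSR = sR^2 as the variance estimate and F(0.05; 1, 2) = 18.51 as on the slides; the confidence-limit form below is an illustration of that approach, not a formula copied from the slides.

```python
import numpy as np

# Example data from slide (9).
x = np.array([0.0, 1.0, 2.0, 2.0])
y = np.array([0.3, 2.2, 3.0, 4.0])
n, p = len(y), 2

X = np.column_stack([np.ones_like(x), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y

# Residual variance MSS_R = s_R^2, used here in place of SSpe.
s2_R = np.sum((y - X @ b)**2) / (n - p)

# Variance-covariance matrix; its diagonal holds the variances of b0, b1.
V = s2_R * XtX_inv
var_b0, var_b1 = np.diag(V)

# Confidence limits at 5 % risk, using F(0.05; 1, 2) = 18.51.
F_crit = 18.51
ci_b1 = (b[1] - np.sqrt(F_crit * var_b1), b[1] + np.sqrt(F_crit * var_b1))

# Prediction at a given point x0 = (1, x0).
x0 = np.array([1.0, 1.5])
y0 = x0 @ b
```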

(14)

ANOVA

The example

At 5 % risk level, F(0.05; 1, 2) = 18.51. Thus we obtain

(15)

Multiple linear regression

Ordinary regression

Two-dimensional case

Measurements at three points (x11, x12), (x21, x22), and (x31, x32) are needed

In matrix form this system of equations is written as

The order has been changed to stress similarity to 1-dim regression

(16)

Multiple linear regression

Ordinary regression

If there are several dependent variables y, each with a different equation, the model in matrix form is

Y = X B + Residuals

where Y is n×m, X is n×p, and B is p×m.

(17)

Multiple linear regression

Ordinary regression

The equation is solved exactly as the 1-dim equation

However, if there are linear dependencies between the x’s the system becomes singular and cannot be solved.

The prediction of a y0 vector (dimension 1×m) at a given point x0 (dimension 1×p) is
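A minimal multiresponse sketch with invented matrices; because the example Y is noise-free, the least-squares fit recovers B exactly.

```python
import numpy as np

# Invented data: n = 5 observations, p = 3 regressors (incl. the constant),
# m = 2 dependent variables.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.random((5, 2))])
B_true = np.array([[1.0,  0.5],
                   [2.0, -1.0],
                   [0.0,  3.0]])
Y = X @ B_true                       # noise-free responses for illustration

# Solve Y = X B in the least-squares sense.
B, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Prediction of the 1 x m vector y0 at a given 1 x p point x0.
x0 = np.array([1.0, 0.2, 0.7])
y0 = x0 @ B
```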

(18)

PCR

Principal component regression

In full multicomponent regression it often happens that some of the x’s are interdependent, i.e., not linearly independent.

To avoid this only a few coordinate axes are used. They are chosen to be orthogonal. The selection of orthogonal coordinates resembles the PCA method.

(19)

PCR

PCA revisited

In PCA, the original data matrix X is written as a product of the scores and loadings,

One method of solving the problem is to use SVD (singular value decomposition). In that case the X matrix is written as

Here the matrix U corresponds to T and V corresponds to L. They are joined by the diagonal matrix W, whose diagonal elements are wii = √λii, where the λii are the eigenvalues of X^T X. The smallest eigenvalues can be forced to zero; the truncated W then removes the small eigenvalues that indicate dependencies.

(20)

PCR

The SVD method

Solution of the full linear equation is

Now the matrix X is written in SVD approximation as

Then a pseudo-inverse matrix X+ can be used

The solution will then be given along a desired number of principal axes as
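The truncated pseudo-inverse can be sketched as follows; the data, whose third column is deliberately the sum of the first two, are invented so that the dependency problem actually appears.

```python
import numpy as np

# Invented data with a built-in linear dependency: column 3 = column 1 +
# column 2, so X is singular and ordinary regression would fail.
rng = np.random.default_rng(1)
T = rng.random((8, 2))
X = np.column_stack([T[:, 0], T[:, 1], T[:, 0] + T[:, 1]])
y = X @ np.array([1.0, 2.0, 0.0])

# SVD: X = U W V^T, singular values in descending order.
U, w, Vt = np.linalg.svd(X, full_matrices=False)

# Keep k principal axes; the smallest singular value (~0) is forced out.
k = 2
X_pinv = Vt[:k].T @ np.diag(1.0 / w[:k]) @ U[:, :k].T

# Solution along the k principal axes (the minimum-norm solution).
b = X_pinv @ y
```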

(21)

PLS

Partial Least Squares method

In PCA the matrix X is split into a product of the scores matrix and the loadings matrix,

X = T P^T + E

where X is n×p, T is n×d, P^T is d×p, and E is n×p.

(22)

PLS

Partial Least Squares method

In PLS, also the matrix Y is split into a product of the scores matrix and the loadings matrix,

Y = U Q^T + F

where Y is n×m, U is n×d, Q^T is d×m, and F is n×m.

(23)

PLS

Partial Least Squares method

The solution can then be written as

Here W is a d×p matrix of PLS weights. Only a few of the eigenvalues are kept; the rest are set to zero.

(24)

PLS

Algorithm

Initialize: Shift (mean-center) the columns of the matrices.

Initialize: Use the first column of the Y matrix as the first Y score vector.

(25)

PLS

Algorithm

(1) Compute X-weights

(2) Scale the weights

(3) Estimate the scores of the X matrix

(26)

PLS

Algorithm

(4) Compute the Y loadings

(5) Generate a new u vector

Repeat from step (1) until u is stationary.

(27)

PLS

Algorithm

(6) Determine the scalar coefficient b for this variable.

(7) Compute the loadings of the X matrix.

(8) Compute the residuals.

(28)

PLS

Algorithm

Stopping criterion: Calculate the standard error of prediction from cross-validation, SEPCV.

If SEPCV no longer decreases when a factor is added, the optimum number of dimensions has been reached and the final B coefficients can be calculated. Otherwise use the residuals from step (8) as the new X and Y matrices and continue from the initialization step with an additional dimension.
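Steps (1)–(8) for a single latent dimension can be sketched in NIPALS style; the data matrices below are invented, and the cross-validation test and the loop over further dimensions are omitted.

```python
import numpy as np

# Invented, mean-centered data matrices (initialize: shift the columns).
rng = np.random.default_rng(2)
X = rng.random((10, 4)); X = X - X.mean(axis=0)
Y = rng.random((10, 2)); Y = Y - Y.mean(axis=0)

u = Y[:, [0]]                           # initialize: first Y column as Y score
for _ in range(500):
    w = X.T @ u / (u.T @ u)             # (1) X-weights
    w = w / np.linalg.norm(w)           # (2) scale the weights
    t = X @ w                           # (3) scores of the X matrix
    q = Y.T @ t / (t.T @ t)             # (4) Y loadings
    u_new = Y @ q / (q.T @ q)           # (5) new u vector
    if np.linalg.norm(u_new - u) < 1e-12:   # repeat until u is stationary
        u = u_new
        break
    u = u_new

b = (u.T @ t / (t.T @ t)).item()        # (6) scalar coefficient b
p_load = X.T @ t / (t.T @ t)            # (7) loadings of the X matrix
E = X - t @ p_load.T                    # (8) X residuals
F = Y - b * (t @ q.T)                   #     Y residuals
```

For more dimensions, E and F would replace X and Y and the loop would restart, as the stopping criterion above describes.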

(29)
