
7 Multivariate tools to extract information from spectral data

8.7 Data analyses

Several types of data analysis were applied to the measured spectral data. All analyses were carried out in Matlab ver. 6.5 or 7.0.1 (MathWorks Inc.). The two-way analyses used Matlab algorithms originally copyrighted by Eigenvector Research; S.-P. Reinikainen updated some of these algorithms to run on newer Matlab versions and added some parameter calculations. N-way data analyses were performed using the PLS Toolbox 3.5 by Eigenvector Research Inc.

8.7.1 Analysis of the batch-to-batch variations

The objective of the PARAFAC modeling was to test whether the similarities and dissimilarities between the batches could be identified and the reasons for possible dissimilarities evaluated. A further aim was to distinguish and identify the phenomena occurring in the batches along the time scale and the spectral variable scale.

PARAFAC analysis was applied to the spectral data gathered from sulfathiazole crystallizations to evaluate the batch-to-batch variations in the crystallization experiments. The model was validated using Core Consistency diagnostics. Model validity was also assessed by examining how well the model reflected what was already known about the problem.

The obtained spectral data matrices were arranged into a three-way array with the structure illustrated in Figure 7. In this study, Mode 1 represented the different batches (loading notation APAR), Mode 2 represented time (temperature) (loading notation BPAR) and Mode 3 represented the measured spectral points, i.e., wavenumbers (loading notation CPAR). Other arrangements of the mode order were also tested, but the arrangement above was found to give the most reasonable results.
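The arrangement described above can be illustrated with a small NumPy sketch. It is not the PLS Toolbox routine used in this work: the function names (khatri_rao, parafac_als, reconstruct), the synthetic data and the plain alternating least squares fit are assumptions made for illustration only. The sketch stacks one time-by-wavenumber matrix per batch into a three-way array and fits the trilinear model x_ijk = sum_f a_if b_jf c_kf without constraints or centering.

```python
import numpy as np

def khatri_rao(U, V):
    # Column-wise Kronecker product; row (u * V.shape[0] + v) equals U[u] * V[v].
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def parafac_als(X, n_factors, n_iter=200, seed=0):
    # Unconstrained alternating least squares for the trilinear PARAFAC model.
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((J, n_factors))
    C = rng.standard_normal((K, n_factors))
    X1 = X.reshape(I, J * K)                     # batches x (time, wavenumber)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)  # time x (batch, wavenumber)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)  # wavenumber x (batch, time)
    for _ in range(n_iter):
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

def reconstruct(A, B, C):
    # Rebuild the three-way array from the loadings of the three modes.
    return np.einsum("if,jf,kf->ijk", A, B, C)

# Synthetic stand-in data: 4 batches, 30 time points, 50 wavenumbers, 2 factors.
rng = np.random.default_rng(1)
A_true = rng.random((4, 2)) + 0.5
B_true = rng.standard_normal((30, 2))
C_true = rng.standard_normal((50, 2))

# One (time x wavenumber) matrix per batch, stacked along Mode 1.
batch_matrices = [(B_true * A_true[i]) @ C_true.T for i in range(4)]
X = np.stack(batch_matrices, axis=0)

A_par, B_par, C_par = parafac_als(X, n_factors=2)
rel_err = np.linalg.norm(X - reconstruct(A_par, B_par, C_par)) / np.linalg.norm(X)
print(X.shape, A_par.shape, B_par.shape, C_par.shape, rel_err)
```

Changing the mode order, as was tested in this work, corresponds to transposing X before the fit.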

Raw data were used, since centering or baselining did not improve the modeling. No constraints were applied either, although constraints such as non-negativity were tested separately for the different modes.

8.7.2 In-situ monitoring of the onset of the crystallization and forming polymorph

The objective of this analysis was to monitor the crystallization system prior to the onset of the actual crystallization. A dynamically built PCA model together with MSPC charts was applied for real-time prediction of the onset of a crystallization process from ATR-FTIR data gathered from sulfathiazole crystallizations. An alarm system for approaching nucleation was proposed, based on a two-stage procedure:

1) The PCA model was derived dynamically from the beginning of the process, where the system was unsaturated, and the MSPC statistics and 95% confidence limits were calculated for that model.

2) The samples measured in the supersaturated state were predicted using the model derived in stage 1, and the MSPC statistics were calculated for them. The 95% confidence limits of the T2 and Q statistics were used as alarm criteria.
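The two-stage procedure can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the implementation used in this work: the data are synthetic, the function names (fit_pca, t2_and_q) are hypothetical, and the 95% limits are taken as empirical 95th percentiles of the calibration values, which stands in for whatever limit calculation was actually used.

```python
import numpy as np

def fit_pca(X_cal, n_comp):
    # Stage 1: PCA model of the calibration (unsaturated) spectra.
    mean = X_cal.mean(axis=0)
    _, s, Vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    P = Vt[:n_comp].T                                # loadings (variables x components)
    score_var = s[:n_comp] ** 2 / (X_cal.shape[0] - 1)
    return mean, P, score_var

def t2_and_q(X, mean, P, score_var):
    # Hotelling T2 (variation inside the model) and Q (squared residual).
    Xc = X - mean
    T = Xc @ P
    E = Xc - T @ P.T
    return np.sum(T ** 2 / score_var, axis=1), np.sum(E ** 2, axis=1)

# Synthetic stand-in spectra: two underlying profiles plus small noise.
rng = np.random.default_rng(0)
n_vars = 100
profiles = rng.standard_normal((2, n_vars))

def simulate(n):
    return rng.standard_normal((n, 2)) @ profiles + 0.01 * rng.standard_normal((n, n_vars))

X_cal = simulate(40)                                 # unsaturated state

# Later spectra with a new band that grows as nucleation approaches (assumed).
band = np.exp(-0.5 * ((np.arange(n_vars) - 60) / 3.0) ** 2)
growth = np.linspace(0.0, 1.0, 10)
X_new = simulate(10) + growth[:, None] * band

# Stage 1: model and limits (empirical percentiles, an assumption).
mean, P, score_var = fit_pca(X_cal, n_comp=2)
t2_cal, q_cal = t2_and_q(X_cal, mean, P, score_var)
t2_lim = np.percentile(t2_cal, 95)
q_lim = np.percentile(q_cal, 95)

# Stage 2: predict the later samples and raise an alarm when a limit is exceeded.
t2_new, q_new = t2_and_q(X_new, mean, P, score_var)
alarm = (t2_new > t2_lim) | (q_new > q_lim)
print(alarm)
```

In this sketch the model is fitted once; a dynamically built model would repeat stage 1 as new unsaturated measurements arrive.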

In addition, pseudocolor images of the T2 and Q contributions of the spectral variables as a function of temperature (time) were used to illustrate the changes in the process that can be seen in the spectral variation as nucleation approaches, and to see whether this variation can be linked to the polymorphic form of the forming crystals. To visualize the small contribution changes from one measurement to another, the difference between the 95% limit of the contributions from the last measured data point and the 95% limit of the contributions of the model derived in the calibration stage was calculated; these differences are denoted dT2lim and dQlim. A description of the proposed methodology is presented in Paper V.
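A contribution map of this kind can be sketched as a time-by-variable array. The sketch below uses synthetic data and one common convention for the contributions (signed residuals for Q, and scores scaled by their standard deviations and projected back onto the loadings for T2); whether this matches the convention of Paper V is an assumption. The quantities dT2lim and dQlim are not reproduced here, because their exact definition is given in Paper V.

```python
import numpy as np

# Synthetic stand-in: 30 spectra over time, 80 spectral variables.
rng = np.random.default_rng(0)
n_times, n_vars, n_comp = 30, 80, 2
profiles = rng.standard_normal((n_comp, n_vars))
X = (rng.standard_normal((n_times, n_comp)) @ profiles
     + 0.01 * rng.standard_normal((n_times, n_vars)))

# PCA model from the first 20 measurements (assumed calibration stage).
X_cal = X[:20]
mean = X_cal.mean(axis=0)
_, s, Vt = np.linalg.svd(X_cal - mean, full_matrices=False)
P = Vt[:n_comp].T
score_var = s[:n_comp] ** 2 / (X_cal.shape[0] - 1)

# Per-variable contributions for every measurement (rows = time, columns = variables).
Xc = X - mean
T = Xc @ P
q_con = Xc - T @ P.T                        # signed Q contributions (residuals)
t2_con = (T / np.sqrt(score_var)) @ P.T     # signed T2 contributions

# The squared contributions of each row add up to the Q and T2 statistics.
q = np.sum(q_con ** 2, axis=1)
t2 = np.sum(T ** 2 / score_var, axis=1)
print(t2_con.shape, q_con.shape)
```

Arrays such as t2_con and q_con can be passed directly to a pseudocolor plotting routine, for example pcolormesh in matplotlib, with time (temperature) on one axis and wavenumber on the other.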

8.7.3 The calibration routine for concentration prediction

A multistep calibration routine was applied in building the calibration models for 1) solute concentration prediction from in-situ ATR-FTIR measurements and 2) quantification of the polymorphic composition of the sulfathiazole bulk material from off-line DRIFT-IR measurements. Calibration measurements for solute concentration prediction using ATR-FTIR are presented in Chapter 7.3.1 and in Papers I and III. The calibration and test samples for the off-line powder analysis using DRIFT-IR were obtained simply by measuring the samples from the crystallizations with both XRPD and DRIFT-IR. The polymorphic composition of the sulfathiazole samples was quantified from the XRPD patterns by assuming that a measured XRPD pattern is a linear combination of the XRPD patterns of the pure polymorph components. The results of the XRPD quantification were used as descriptive variables when building the PLS model for predicting the polymorph composition of the powder samples using the DRIFT-IR technique. This procedure is explained in Papers II and VI.
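The linear-combination assumption for the XRPD patterns can be illustrated with an ordinary least squares fit. This is a minimal sketch, not the procedure of Papers II and VI: the patterns are synthetic, the function name polymorph_fractions is hypothetical, and the pure patterns are assumed to be on a common intensity scale so that the fitted coefficients can be normalized to fractions.

```python
import numpy as np

def polymorph_fractions(pattern, pure_patterns):
    # pattern: (n_angles,), pure_patterns: (n_angles, n_polymorphs).
    coef, *_ = np.linalg.lstsq(pure_patterns, pattern, rcond=None)
    coef = np.clip(coef, 0.0, None)   # negative coefficients have no physical meaning
    return coef / coef.sum()          # normalize so the fractions sum to one

def peak(angles, center, width=0.3):
    # Gaussian peak used only to build the synthetic patterns.
    return np.exp(-0.5 * ((angles - center) / width) ** 2)

# Synthetic pure-component patterns for two hypothetical polymorphs.
angles = np.linspace(5.0, 40.0, 701)
pure_1 = peak(angles, 10.0) + 0.5 * peak(angles, 22.0)
pure_2 = peak(angles, 15.0) + 0.8 * peak(angles, 27.0)
pure_patterns = np.column_stack([pure_1, pure_2])

# A mixture containing 30% of the first and 70% of the second polymorph.
mixture = 0.3 * pure_1 + 0.7 * pure_2
fractions = polymorph_fractions(mixture, pure_patterns)
print(fractions)
```

Fractions obtained in this way would then serve as the reference values against which the DRIFT-IR spectra are calibrated.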

The calibration routine is presented in detail in Paper I; the steps and the corresponding multivariate methods are summarized below. The data were centered. MSPC and sensitivity analyses were applied to evaluate the quality of the samples. Variables were selected on the basis of the improvement in the R2 value and in the prediction error of the calibration and test sets. Prior spectral knowledge of the importance of the variables was also a main criterion in selecting the important spectral ranges for multivariate modeling.

OSC filtering methods were applied to preprocess the data. OSC filtering was selected because of its ability to remove Y-independent variation from the data. Under the studied process conditions, Y-independent variation can be assumed to exist, e.g., due to the solvent or temperature (solute concentration measurements using ATR-FTIR) or due to the particle orientation or size distribution (polymorphic composition measurements using DRIFT-IR).
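The idea of removing Y-independent variation can be sketched with one simple projection-based formulation of OSC. Several OSC algorithms exist and the text does not state which one was used, so the variant below, the function name osc_one_component and the synthetic data are assumptions for illustration. The sketch finds the direction of largest variance in X whose scores are orthogonal to y and subtracts that component.

```python
import numpy as np

def osc_one_component(X, y):
    # Center X and y, then remove one component of X that is orthogonal to y.
    Xc = X - X.mean(axis=0)
    yc = (y - y.mean()).reshape(-1, 1)
    XtY = Xc.T @ yc
    # Projector onto weight vectors w for which the scores X w are orthogonal to y.
    M = np.eye(Xc.shape[1]) - XtY @ np.linalg.pinv(XtY.T @ XtY) @ XtY.T
    _, _, Vt = np.linalg.svd(Xc @ M, full_matrices=False)
    w = Vt[0]                          # weight of the largest y-orthogonal direction
    t = Xc @ w                         # y-orthogonal scores
    p = Xc.T @ t / (t @ t)             # loading of the removed component
    return Xc - np.outer(t, p), t, w, p

# Synthetic stand-in: an analyte signal proportional to y plus a stronger interferent.
rng = np.random.default_rng(0)
n_samples, n_vars = 30, 80
y = rng.random(n_samples)
interferent = rng.standard_normal(n_samples)
s_analyte = rng.standard_normal(n_vars)
s_interferent = rng.standard_normal(n_vars)
X = (np.outer(y, s_analyte) + 2.0 * np.outer(interferent, s_interferent)
     + 0.01 * rng.standard_normal((n_samples, n_vars)))

X_osc, t, w, p = osc_one_component(X, y)
y_centered = y - y.mean()
print(abs(t @ y_centered), np.linalg.norm(X_osc))
```

New spectra would be filtered with the stored weight and loading, i.e., by centering with the calibration mean and subtracting (x w) p.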

Therefore, OSC filtering was an obvious choice as a pretreatment method. The predictive PLS model was built, and the number of components included in the model was selected on the basis of the RMSEP value of an external test set. An additional validation procedure was applied to the PLS model for in-situ solute concentration prediction using ATR-FTIR: solubility was measured using ATR-FTIR and the result was compared with gravimetrically measured solubilities in the corresponding solute-solvent system (Chapter 7.3.1 and Papers I, III, IV).

8.7.4 Off-line classification of crystalline samples

In addition to the quantitative characterization of the sulfathiazole crystalline product samples based on the off-line DRIFT-IR measurements, multivariate qualitative classification methods were applied. The principal method used was PCA, but the PCA-derived methods SIMCA and MSPC statistics were also tested for classification of the crystalline bulk samples. The use of these methods is described in Papers II and VI.
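The principle behind SIMCA-type classification can be sketched as one PCA model per class, with a new sample assigned according to its residual distance to each class model. The sketch below is deliberately simplified: it uses synthetic spectra, hypothetical class labels and function names, and a nearest-residual rule instead of the statistical acceptance limits of full SIMCA.

```python
import numpy as np

def fit_class_model(X, n_comp):
    # Separate PCA model for one class: class mean and loadings.
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_comp].T

def residual_distance(x, model):
    # Squared distance from sample x to the class model plane.
    mean, P = model
    xc = x - mean
    e = xc - P @ (P.T @ xc)
    return float(e @ e)

def classify(x, models):
    # Assign the sample to the class whose model leaves the smallest residual.
    distances = {label: residual_distance(x, m) for label, m in models.items()}
    return min(distances, key=distances.get), distances

# Synthetic stand-in spectra for two hypothetical classes.
rng = np.random.default_rng(0)
n_vars, n_comp = 60, 2

def make_class(n):
    mean = rng.standard_normal(n_vars)
    basis = rng.standard_normal((n_comp, n_vars))
    def sample(m):
        return (mean + rng.standard_normal((m, n_comp)) @ basis
                + 0.01 * rng.standard_normal((m, n_vars)))
    return sample(n), sample

X_1, sample_1 = make_class(25)
X_2, sample_2 = make_class(25)
models = {"class_1": fit_class_model(X_1, n_comp),
          "class_2": fit_class_model(X_2, n_comp)}

# A new sample drawn from the first class.
x_new = sample_1(1)[0]
label, distances = classify(x_new, models)
print(label, distances)
```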

9 RESULTS AND DISCUSSION