
2 Methods

2.3 Reliability and Validity of the Study

This is the first time the two instruments have been used in the Vietnamese kindergarten context. Therefore, the validity and reliability of EvsD and CLASS were examined in this new cultural and linguistic setting.

2.3.1 Reliability of this study

Reliability refers to the consistency of an instrument (Muijs, 2011). Three common types of reliability are test-retest reliability, internal consistency, and inter-rater reliability. Test-retest reliability applies when a variable that is expected to remain stable, for example, reading skill, is measured repeatedly over time. A reliable measurement would then give similar scores for an individual’s reading skill in the pre-test and the post-test. Internal consistency refers to the consistency of people’s responses across the items of a multiple-item measurement, e.g., a survey or questionnaire. All the items in a reliable measurement must reflect the same latent construct; therefore, the items should be correlated with each other. For example, items from a student engagement questionnaire should be correlated with each other. The third type is inter-rater reliability, which refers to the consistency of the scores given by different observers, raters, or coders (Muijs, 2011). This type of reliability is common in behavioral measurement and observation. For example, when researchers want to measure interactions between a teacher and children, they observe or videotape teacher-child interactions in a classroom, and at least two observers, or two people who watch the video, rate the interactions. The scores of the two (or more) raters should be correlated with each other. The present dissertation applied two types of reliability: inter-rater reliability and internal consistency reliability.

In Sub-study I, the reliability of CLASS, which measures teacher-child interaction/classroom quality, was tested using inter-rater reliability and internal consistency reliability. Two raters watched all the video clips and scored teacher-child interaction/classroom quality following the CLASS manual instructions. One rater was an official, certified CLASS rater; the other was trained by the certified CLASS coder. Inter-rater reliability was measured using the intra-class correlation (ICC) with a two-way mixed-effects model and the absolute agreement type. The reliability of CLASS was satisfactory, with ICC coefficients for the dimensions ranging from .70 (regard for student perspectives) to .92 (concept development). Inter-rater reliability is presented in Table 1 of Sub-study I. The internal consistency reliabilities of the three factors were computed after the factor models of CLASS had been examined. The reliability coefficients of the latent domains, estimated using Cronbach’s alpha (values range from 0 to 1; values closer to 1 indicate higher reliability), ranged from .88 for emotional support to .95 for classroom organization.
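To make the inter-rater computation concrete, the following is a minimal sketch of a two-way, absolute-agreement ICC computed from mean squares. It assumes the single-measure form of the coefficient, which the text does not specify, and the function name and example scores are hypothetical. It is an illustration of the technique, not the procedure used in Sub-study I.

```python
from statistics import mean


def icc_absolute_agreement(ratings):
    """Single-measure, two-way, absolute-agreement ICC.

    ratings: list of rows, one row per rated target (e.g., a video clip)
    and one column per rater. Assumes a complete table (no missing scores).
    """
    n = len(ratings)       # number of rated targets
    k = len(ratings[0])    # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: targets (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: systematic rater differences (ms_cols) lower the ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores from two raters on five video clips (1-7 scale).
example = [[5, 6], [3, 3], [6, 6], [4, 5], [2, 2]]
print(round(icc_absolute_agreement(example), 2))
```

Because the rater mean square enters the denominator, a rater who scores consistently higher than the other reduces the coefficient even when the two sets of scores are perfectly correlated, which is the point of choosing the absolute agreement type.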

In Sub-study III, the reliability of the EvsD instrument, which measures children’s classroom engagement and disaffection, was checked using internal consistency reliability. Overall, the Cronbach’s alpha reliabilities of all dimensions were acceptable, ranging from .69 for EE to .85 for Disaffection. The internal consistency reliability of the EvsD instrument is reported in the measures section of Sub-study III.
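As an illustration of the internal consistency coefficient reported above, Cronbach's alpha can be sketched from the item variances and the variance of the summed scale score. The function name and the response data below are hypothetical; the actual coefficients come from the sub-studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a complete respondents-by-items table.

    responses: list of rows, one row per respondent, one column per item.
    """
    k = len(responses[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total (summed) score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of four children's raters to three items (1-4 scale).
example = [[3, 4, 3], [2, 2, 1], [4, 4, 4], [1, 2, 2]]
print(round(cronbach_alpha(example), 2))
```

Alpha rises when the items covary strongly, because the variance of the total score then exceeds the sum of the item variances by a wide margin.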

2.3.2 Validity of this study

Validity refers to how well the scores from a measurement represent the variable that they are intended to measure (Muijs, 2011). There are many types of validity: criterion validity (predictive validity, concurrent validity, postdictive validity), face validity, content validity, and construct validity (discriminant validity, convergent validity). In this dissertation, predictive validity and construct validity were applied. Predictive validity refers to how well a measurement predicts the outcome of another measurement; in this dissertation, for example, how teacher-child interaction and/or classroom quality relate to and predict children’s engagement and disaffection in the classroom. Construct validity refers to how well a theoretical concept is translated into its operationalization, in other words, how well the operational definition of a variable reflects the theoretical meaning of the concept and/or idea.

Factorial investigation is one of the broader approaches to validity; it evaluates the underlying factor structure of a measurement (Geiser, 2013). One of the commonly used methods for investigating construct validity is confirmatory factor analysis (CFA), which is part of structural equation modelling. In this dissertation, CFA was used to examine the construct validity of CLASS in Sub-study I and of EvsD in Sub-study III. The fit of the CFA models was evaluated with four absolute and relative fit indices: the standardized root mean square residual (SRMR), the root mean square error of approximation (RMSEA), the Tucker–Lewis index (TLI), and the comparative fit index (CFI). According to Hu and Bentler (1999), values of TLI and CFI should be greater than .90, and values of SRMR and RMSEA should be below .08.
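The cutoff rule described above can be written as a small check. This is only a sketch: the function name and the fit values are hypothetical, and the thresholds are the ones stated in the text.

```python
def meets_fit_cutoffs(fit):
    """Compare CFA fit indices with the cutoffs stated in the text.

    fit: dict with keys "TLI", "CFI", "SRMR", "RMSEA".
    Returns a dict mapping each index to True when the cutoff is met.
    """
    return {
        "TLI": fit["TLI"] > 0.90,      # relative fit: higher is better
        "CFI": fit["CFI"] > 0.90,      # relative fit: higher is better
        "SRMR": fit["SRMR"] < 0.08,    # absolute fit: lower is better
        "RMSEA": fit["RMSEA"] < 0.08,  # absolute fit: lower is better
    }


# Hypothetical fit values for one candidate model.
candidate = {"TLI": 0.93, "CFI": 0.95, "SRMR": 0.05, "RMSEA": 0.09}
checks = meets_fit_cutoffs(candidate)
print(checks)
print("all cutoffs met:", all(checks.values()))
```

Reporting the result per index, rather than as a single pass or fail, keeps visible which aspect of fit a model misses.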

CLASS. Three factor models of CLASS were tested to find the best-fitting model for the data: a one-factor model, a two-factor model, and a three-factor model. When the three models were compared, the three-factor model fit the data better than the one-factor and two-factor models. Therefore, the three-factor model (Figure 3) was selected for further analysis. Detailed information about the validation process of CLASS is presented in the measures section and Section 1 of the results of Sub-study I.

Figure 3. Three-factor model of CLASS

Note: Positive climate (PC), negative climate (NC), teacher sensitivity (TS), regard for student perspectives (RSP), behavior management (BM), productivity (PD), instructional learning formats (ILF), concept development (CD), quality of feedback (QF), and language modelling (LM).

EvsD. Based on the theoretical framework and the structural models used in previous studies, three structural models of EvsD were tested: a one-factor model, a two-factor model, and a four-factor model. The models were compared using the Akaike Information Criterion (AIC) difference for non-nested models and chi-square difference tests for nested models, with the latter based on the log likelihoods obtained through Maximum Likelihood Robust (MLR) estimation. Because of the fit of the various models and the purposes of each sub-study, different models were chosen for further analysis. Detailed information about the validation process of EvsD is presented in the measures section of Sub-study III.
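The two comparison strategies can be sketched as follows. The AIC is computed from the log likelihood and the number of free parameters. For nested models estimated with MLR, the sketch assumes the commonly documented scaled difference test based on log likelihoods and scaling correction factors; whether exactly this variant was applied is an assumption here. The function names and all numbers are illustrative, not results from Sub-study III.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion; the lower value indicates the preferred model."""
    return 2 * n_params - 2 * loglik


def scaled_loglik_difference(ll_nested, c_nested, p_nested,
                             ll_full, c_full, p_full):
    """Scaled chi-square difference test from MLR log likelihoods.

    ll_*: log likelihood, c_*: scaling correction factor,
    p_*: number of free parameters. The nested (more restricted)
    model has fewer parameters than the full model.
    Returns the test statistic and its degrees of freedom.
    """
    # Scaling correction for the difference test.
    cd = (p_nested * c_nested - p_full * c_full) / (p_nested - p_full)
    # Scaled difference statistic, referred to a chi-square distribution.
    trd = -2 * (ll_nested - ll_full) / cd
    return trd, p_full - p_nested


# Illustrative values for a restricted and a less restricted model.
stat, df = scaled_loglik_difference(-2606.0, 1.450, 39, -2583.0, 1.546, 47)
print(round(stat, 2), df)
print(aic(-2606.0, 39), aic(-2583.0, 47))
```

The resulting statistic is compared with the chi-square critical value for the returned degrees of freedom; a significant result favors the less restricted model.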

Figure 4. Four-factor model of EvsD

Note: BE = Behavioral Engagement; EE = Emotional Engagement; BD = Behavioral Disaffection; ED = Emotional Disaffection.