Stage 4: Exploring the co-design of digital citizen science tools

3.3 Research Approach

3.3.4 Stage 4: Exploring the co-design of digital citizen science tools

The fourth stage aimed to collect and analyze data to investigate the link between human values and online participation in a digital citizen science intervention. The context of this stage is a year-long local initiative in Lappeenranta, Finland, which involved the co-design and deployment of digital citizen science tools for environmental monitoring with locals, researchers, community organizations, and decision-makers.

The SENSEI initiative engaged a total of 243 participants, who generated over 100 ideas concerning issues of shared interest and 28 civic tech prototypes, and collected over 300 environmental observations. Publication IV presents the details of the case study, and chapter 4.2 of this thesis contains the data analysis of the behavioral exploration.

• Procedure and methods: An in-the-wild deployment of SENSEI was studied through the lens of human values theory to understand online participation in a digital citizen science case. The study was designed to explore human values at three levels: L1) universal, through the use of the Schwartz values instrument to map the participants’ value orientations; L2) personal, through the analysis of interviews, open surveys, and focus groups; and L3) behavioral, through quantitative models such as logistic regressions⁸ and negative binomial regressions⁹ based on usage logs. The Benjamini-Hochberg False Discovery Rate (FDR) procedure (Benjamini and Hochberg, 1995) was applied to correct the results from the models for multiple comparisons.

• Participants: At this stage, the behavior of 85 volunteers was studied; they were either full-time participants or drop-outs (partial participants) in the SENSEI initiative. The participants were 7 to 85 years old and identified themselves as female (37) or male (48). Breaking this down further, 44 participants were between 25 and 34 and 14 were between 35 and 44, showing that young adults make up most of the sample.

• Data analysis: This stage used the quantitative and qualitative data sources to explore human values at three levels: universal, personal, and behavioral (Maio, 2016; Winter et al., 2018):

– L1: Universal level. To map the participants’ value orientations, the responses to the PVQ-21 survey were analyzed (N=85). Incomplete and inconsistent responses (those with more than 5 of the 21 value items missing, and those giving the same answer to more than 16 of the 21 value items) (Schwartz, 2016a) were removed through standard quality checks, leaving 83 valid responses. A reliability analysis (Cronbach’s alpha, see Table 3.1) was run to evaluate the extent to which each index measures the value dimension that underlies all of its items. The value of tradition did not pass the Cronbach’s alpha cutoff, but the value dimension of conservation did (see Table 3.1). To minimize the possible effect tradition may have on the other values that are part of the dimension (security and conformity), the dimension of conservation is always presented together with the effect of each of its constituent values on the dependent variables¹⁰. Individual differences in scale usage were then corrected by converting the absolute value scores into centered scores that indicate the relative importance of each value in the individual’s whole value system. The centered scores were then used for the quantitative models.

8 Logistic regression is used to model the probability of a certain event occurring, such as participate/dropout or pass/fail, given the value of a predictor variable (Sommet and Morselli, 2017).

9 Negative binomial regression is a type of generalized linear model in which the dependent variable is a count of the number of times an event occurs (Zwilling, 2013).

10 “It is common that some of the internal reliabilities of several PVQ-21 indexes can be relatively low. Because; 1) the items in the indexes are constructed and selected to cover the different conceptual components of each value, not to be nearly redundant measures of a narrowly defined concept and, 2) each index includes only two or three items. With so few items it is virtually impossible to obtain high alphas unless the items are very similar to one another. Considering the small number of items used to measure each of the ten values and their necessary heterogeneity, even reliabilities of 0.4 are reasonable” (Schwartz, 2016b).

– L2: Personal level. Qualitative data from 15 interviews (240 minutes in total), responses to open questions in online surveys (N = 149), and one focus group session were analyzed to understand the individual meanings of human values. The data were analyzed using thematic analysis, a “qualitative research method for identifying, analyzing, and reporting patterns (themes) within the data” (Braun and Clarke, 2006, p.79). It begins with a row-by-row coding process, and its outcome is a set of themes that describe the phenomena under study and their relationships. We analyzed responses to the questions “Why did you join the environmental monitoring initiative?” and “What do you expect from this initiative?”. We generated expectation- and motivation-based codes inductively and used the theory-based codes from the values theory (Schwartz, 2006).

– L3: Behavioral level. The quantitative methods aimed to understand the effects of the human value dimensions on participation in the SENSEI initiative and on the use of the SENSEI mobile app. These analyses enabled the author to observe how human values manifested concretely in the participants’ behavior (L3). Two methods were used to explore the relationships between those variables. First, logistic regressions were employed to understand participation types. Second, negative binomial regressions were used to understand the effect of these value dimensions on the use and interactions of participants.

∗ Logistic regression was used to analyze whether a participant stayed in the initiative. It is a type of generalized linear model (GLM) that assumes a binary dependent variable (Allison, 2009; Osborne, 2014). The model expresses the probability that the dependent variable falls into one of the two categories (coded as participation = 1; drop-out = 0). Instead of raw coefficients, as in linear regression, the effect of independent variables is often reported as conditional probabilities and as odds ratios. Odds ratios enable a comparison of the relative odds of the occurrence of the outcome of interest (e.g., participation) (Norton and Dowd, 2018). For example, an odds ratio of two would mean that each one-unit increase in an independent variable doubles the odds of the outcome occurring. Logistic regression is estimated by maximum likelihood and does not assume normally distributed errors. It does not require homoscedasticity, but it does require independence of observations and of errors, and it assumes a linear relationship between the independent variables and the log-odds of the dependent variable. For logistic regression, there is no simple, substantively interpretable measure of overall model fit such as R² (Osborne, 2014). Instead, a chi-square test is used for the overall model significance and the Wald test for the significance of individual independent variables (Osborne, 2014).

∗ Negative binomial regression was used to analyze the digital interactions. This model is a type of GLM explicitly designed to model count data (Allison, 2009). A count variable expresses the number of something, such as the number of interactions or the number of participants. Such variables are always discrete, take values of zero or above, and often have highly skewed distributions. We selected negative binomial over Poisson regression to counter the potential effects of over-dispersion. In addition to regression coefficients, the effect of independent variables in negative binomial regression is often reported as Incidence Rate Ratios (IRR). The IRR is interpreted in a manner similar to the odds ratio: it reports the multiplicative change in the expected count of the dependent variable for a one-unit increase in an independent variable. As a GLM, negative binomial regression has similar assumptions and validity testing as logistic regression.
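The reporting steps for the behavioral models can be illustrated with a minimal sketch. It is not the analysis code used in the study: the function names and all numbers are hypothetical, and only the Python standard library is assumed. It shows how a log-scale coefficient becomes an odds ratio or an IRR, how a variance-to-mean ratio hints at over-dispersion in count data, and how Benjamini-Hochberg adjusted p-values can be computed.

```python
import math


def ratio_from_coefficient(beta):
    """Exponentiate a log-scale regression coefficient.

    For a logistic model the result is an odds ratio; for a negative
    binomial model with a log link it is an incidence rate ratio (IRR).
    """
    return math.exp(beta)


def dispersion_ratio(counts):
    """Sample variance divided by the mean of a count variable.

    A Poisson model expects a ratio near 1; a much larger ratio suggests
    over-dispersion, which motivates a negative binomial model.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical values for illustration only, not results from the study.
    print(ratio_from_coefficient(math.log(2)))    # coefficient -> ratio
    print(dispersion_ratio([0, 1, 1, 2, 16]))     # skewed interaction counts
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

An adjusted p-value below the chosen FDR level (for example 0.05) would be treated as significant after correction.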

Table 3.1: Participants’ Value Orientations (Pearson’s correlation significance: * = 0.05; ** = 0.01). This table shows the Cronbach’s alpha, mean, standard deviation, and correlations (with their significance) of the participating volunteers’ value dimensions. The correlation coefficients ranged from moderate to strong, while moderate standard deviations indicate a healthy spread of values across our participants. The correlations support the circumplex structure of the values theory, i.e., the opposing relationship between self-transcendence and self-enhancement (corr = -.11) and the proximity between openness-to-change and self-transcendence (corr = .45).

Value Dimension          Cronbach α   Mean   Std Dev   Correlation
                                                       2 SE     3 OTC    4 CON
1. Self-Transcendence    .730         5.04   .65       -.113    .451**   .307**
2. Self-Enhancement      .851         3.67   1.1       -        .257*    .068
3. Openness to Change    .758         4.65   .71       .257*    -        -.028
4. Conservation          .635         4.02   .72       .068     -.028    -
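The L1 preprocessing described in this section (response quality checks, centering of the value scores, and the Cronbach’s alpha reported in Table 3.1) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the procedure actually implemented in the study: missing answers are assumed to be encoded as None, centering is assumed to subtract each respondent’s own mean over the answered items, and all function names are hypothetical.

```python
from statistics import mean, variance


def is_valid_response(items, max_missing=5, max_identical=16):
    """Apply the two quality checks described in the text.

    A response is rejected if more than `max_missing` of the items are
    missing (None, an assumption of this sketch) or if the same answer
    is given to more than `max_identical` items.
    """
    answered = [x for x in items if x is not None]
    if len(items) - len(answered) > max_missing:
        return False
    most_common = max((answered.count(v) for v in set(answered)), default=0)
    return most_common <= max_identical


def center_scores(items):
    """Subtract the respondent's own mean from each answered item.

    The centered scores express the relative importance of each item
    within the respondent's whole value system.
    """
    answered = [x for x in items if x is not None]
    personal_mean = mean(answered)
    return [None if x is None else x - personal_mean for x in items]


def cronbach_alpha(item_columns):
    """Cronbach's alpha for k items.

    `item_columns` holds one list per item, each containing the scores of
    all respondents in the same order.
    """
    k = len(item_columns)
    totals = [sum(scores) for scores in zip(*item_columns)]
    item_variance_sum = sum(variance(col) for col in item_columns)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

Because each PVQ-21 index contains only two or three items, alpha values computed this way can be low even for a conceptually sound index, which is the point made in the Schwartz (2016b) footnote.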

In document Participation in Digital Citizen Science (pages 40-44)