

Otto Sahlgren

CONSTRUCTING ALGORITHMIC IMPACTS ON EQUALITY

A STUDY ON FAIRNESS AND ACCOUNTABILITY IN DATA-DRIVEN EDUCATION

Faculty of Education and Culture
Master’s thesis
August 2021


TIIVISTELMÄ

Otto Sahlgren: Constructing algorithmic impacts on equality: A study on fairness and accountability in data-driven education
Master’s thesis
Tampere University
Master’s Programme in Lifelong Learning and Education
August 2021

Growing awareness of the negative impacts of machine learning technologies on equality creates new kinds of responsibilities for agents in the field of education. To legitimate their position in the eyes of the public and of stakeholders, these agents increasingly need to demonstrate that the machine learning-based pedagogical and administrative technologies they use are not discriminatory. This requires adopting novel instruments for conducting algorithmic impact assessments. My master’s thesis, which contains an article manuscript submitted to the journal “Learning, Media and Technology” published by Taylor & Francis, discusses assessments of the equality impacts of machine learning technologies as one such instrument.

The literature has introduced numerous formal definitions of ‘fair algorithms’ as well as measures intended for assessing the equality impacts of machine learning algorithms. These instruments differ in how concepts such as ‘equality’ or ‘discrimination’ are operationalized, although the approaches also share methodological commonalities. The thesis examines the political, epistemological, and ethical dimensions and implications of assessing algorithmic impacts on equality, connecting the observations to discussions on educational institutions’ accountability towards stakeholders and on their role as promoters of social equality. It is argued that assessments of algorithmic impacts on equality construct political, operative meanings of ‘algorithmic fairness’ in the field of education.

The operative meanings constructed in such assessments reflect, and may alter, the responsibilities and obligations that decision-makers, such as educational institutions, have towards stakeholders. The methodological choices made in these assessments construct operative meanings of which responsibilities agents have with respect to preventing discrimination and promoting equality.

The thesis argues that the operative meanings produced by assessments of algorithmic impacts on equality are neither politically nor ethically neutral. A central finding is that certain methodological choices may lead to the reproduction and legitimation of existing social inequality in the name of accountability and the promotion of equality. These findings corroborate earlier findings in both education policy and technology ethics research.

Keywords: non-discrimination, artificial intelligence, education, education policy, accountability, equality

The originality of this publication has been checked with the Turnitin OriginalityCheck software.


ABSTRACT

Otto Sahlgren: Constructing algorithmic impacts on equality: A study on fairness and accountability in data-driven education

Master’s thesis
Tampere University
Master’s Programme in Educational Studies, Lifelong Learning, and Education
August 2021

As awareness of bias in machine learning applications increases, accountability for technologies and their impact on equality emerges as a novel constituent of accountability in education. To legitimate their actions in the eyes of the public, decision-makers are increasingly searching for ways to evince the equitable and non-discriminatory impact of employed algorithmic technologies. This creates demand for novel instruments for algorithmic accountability. This thesis, which contains an ‘Author’s Original Manuscript’ of an article submitted to “Learning, Media and Technology” published by Taylor & Francis Group, discusses assessments of algorithmic impacts on equality as such an instrument.

The literature contains numerous formal definitions and measures for “algorithmic fairness” used to assess algorithmic impacts on equality. They differ in their operationalization of concepts such as ‘equality’ and ‘discrimination’, although they share methodological commonalities. The study examines the political, epistemological, and ethical dimensions and implications of algorithmic fairness through the lens of accountability. It is argued that operative political meanings of accountability and fairness are constructed, operationalized, and reciprocally configured in the performance of algorithmic accountability in education.

Operative conceptions regarding agents’ responsibilities in preventing discrimination and in promoting equality are reflected in, and built through, methodological choices in instrumentation.

It is argued that the operative meanings constructed through assessments of algorithmic fairness are neither politically nor ethically neutral. Crucially, certain methodological choices can lead to the reproduction and legitimation of existing social inequalities in the name of accountability. This finding corroborates existing findings in education policy and technology ethics research.

Keywords: algorithmic fairness, accountability, education, equality, education policy

The originality of this publication has been checked with Turnitin OriginalityCheck software.


CONTENTS

PREFACE ... 5

1 THE POLITICS AND RECIPROCAL (RE)CONFIGURATION OF ACCOUNTABILITY AND FAIRNESS IN DATA-DRIVEN EDUCATION ... 8

1.1 Introduction ... 8

1.2 Algorithms, accountability, and their politics in education ... 9

1.2.1 Algorithmic accountability in education... 10

1.2.2 Conceptual frame ... 13

1.2.3 Fair machine learning ... 15

1.3 Politics and abstraction in algorithmic fairness ... 18

1.3.1 Construction of algorithmic fairness ... 18

1.3.2 Abstraction in algorithmic fairness ... 21

1.4 ’Fixing’ inequality with algorithmic fairness ... 23

1.4.1 Technical fixes for broken parts ... 24

1.4.2 Formal equality in conditions of structural inequality ... 25

1.5 Concluding remarks ... 26

2 METHODOLOGY, THEORETICAL FRAMEWORK, AND REFLECTION ... 28

2.1 Aim, background, central claims, and contributions ... 28

2.1.1 Background and central concepts ... 30

2.1.2 Research aim ... 33

2.1.3 Research questions ... 35

2.1.4 Impact and contributions ... 37

2.2 Theoretical framework and methodology ... 41

2.2.1 Research material ... 41

2.2.2 Methodology ... 44

2.2.3 Ontological and epistemological commitments ... 47

2.2.4 Reliability ... 50

2.2.5 Research ethics ... 51

3 CONCLUSIONS AND AVENUES FOR FUTURE RESEARCH ... 54

BIBLIOGRAPHY ... 57

TABLES
TABLE 1. A SET OF FAIRNESS DEFINITIONS ... 16


PREFACE

Machine learning (ML) software and algorithmic systems – commonly dubbed ‘artificial intelligence’, or ‘AI’ for short – increasingly mediate and execute significant pedagogical and administrative tasks in the field of education. Ranging from the scoring of student essays (Greene, 2018) and grading of exams (Adams, Weale & Barr, 2020) to learning group placement decisions (Carmel & Ben-Shahar, 2017) and drop-out risk prediction (Christie et al., 2019), software and algorithms are used to make important decisions about students’ futures.

Ethical concerns abound, however, as the implementation of such data-driven systems – even if pursued with benevolent intentions of increased effectiveness, equity, and accountability (Luckin et al., 2016; Fontaine, 2016) – has proven prone to widening existing social divides and exacerbating inequality (see, e.g., Adams, Weale & Barr, 2020; Baker & Hawn, 2021; Carmel & Ben-Shahar 2017).

This master’s thesis concerns the relationship between accountability and fairness in data-driven education. In this study, I examine instruments and frameworks for so-called “algorithmic fairness” or “fair ML” through the lens of accountability. These instruments include metrics for identifying unwanted bias in educational ML applications, and computational methods for enhancing their fairness. The study starts from the observation that formal instruments for algorithmic fairness are emerging as novel instruments for “algorithmic accountability” in education: ways of evincing that algorithmic systems are equitable, non-discriminatory, and treat relevant stakeholders fairly are increasingly being incorporated into practices through which agents in different educational contexts perform accountability. The guiding questions of this study concern the ethical, epistemological, and political dimensions of algorithmic fairness as an accountability instrument. The study is analytical, theoretical, and conceptual in nature, and draws from work in multiple scientific disciplines, including critical educational sociology, education policy, technology ethics, computer science, and moral and political philosophy.


The core arguments advanced in this study can be stated as follows. Firstly, echoing recent work in AI ethics, which conceptualizes algorithmic impact assessments as instruments that construct what relevant “algorithmic impacts” mean (Metcalf et al., 2021), I argue that algorithmic fairness analyses are to be understood as political, value-laden accountability instruments that construct operative meanings of (in)equality and (un)fairness. In other words, assessments of algorithmic impacts on equality in educational contexts are not “objective” or “neutral” instruments for quantification but, rather, as ways of seeing, construct operative meanings of equality and fairness in data-driven education. Building on this notion, I further argue that algorithmic fairness as a political instrument plays a part in the (re)configuration of accountability relationships in education. This manifests in data collection, documentation practices, and interventions that serve the performance of accountability, for example. More broadly, however, fairness analyses reflect, legitimize, and possibly even shape, operative conceptions of the responsibilities different agents have with respect to (in)equalities found in educational use-contexts of algorithmic systems. A key finding of my study is that specific frameworks and instruments for algorithmic fairness – namely, formal equality metrics and intervention methods focused narrowly on technologies – risk reproducing and legitimizing extant substantive inequalities.

The contents of this thesis are structured as follows. Section 1 contains an ‘Author’s Original Manuscript’ of an article submitted to “Learning, Media and Technology” published by Taylor & Francis Group (available online: https://www.tandfonline.com/toc/cjem20/current). As I am writing this preface, the article manuscript has undergone the first round of reviews and has been accepted for publication with minor revisions. The Author’s Original Manuscript has been slightly adapted from the version initially submitted to the journal: Firstly, the references in the version of the manuscript included here have been modified to conform to the APA 7 standards. And, secondly, small changes have been made to the text to increase readability when considered together with other sections of this thesis. The abstract has been removed and some transitions have been added into the text, for example. Section 2 provides an in-depth discussion on the methodology and theoretical framework of my study. The section is meant to complement Section 1 by providing further background on the


theoretical lenses and concepts used in the study, and by clarifying my views about the relationship between descriptive and normative questions regarding algorithmic fairness. Furthermore, the section includes reflection about research ethics and the scientific contributions of the study. The conclusions of my study and avenues for future research are restated and further discussed in Section 3.

Some acknowledgements are due. Firstly, I am thoroughly indebted to the supervisor of my thesis, professor Nelli Piattoeva, whose substantial expertise, insights, and feedback were integral to how the thesis took shape. Truly, her continuing encouragement and support throughout the writing process were of immense value. I would also like to thank professor Arto Laitinen, whose teachings and insights in all matters philosophical inspire much of my research, and whose influence on my thinking surely reverberates throughout the pages of this thesis. Lastly, I want to express my gratitude to my colleagues in the FairMatch project (“Fairness in Algorithmic Social Matching”) funded by the Academy of Finland – Thomas Olsson, Kostas Stefanidis, Rodrigo Borges, and Sami Koivumäki – who were kind enough to give comments on drafts of my article manuscript, and from whom I have learned so much during our collaborative efforts.


1 THE POLITICS AND RECIPROCAL (RE)CONFIGURATION OF ACCOUNTABILITY AND FAIRNESS IN DATA-DRIVEN EDUCATION 1

1 This section contains an ‘Author’s Original Manuscript’ of an article submitted to “Learning, Media and Technology” published by Taylor & Francis Group, available online: https://www.tandfonline.com/toc/cjem20/current.

1.1 Introduction

Promises of ‘flexible, inclusive, personalized, engaging, and effective’ data-driven education (Luckin et al. 2016, 18), and increased equity and accountability (see Fontaine 2016) notwithstanding, it has become clear that data-driven technologies and predictive modelling involve significant risks of algorithmic discrimination and unfairness in pedagogical practices and admissions (Carmel & Ben-Shahar, 2017; Zeide, 2020; Adams, Weale, and Barr, 2020). Digging through troves of student and learning data, machine learning (ML) systems can pick up on more or less subtle patterns of discrimination and structural inequality in educational domains and reproduce or amplify them.

Under increasingly close scrutiny, schools and policymakers search for ways to prove that their algorithms are effective, equitable, and trustworthy (see Zeide, 2020), creating a market for instruments applicable to this end. What is called algorithmic accountability (see Binns, 2018; Wieringa, 2020) thereby emerges as a nascent constituent of accountability relationships between educational institutions and the public.

Drawing on the notion that policies can be fruitfully understood by examining their instruments (Lascoumes & Le Galès, 2007), I will critically examine algorithmic fairness as an emerging, political instrument of algorithmic accountability in education. Fair ML research has developed various measures for detecting wrongful, discriminatory biases in ML systems, as well as technical fixes for mitigating them. These measures reflect different ethical views about fairness and equality in algorithmically mediated practices (Narayanan, 2018).

Accordingly, I argue that algorithmic fairness as performance constructs operative, political meanings of fairness and equality in data-driven education.

This, in turn, configures accountability relationships by demarcating which inequalities educational institutions are responsible for mitigating, for example.

Furthermore, it is argued that the dominant methodology of fair ML may incentivize narrow interventions while legitimizing and depoliticizing substantive inequality. To substantiate these claims, I draw on critical work in both AI ethics and studies on the politics of accountability in education. The moral philosophical question regarding what fairness requires is beyond the present scope. Rather, at issue here is a set of intertwining epistemological and political questions: how and why do algorithmic fairness analyses construct meanings of fairness and inequality, and how does this configure accountability relationships between educational institutions and the public?

Section 1.2 starts with a discussion on the politics of (algorithmic) accountability in education, focusing on data-driven instruments and quantification of inequality, in particular. This is followed by an overview of fair ML. The section anticipates the claim that formalist fair ML methodologies are attractive instruments for educational institutions and decision-makers keen on demonstrating the equitable impact of employed ML systems. Section 1.3 examines how the construction of algorithmic fairness analyses involves, and can conceal, legally and morally significant decision-making. In section 1.4, I argue that some approaches to algorithmic fairness risk legitimating existing substantive inequalities and may disincentivize reforms that would promote substantive equality, all in the name of nominal fairness and accountability. Section 1.5 summarizes the study’s findings.

1.2 Algorithms, accountability, and their politics in education

Accountability can be roughly characterized as ‘a relationship between an actor and a forum, in which the actor has an obligation to explain and to justify his or her conduct, the forum can pose questions and pass judgement, and the actor may face consequences’ (Bovens, 2007, 450; italics removed). This article takes such relationships as inherently socio-political qua constructed and consolidated in and by networks of governance; continuously constructed, distributed, and stabilized through various practices of knowledge production and dissemination, evaluation, and performative practices by public agents (see Ozga, 2013; Ozga, 2019).

With digital reform and increasing procurement and implementation of data-driven technologies for education, a new socio-technical layer emerges into accountability relationships: accountability for automated tools, predictive algorithms, and their impact (see Zeide, 2020; Wieringa, 2020). Performance of algorithmic accountability is defined here as the second-order use of evidence regarding and/or produced by data-driven technologies used for first-order purposes, where first-order use refers to the use of predictive models in tasks such as grading, admissions, and grouping. Evidence of and for accountability is constructed through instruments, and the study of the latter can reveal politically significant assumptions behind the accountability relationships (re)constructed through instrumentation itself: ‘every [policy] instrument constitutes a condensed form of knowledge about social control and ways of exercising it’ (Lascoumes & Le Galès, 2007, 1).

1.2.1 Algorithmic accountability in education

Discussion on accountability in education commonly revolves around performance reporting, comparative assessments and (self-)evaluations; schools’ effective and equitable delivery of the services they produce; and questions of political legitimacy and stakeholder fairness (see Levin, 1974; Fontaine, 2016). So-called high-stakes accountability in increasingly decentralized education and governance, in particular, is characterized by a technical-managerial focus on compliance and performative conformity to principles of good governance through the use of standardized and data-intensive policy instruments, as well as neoliberal market logics (see Diamond & Spillane, 2004; Fontaine, 2016; Ozga, 2013; Ozga, 2019; Verger, Fontdevila, and Parcerisa, 2019). Global policy instruments, such as national large-scale assessments and test-based accountability systems, are used to produce information about public sector performance at a distance (see Scott [2000]), while schools are effectively being rendered into ‘service providers who must navigate a competitive choice marketplace as they vie for students/customers’, which in turn requires ‘distinguishing their market niche and by using data to demonstrate their effectiveness’ (Fontaine, 2016, 8).

Data-driven decision-making, in particular, promises effective, equitable, and more accountable pedagogy and administration (Luckin et al., 2016; Fontaine, 2016, 3). These promises exemplify how accountability policies, practices, and discourse tend to privilege ‘quantification and statistical analysis as ways of knowing’ (Fontaine, 2016, 2). As statistical knowledge invokes a sense of objectivity (Desrosières, 2001; Williamson & Piattoeva, 2019), it contributes to the (re)production of political legitimacy of educational institutions by evincing the effectiveness and equitable impact of their services through numbers. The demand for data-for-accountability, in turn, creates and consolidates a market for data-based policy and regulatory instruments competing for the status of best evidence in this respect (Ozga, 2019, 3–4; Williamson & Piattoeva, 2019). Winners on this market are instruments that generate ‘simplified, comparative, normative, [and] transportable’ knowledge (Ozga, 2019, 8) which can be produced locally and compared globally.

As awareness of biased algorithms increases, visions of effective, equitable, and accountable data-driven education are questioned, however. Trained on historical data, ML algorithms may infer (multiple) proxies for legally protected or otherwise sensitive attributes, such as ‘race’ or socio-economic status, or pick up on patterns of social inequality, consequently introducing disparities in the statistical models’ predictions (see Baker & Hawn 2021). Thus, discriminatory or otherwise unfair outcomes may occur without explicit use of legally protected attributes as model features. (This approach is typically dubbed fairness through unawareness.) For example, Ofqual’s infamous A-level exams grade standardization algorithm was designed to match historical distributions in grades. It downgraded almost 40 per cent of students’ expected grades, consequently privileging students from privately funded schools and disadvantaging low socio-economic status, Black, Asian, and Minority Ethnic students (Adams, Weale, and Barr, 2020; Wachter, Mittelstadt, and Russell, forthcoming). While Ofqual’s algorithm was not trained using ML techniques, the case highlighted how optimizing algorithms for predictive accuracy on historical data can effectively reproduce rather than address existing inequality, often even without the presence of technical bias, such as measurement errors. Concerns of algorithmic bias have been raised in relation to data-driven ability-grouping and tracking (Carmel & Ben-Shahar, 2017) and automated essay scoring (Greene, 2018) as well.

Educators and policymakers search for ways to put algorithms in check. Public agencies are increasingly called to conduct self-assessments of “existing and proposed automated decision systems, evaluating potential impacts on fairness, justice, bias, or other concerns across affected communities” (Reisman et al., 2018, 4). Education scholarship has called for

adopting more rigorous and documented procedures for edtech procurement, implementation, and evaluation; requiring vendors to offer sufficient access to underlying data and information about outcomes; […] [and] sufficient mechanisms for transparency about individual decisions and access to outcomes for relevant populations and protected classes to enable ongoing review. (Zeide, 2020, 802)

Evidence of algorithmic fairness, such as lack of data and model bias, is understood as essential when schools are looking to procure ML-based technologies from private vendors (The Institute for Ethical AI in Education, 2021).

Before I move on to consider the conceptual frame of this study, I should note that standardized tests provide a fruitful analogy for the present discussion. Like algorithmic decision-making systems, standardized tests are used to provide accurate and comparable information for predicting student outcomes, thereby promoting accountability. However, tests and algorithms are themselves to be accounted for: information about their impact is produced to assess them, and to legitimate their use. Accordingly, one might regard test fairness and algorithmic fairness as having similar functions. Prominent differences between the two lie, firstly, in the fact that confidentiality issues and proprietariness of models and software pose barriers to accessing evidence on fairness with respect to algorithmic systems (The Institute for Ethical AI in Education, 2021; Digital Promise, 2014). Secondly, some ML systems learn online, meaning that the systems update their models continuously based on new data.

1.2.2 Conceptual frame

Accountability, its performance, and specific policies are taken here as political qua (re)produced in multiple national and global contexts among competing interests, relationships of power, discourses, and social contingencies (Ozga, 2019; Lascoumes & Le Galès, 2007). Accountability policies are the result of multiple stages of normative and contestable decision-making, including social processes of conceptualization, operationalization, implementation, and evaluation (Ozga, 2019, 2–3). Accountability instruments used to perform accountability, likewise, are malleable and contestable, and their implementation contingent on political and institutional factors. Research on instrumentation has shown that there are diverse reasons, rationales, and discourses – of political, economic, and institutional nature – that condition the adoption and selection of policy instruments (Lascoumes & Le Galès, 2007; Capano & Lippi, 2017). For example, though national large-scale assessments and test-based accountability systems have been globally adopted, (motivations for) their actual uses are ‘contingent to the specificities of the political and institutional settings where they are embedded’ (Verger, Fontdevila, and Parcerisa, 2019, 249).

Depoliticized conceptions of instrumentation persist, however, as ‘instruments of data collection and analysis […] invoke scientific and technical rationality, which combine to support their legitimacy and neutralize their political significance’ (Ozga, 2019, 5). The dominant statistical way of seeing – a sort of metric realism (Desrosières, 2001) – is taken to capture phenomena as if independent from observation and the processes that (re)produce them. Consequently, the politics of instrumentation are rendered invisible, enabling ‘a technocratic elite to accrue power and influence in education, to profit from its provision and to pursue political agendas that appear to be neutral, objective and necessary’ (Ozga, 2019, 9).

A core notion of this study is that operative forms of accountability and instruments for its performance mutually configure each other. Firstly, regard for accountability and good governance configures concrete practices of data collection, interpretation, and use, as is broadly exemplified by the datafication of education (Jarke & Breiter, 2019; Sellar, 2015). Local effects can be found in how schools mobilize data for high-stakes accountability purposes where data are used to identify “broken parts” in educational practices, and to bring learners on the “curb of proficiency” above thresholds specified by accountability policies (Diamond & Spillane, 2004; Datnow & Park, 2018). The pressures of high-stakes accountability systems have also resulted in efforts to “game the system” through different forms of manipulating data collection (Fontaine, 2016, 5–6).

Secondly, metrics that serve the performance of accountability actively (re)configure the meaning of accountability, and other adjacent performative activities. A study on fairness measures in German school monitoring and management showed that

once particular indicators or social indices become ‘fixed’ and thus objectified as representations of inequality, a growing number of continuous monitoring and management practices may become affected by these indicators, such as practices of school performance comparison or accountability measures. […] [A]s schools become increasingly surveilled through the use of (meta)data […] social indices can become a powerful form of such (meta)data, which ultimately inform multiple sorting, classification and ordering practices, and thus stabilize the (datafied) inequality between schools. (Hartong & Breiter, 2020, 58.)

Data-driven instruments for accountability thus have a two-fold function of producing knowledge for accountants’ needs but also actively constructing the measured phenomenon itself, which then orients future practice. They configure, but are also configured by, existing relationships between actors, relations of power, and social practices. Indeed, metrics and data as policy instruments create and distribute meanings, and legitimate action: they both construct policy problems and frame solutions to them (Ozga, 2019, 4). However, the ‘instruments at work are not neutral devices: they produce specific effects, independently of the objective pursued’ (Lascoumes and Le Galès, 2007, 1); they restructure adjacent practices, such as data collection. This dynamic interplay between, and co-construction of, operative meanings of fairness, on the one hand, and socio-technical practices that perform accountability, on the other, is what I term the reciprocal (re)configuration of (algorithmic) accountability and fairness.

Second-order evidence for algorithmic accountability – i.e., data from and on the data-driven systems themselves – is, crucially, both a technology of governance and control, and an instrument for (re)producing political legitimacy. (Numeric) data make the invisible visible to the public through mechanisms of transparency, thereby enabling control; by opening up ways for networks of actors to understand practices, activities, and themselves; and by positioning actors within those networks and relations of power (Piattoeva, 2015). Algorithmic accountability instruments function similarly: “black box” algorithms, their effects, and limitations, are opened and exposed to citizens to generate trust in governance and its technologies. In this sense, second-order data effectively gain and exercise power in processes of governing algorithms, meanwhile positioning actors, and configuring networks of responsibility.

I emphasize these issues because, while the mantra “technology is political” is repeated ad nauseam, the politics of instruments used for evaluating, managing, and legitimating their impact have been largely neglected. To place algorithmic fairness under critical scrutiny as a political instrument (see Narayanan, 2018; Wong, 2019), the socio-technical constituents of fairness analyses must be considered to highlight stages of normative decision-making involving dynamics of inclusion and exclusion. These will be discussed below. First, a brief overview of fair ML is necessary.

1.2.3 Fair machine learning

As noted, algorithms that display high predictive accuracy on historical data can reproduce discriminatory patterns or social inequality even in the absence of measurement error. Fair ML research has developed formal frameworks and technical tools to ensure fair treatment and outcomes in the use of ML systems.

Work in this field has proposed over 20 formal definitions for fairness (see Narayanan, 2018) translated into measures for fair algorithms. Fairness metrics have a two-fold function: Firstly, designed to capture distinct philosophical and legal concepts of (non-)discrimination and equality, they function as idealized, formal measures for fairness in algorithms. Existing metrics bear similarities to those discussed in research on test-fairness in education (see Camilli, 2013; Hutchinson & Mitchell, 2019)2. Secondly, by indicating deviations from the ideal – namely, unwanted bias – they inform decision-makers about the need for interventions. Existing methods can mitigate technical bias introduced through measurement errors, for example, or bias that accurately reflects social inequalities, by transforming the composition of the data used to train the algorithm (see Kamiran & Calders, 2012), adjusting or constraining the algorithm (e.g., Kamishima et al., 2012; Zhang, Lemoine, and Mitchell, 2018), or balancing the output distribution (e.g., Kamiran, Karim, and Zhang, 2012; Hardt, Price, and Srebro, 2016).
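To make the pre-processing approach concrete, the following minimal sketch illustrates the reweighing idea described by Kamiran and Calders (2012): training instances are weighted so that group membership and the outcome label become statistically independent in the reweighted data. The toy records and group labels are invented for illustration, and the sketch is not a description of any particular implementation.

# A minimal sketch of the reweighing idea (Kamiran & Calders, 2012): weight each
# (group, label) combination by its expected probability under independence divided
# by its observed probability. The toy records below are hypothetical.
from collections import Counter

# Toy training records: (group, label), where label 1 = "predicted to pass".
records = [("a", 1), ("a", 1), ("a", 0), ("a", 1),
           ("b", 1), ("b", 0), ("b", 0), ("b", 0)]

n = len(records)
group_counts = Counter(g for g, _ in records)
label_counts = Counter(y for _, y in records)
joint_counts = Counter(records)

def reweigh_weight(group, label):
    """Expected joint probability under independence divided by the observed joint probability."""
    expected = (group_counts[group] / n) * (label_counts[label] / n)
    observed = joint_counts[(group, label)] / n
    return expected / observed

for (g, y) in records:
    print(f"group={g}, label={y}, weight={reweigh_weight(g, y):.2f}")

In this toy data, combinations that are under-represented relative to independence (e.g., positive labels in the disadvantaged group) receive weights above one. Analogous sketches could be written for in-processing constraints or for post-processing adjustments of the output distribution.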

Most fairness definitions balance certain statistical metrics in the predictive model, such as predicted positive outcomes or error rates, across the compared groups. Given the abundance of definitions, I mention only a few notable ones to provide a general picture of what is discussed here (Table 1). Note that some of these metrics do not take actual outcomes into account – i.e., they do not condition the fairness measure on the actual training data distribution.

TABLE 1. A set of fairness definitions. A binary prediction/decision is denoted with D = {0, 1} and the actual outcomes are denoted with Y = {0, 1}. A probability score s ∈ S determines the value of D for each individual based on a certain decision threshold in the predictive model. Model features are denoted with V, a set of non-protected attributes is denoted with X ⊆ V, and membership of a protected group is denoted with G = {a, b}.

– Fairness through unawareness: Model features do not include protected attributes. Formally: G ∉ V.
– Statistical parity (e.g., Dwork et al., 2012): Members of protected groups have equal probability of belonging to the class of predicted positive outcomes. Formally: P (D = 1 | a) = P (D = 1 | b).
– Calibration (Chouldechova, 2017): Probability scores estimate underlying population values accurately. Formally: P (Y = 1 | s, a) = P (Y = 1 | s, b).
– Equalized odds (Hardt, Price, and Srebro, 2016): Protected groups have equal error rates (i.e., false negative rates and false positive rates). Formally: P (D = 1 | Y, a) = P (D = 1 | Y, b).
– Fairness through awareness (Dwork et al., 2012): Similar data profiles are treated similarly. [Requires defining a similarity metric, e.g., Euclidean distance.]

2 However, while differential item functioning (DIF) is extensively discussed in the latter literature, it has been largely neglected in fair ML discussions (Hutchinson & Mitchell, 2019).

To illustrate, consider the case of automated grading. Statistical parity would be satisfied when members of compared groups are equally likely to be predicted to pass (or fail) a test. When the probability scores (e.g., predicted test scores) generated by an algorithm accurately estimate underlying population values (e.g., learners’ proficiency on the test), the algorithm can be understood as calibrated. Equalized odds calls for equal error rates, meaning that students should have equal probability of receiving the wrong grade irrespective of their protected attributes. In addition to these “group fairness” metrics, fairness through awareness would require that across all pairs of students, the generated probability scores should differ only to the extent that those individuals differ from each other in their non-protected attributes (e.g., test performance).
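As a purely illustrative aside, group fairness metrics of this kind can be computed directly from a system’s predictions. The following minimal sketch does so for a hypothetical automated-grading example; the records, group labels, and outcomes are invented, and real assessments would involve far larger datasets and the contested choices about categories discussed in section 1.3.

# A minimal sketch computing statistical parity and equalized-odds gaps for a
# hypothetical automated-grading example. D is the predicted pass/fail decision,
# Y the actual outcome, and the group labels are invented for illustration.
students = [
    # (group, actual outcome Y, predicted decision D)
    ("a", 1, 1), ("a", 1, 1), ("a", 0, 1), ("a", 0, 0), ("a", 1, 0),
    ("b", 1, 1), ("b", 0, 0), ("b", 0, 0), ("b", 1, 0), ("b", 0, 0),
]

def rate(rows, condition, event):
    """Estimate P(event | condition) from the given rows."""
    selected = [r for r in rows if condition(r)]
    return sum(event(r) for r in selected) / len(selected) if selected else float("nan")

for g in ("a", "b"):
    rows = [r for r in students if r[0] == g]
    # Statistical parity compares P(D = 1 | group).
    pos_rate = rate(rows, lambda r: True, lambda r: r[2] == 1)
    # Equalized odds compares error rates: false positive and false negative rates.
    fpr = rate(rows, lambda r: r[1] == 0, lambda r: r[2] == 1)
    fnr = rate(rows, lambda r: r[1] == 1, lambda r: r[2] == 0)
    print(f"group {g}: P(D=1)={pos_rate:.2f}, FPR={fpr:.2f}, FNR={fnr:.2f}")

With these invented records the positive-prediction rates and error rates differ across the two groups, so both statistical parity and equalized odds would flag a disparity.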

An on-going debate concerns the proper measure of algorithmic fairness, and the stakes of this debate have been raised by evidence of inescapable trade-offs. Fairness typically comes with a cost to predictive accuracy (Dwork et al., 2012), and an impossibility theorem shows that algorithms cannot satisfy certain fairness definitions simultaneously, except in highly unlikely circumstances (Kleinberg, Mullainathan, and Raghavan, 2017). Trade-offs have been identified and discussed also in test-fairness research (see Camilli, 2013).

The next section dives deeper into fairness analyses, discussing how meanings of fairness and (in)equality are constructed through methodological choices regarding their respective frameworks. Drawing on existing work, I highlight some prominent gaps pertaining to the dominant formalist fairness methodologies in general and discuss their implications.


1.3 Politics and abstraction in algorithmic fairness

Acceptance of, and reliance on, data-driven regulation tools are created in networks of power relations and agents who require ‘simplified, comparative, normative, [and] transportable’ knowledge (Ozga, 2019, 8). Dominant frameworks for algorithmic fairness present themselves as viable accountability instruments in these respects. There is, however, a politics to the design of fairness analyses, metrics, and interventions. Abstraction, interpretation and sense-making, and value-laden decisions are involved. These have legal and ethical implications relating to what is included into, and excluded from, consideration of fairness. Methodological choices effectively come to construct an operationalized conception of (in)equality at stake which will guide interventions and structure future practice, including decision-makers’ envisioned responsibilities. In practice, these choices can, of course, be constrained by applicable legislation in a given context (e.g., non-discrimination legislation) and decision-makers’ access to data, for example. The insights offered by the following analysis are valuable regardless of these possible contingencies.

1.3.1 Construction of algorithmic fairness

Building an algorithmic decision-making system for a given purpose involves translation of abstract goals and objectives into computationally tractable problems. Construction of the analyses used to assess such systems involves similar work. I will next substantiate this issue by considering components of algorithmic fairness analyses.

Selecting and defining comparison classes. In practice, algorithms cannot be audited for fairness across all possible (sub)groups, meaning certain groups must be prioritized in selecting the relevant comparison classes for fairness analyses. Protected groups specified in non-discrimination legislation, such as ‘race’ and gender, are common choices, although other factors, such as urbanicity or socio-economic status, may be accounted for (see Camilli, 2013). Furthermore, the results of fairness analyses depend on how categories are selected and constructed; on how ‘racial categories’ are abstracted ‘into a mathematically comparable form’, for example (Hanna et al., 2020, 508). Variance in how this is done can affect the comparability of information produced with fairness analyses across different educational institutions and contexts.

Selecting a fairness metric. The objectivity of policy instruments is produced via a threefold translation: first, translation of ‘scientific expertise into standardized and enumerable definitions’; second, translation of these standards into ‘concrete technologies of measurement’; and third, translation of the ‘data produced through measurement technologies into objective policy-relevant knowledge’ (Williamson & Piattoeva, 2019, 64). Likewise, fairness metrics comprise reductive, allegedly domain-neutral formulae which translate legal and ethical expertise into the language of algorithms. They are represented as computational implementations of stabilized anti-discrimination norms, which in turn figures into the social production of their objectivity and legitimacy. The relevant norms might also include de facto domain standards, such as those for test fairness in education. Standardized tests, for example, are measured for fairness by examining whether they ‘produce the same score for two test-takers who are of the same proficiency level’ (Hamilton, Stecher, and Klein, 2002, xvi) and whether the predictive accuracy of the test differs across different groups (Ibid., 69–70). The first notion corresponds to individual fairness and the latter most closely to calibration. On a broader understanding of model accuracy, encompassing not only accurate but also false predictions, the latter requirement reflects equalized odds3.

Fairness definitions are not neutral among competing interests and ethical conceptions of fairness (Narayanan, 2018). Metrics that neglect actual outcomes, such as statistical parity, can be met even when systems’ predictions systematically err (Dwork et al., 2012), effectively uncoupling distributive fairness from perceived ‘merit’. Fairness as calibration (Chouldechova, 2017) reflects one notion of formal equality; the same rules ought to apply to everyone regardless of their ‘race’, gender, or other sensitive attributes. As a calibrated model’s predictions mean the same thing across compared groups and probability scores, it makes the system more trustworthy from the perspective of the user. Equalized odds (Hardt, Price, and Srebro, 2016), in turn, prioritizes balance across groups in terms of risks for misprediction; fairness is understood as equal risk for misallocation or misrecognition of ‘merit’.

3 As predictive algorithms, such as those used for automated grading, typically produce the relevant outcomes, ground truth data (e.g., grades a teacher would have assigned) can be unavailable and, hence, data on false predictions cannot be used in algorithmic fairness analyses.

Legitimizing disparities. A predictive algorithm that treats everyone equally in the strictest sense fails to discriminate between the targets, i.e., attributes of interest. Accordingly, fairness measurements are typically conditioned on some set of model attributes. For example, statistical parity can be conditioned on a set of ‘legitimate’ variables allowed to introduce deviations from a strict balance in groups’ representation in the predicted outcome class (conditional statistical parity in Corbett-Davies et al., 2017). Drawing lines between “legitimate” and “illegitimate” proxies for protected attributes in predictive models, decision-makers theorize their responsibilities regarding educational (in)equality. Specifying “legitimate” sources of disparity in outcomes, they effectively draw a line between what learners themselves are to be held accountable for, and what (possibly latent) sources of inequality decision-makers ought to address. A stance on what counts as a matter of ‘race’, gender, or disability, etc., is taken.
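To illustrate what such line-drawing amounts to in practice, the following minimal sketch compares unconditional and conditional statistical parity on invented records, using a hypothetical ‘prior attainment band’ as the variable deemed legitimate. The point is only to show how the choice of conditioning variable changes which disparities register as unfair, not to endorse that choice.

# A minimal sketch of conditional statistical parity (Corbett-Davies et al., 2017):
# positive-prediction rates are compared within strata of a variable deemed
# 'legitimate' (here a hypothetical prior-attainment band) rather than overall.
# All records and the choice of 'legitimate' variable are invented for illustration.
records = [
    # (group, prior_attainment_band, predicted decision D)
    ("a", "high", 1), ("a", "high", 1), ("a", "low", 1), ("a", "low", 0),
    ("b", "high", 1), ("b", "low", 1), ("b", "low", 0), ("b", "low", 1), ("b", "low", 0),
]

def positive_rate(rows):
    return sum(d for _, _, d in rows) / len(rows) if rows else float("nan")

# Unconditional statistical parity: compare P(D = 1 | group).
for g in ("a", "b"):
    print(f"group {g}: P(D=1) = {positive_rate([r for r in records if r[0] == g]):.2f}")

# Conditional statistical parity: compare P(D = 1 | group, band) within each band.
for band in ("high", "low"):
    for g in ("a", "b"):
        rows = [r for r in records if r[0] == g and r[1] == band]
        print(f"band {band}, group {g}: P(D=1) = {positive_rate(rows):.2f}")

In this toy example the overall positive-prediction rates differ across the groups (0.75 versus 0.60), but the within-band rates are equal, so declaring the band a ‘legitimate’ variable makes the disparity count as fair.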

Note, here, the echoes of both a fundamental question of egalitarianism – what kinds of attributes may rightfully introduce deviations from strict equality of outcome, if any? – and the related debate regarding the scope of schools’ responsibilities in mitigating the influence of learners’ childhood environments on their future prospects. Even if schools have a minimal duty of mitigating the ‘unequalizing impact’ of the former on the latter (Coleman, 1975, 29), distinguishing legitimate “effort”, “merit” or “ability” from latent socio-economic variables that covary with protected attributes remains a complex, high-stakes moral question.

Trade-offs. Given well-known “achievement gaps” between racial and gender groups, trade-offs are likely to occur when attainment data is used in algorithmic decision-making in educational domains. The Ofqual case showed that optimizing algorithms for predictive accuracy on structurally biased data prioritizes model accuracy over group fairness. Conversely, increasing the latter can reduce the former (see Dwork et al., 2012). Another highly discussed trade-off is that between calibration and equalized odds: when target attribute prevalence differs across groups, both metrics cannot be satisfied simultaneously, except in highly constrained cases (Kleinberg, Mullainathan, and Raghavan, 2017). An apparent dilemma follows: either “level down” predictive accuracy or sacrifice fairness. To the extent that this is an actual dilemma (see Green & Hu, 2018), decision-makers must prioritize certain values over others, bringing questions of power and public accountability ever more strongly into the picture.
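The incompatibility can be made concrete with an identity noted by Chouldechova (2017) for binary classifiers. Writing p for the prevalence of the positive outcome in a group, PPV for the positive predictive value, and FPR and FNR for the false positive and false negative rates:

FPR = (p / (1 − p)) · ((1 − PPV) / PPV) · (1 − FNR).

For an imperfect classifier, if two groups share the same PPV and the same FNR but differ in prevalence – say, p = 0.5 in one group and p = 0.2 in the other – their false positive rates must differ. Predictive-value parity (a calibration-style requirement) and equal error rates can therefore coexist only when base rates are equal or predictions are perfect, which is the sense in which the trade-off is structural rather than a contingent feature of particular datasets.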

Lastly, consider how everything discussed above is further complicated by opaque, asymmetric relationships between private vendors of educational ML technology and agents in the educational domain. Schools and policymakers can be reliant on assessments conducted by model developers and private vendors, whose methodologies, analysis results, and prioritization decisions regarding the eventual model design, may not be disclosed. Furthermore, so-called “fairness washing” used to increase ML products’ attractiveness in the market creates further concerns. Private vendors may conduct fairness analyses against metrics of their choosing, allowing them ‘to claim a certificate of fairness’ while simultaneously shielding them from claims of immoral conduct as they continue ‘weaponizing half-baked [automated] tools’ (Fazelpour & Lipton, 2020, 61). The question of the degree of influence private vendors have in these respects is deserving of its own study.

1.3.2 Abstraction in algorithmic fairness

When concepts from ethics and law are translated into algorithmic logics, contestable concepts are ‘stabilized as standardized and enumerable categories that can be transported to new sites of practice and coded into new assessment services, metrics, and technology products’, to quote Williamson and Piattoeva (2019, 73). Things are always lost in translation, as it were, and abstractions draw boundaries between alternative ways of knowing. When it comes to matters of equality, this can have legally and morally significant consequences.

Fair ML methodologies have been criticized for their abstraction of empirical details, dynamics, complexity, and deep socio-ethical issues (Green & Hu, 2018; Green, 2020; Selbst et al., 2019). Purely formalistic evaluations of fairness may lead to ‘ineffective, inaccurate, and sometimes dangerously misguided’ interventions (Selbst et al., 2019, 59). Fair ML frameworks have been primarily concerned with comparative justice, for example, and consequently have failed to account for non-comparative justice, such as procedural justice (Selbst et al., 2019). The prevalent formalistic methodology also shows a neglect for realized impact (ibid.): when fairness is assessed only in terms of statistical metrics, the realized outcomes and human costs are neglected. These costs may be different in terms of relative significance, which may bear on considerations of fairness (Hellman, 2020). The nature of realized outcomes also depends on factors external to the model, which remain uncaptured by dominant frameworks. Notably, realization of outcomes also depends on whether and when an algorithm’s predictions are trusted and enacted by human users.

Consider how these abstracted factors could figure into considerations of fairness in algorithmic ability-grouping, where data on students’ attainments are used to predict their success and to place them into either a low or an advanced group (see Carmel & Ben-Shahar, 2017). First, placement predictions have different human costs here: Accurate predictions place students into groups according to ‘merit’ (as measured with attainment standards). A false positive prediction places a student into an ability-group where they are likely to fail. A false negative, in turn, unduly denies one’s access to a suitable group. The possible human costs range beyond the students, as teachers will bear the burden of accommodating for incorrect placement decisions in their teaching. To the extent that errors are unavoidable, prioritization decisions need to be made with due regard for these costs. Secondly, whether the outcomes of ability-grouping are to be considered fair for students is plausibly dependent on what happens within the established ability-groups. Say, if lower groups receive fewer resources than higher groups, assignment into a lower group could be regarded as unfair regardless of whether the placement decisions are statistically ‘accurate’. Indeed, research on ability-grouping suggests that relevant effects on learning and equity are constituted not by the grouping practice as such, but by contextual factors and processes that take place within schools, and the composed tracks and groups (see Boaler, Wiliam, and Brown, 2000; Eder, 1981; Felouzis & Charmillot, 2013). Thirdly, whether human decision-makers such as teachers will enact algorithmic predictions as is, thus producing the fair distribution, is unclear and plausibly contingent on things such as their knowledge regarding the system, and competence in using it (see Selbst et al., 2019; Zeide, 2020). Lastly, quite irrespective of outcomes, transparency and redress mechanisms are arguably essential for procedural justice in this context. The opaque nature of “black box” algorithms can leave few possibilities for students and their families to understand or contest automated decisions, however (see Zeide, 2020).
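One way the differing human costs of placement errors could be made explicit in an assessment is sketched below; the cost values are placeholders for a value judgement rather than empirical estimates, and the records and group labels are invented.

# A small sketch of one way the differing human costs of placement errors noted
# above could be made explicit when evaluating an ability-grouping model: error
# counts are weighted by stipulated (here purely hypothetical) cost values.

# (group, actual suitability Y, predicted placement D); 1 = advanced group.
placements = [
    ("a", 1, 1), ("a", 1, 0), ("a", 0, 0), ("a", 0, 1),
    ("b", 1, 0), ("b", 1, 0), ("b", 0, 0), ("b", 1, 1),
]

# Stipulated costs: a false positive places a student in a group where they are
# likely to fail; a false negative denies access to a suitable group. The numbers
# are placeholders for a value judgement, not empirical estimates.
COST_FALSE_POSITIVE = 2.0
COST_FALSE_NEGATIVE = 3.0

for g in ("a", "b"):
    rows = [r for r in placements if r[0] == g]
    fp = sum(1 for _, y, d in rows if y == 0 and d == 1)
    fn = sum(1 for _, y, d in rows if y == 1 and d == 0)
    expected_cost = (fp * COST_FALSE_POSITIVE + fn * COST_FALSE_NEGATIVE) / len(rows)
    print(f"group {g}: false positives={fp}, false negatives={fn}, cost per student={expected_cost:.2f}")

Even such a simple weighting makes visible that error rates alone do not settle questions of fairness: the stipulated costs, and who bears them, are themselves contestable choices.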

Now, in decentralized educational governance reliant on local self-evaluations, transportable instruments are in demand, as was noted above. Fair ML metrics and technical fixes seem attractive for accountability purposes in this respect. Indeed, ML products, as with other software, are designed for portability. Fair ML frameworks reflect such ambitions, and are often proposed as universally applicable for ensuring fairness in algorithmic decision-making (see Green & Hu, 2018; Selbst et al., 2019). Research on test fairness in education has conversely recognized that fairness analyses and metrics are neither neutral nor based on uncontestable assumptions, suggesting that no universal agreement on fairness metrics may come to fruition (see Camilli, 2013; Hutchinson & Mitchell, 2019).

Moreover, in virtue of their ambitions, fair ML frameworks exhibit insensitivity to local concerns and neglect contextual and population variation (Selbst et al., 2019). A given algorithm benchmarked as fair may not lead to similar realizations of social outcomes in different educational settings due to such variance; ‘the local fairness concerns may be different enough that systems do not transfer well between them’ (Ibid., 61). These issues pose a challenge for the credibility of private vendors’ fairness-certified software and their off-the-shelf use across educational settings.

1.4 ‘Fixing’ inequality with algorithmic fairness

The previous sections have demonstrated how any given algorithmic fairness analysis involves epistemological and ethical assumptions. Such analyses may have significant consequences when ML systems are used at scale. This section discusses a few specific issues in this respect. Recall that high-stakes accountability-driven practices of data collection, interpretation, and use in schools focus on identifying problems, comparative performance evaluations, and administrative compliance. While data-driven technologies are in these respects ‘imagined to create new forms of accountability’, the discourse shows ‘little consideration for the longstanding history of current trends’ that questions the link between data utilization and promotion of equality (Fontaine, 2016, 3). I argue that algorithmic accountability can suffer from a similar second-order neglect, as it were: without an explicit aim to promote substantive equality, fair ML as an algorithmic accountability instrument can risk discouraging non-technical reforms in educational settings and systems. Paradoxically, technical fixes may end up fixing inequalities only in the sense of solidifying them through political processes of legitimation.

1.4.1 Technical fixes for broken parts

To boost achievement metrics, schools are keen on bringing learners on the “curb of proficiency” above a certain threshold; to fix broken parts, as opposed to conducting extensive reforms which would benefit all students (Diamond & Spillane, 2004; Datnow & Park, 2018). With its focus on technical as opposed to socio-technical systems, fair ML can promote interventions exemplary of this “broken part fallacy”. For one, certain technical interventions involve changing the prediction labels (e.g., from “fail” to “pass”) of disadvantaged groups’ members close to the decision-threshold – quite literally on the “curb of proficiency” – to increase fairness (Kamiran, Karim, and Zhang, 2012). More generally, however, the fair ML frame focuses on fixing technology as opposed to social structures and practices, and incentivizes the gathering of more and better data ‘at the risk of missing deeper questions about whether some technologies should be used at all’ (Katell et al., 2020, 45).

Indeed, the technical focus of fair ML can be problematic particularly in its resisting critical reflection of the status quo. As Green (2020) has argued in detail, reforms through data-driven practices, even with an explicit focus on fairness, risk legitimizing structural disadvantage by invoking false objectivity, consequently disincentivizing reforms that would restructure socio-technical systems as opposed to their technical constituents. Similar dynamics have been discovered in relation to data-driven tracking and ability-grouping which ‘continue to abound in schools and are legitimated with data’ (Datnow & Park, 2018, 148), even though the ill effects of such practices – affecting primarily those in positions of existing disadvantage – are widely recognized in scholarship. Fair ML may further legitimate such practices through nominal certifications of fairness, and promote technical interventions even though ‘many meaningful interventions toward equitable algorithmic systems are non-technical’ and localized as opposed to scalable (Katell et al., 2020, 45).

1.4.2 Formal equality in conditions of structural inequality

Ideals of formal equality, such as equal treatment and equal opportunity, have long roots in the operative meanings of educational accountability (see Fontaine, 2016). ML systems that satisfy formal equality metrics are attractive from the perspective of high-stakes accountability in this regard. Calibration, for example, ensures that teachers and administrators can both trust their algorithm-supported decisions and evince that every learner is subjected to the same evaluation criteria. Recall, however, that calibration cannot usually be satisfied simultaneously with other formal equality metrics such as equalized odds (Kleinberg, Mullainathan, and Raghavan, 2017). Increasing the overall accuracy of an algorithm by explicitly including protected attributes could resolve the trade-off (at least partly). Decision-makers are likely to find this option unattractive, however, as information about protected attributes would figure into decisions (however, see discussion in Hellman, 2020).

I wish to make no normative argument regarding choice of metric or trade-off resolution here, nor to downplay the related moral and legal tensions. Rather, I note that formal equality in contexts characterized by structural injustice can work against substantive equality, which would recognize that students start from different positions of (dis)advantage. As evinced by U.S. education policy, for example, ‘efforts to ignore race via “colorblind” or race-neutrality policies such as school choice or accountability systems can easily replicate rather than address age-old patterns of inequality grounded in a history of race consciousness’ (Wells, 2014, i). An approach to algorithmic fairness that fails to acknowledge how forms of privilege and disadvantage structure educational datasets can have similar effects. Because metrics such as calibration and equalized odds condition the ideal, fair balance in statistical metrics across groups on the actual outcomes in the training data, they can be satisfied even when they preserve structural bias (e.g., “achievement gaps”). Such metrics are ‘bias preserving’ because they take the status quo, existing structural injustices and patterned inequalities included, as a neutral baseline for fairness (Wachter, Mittelstadt, and Russell, forthcoming). The relevant differences between these metrics pertain to how structural bias is preserved in the predictive model. ‘Bias transforming’ metrics, such as (conditional) statistical parity, on the other hand, cannot be satisfied in conditions of structural inequality even by perfectly accurate algorithms (Ibid.).

Use of bias preserving metrics seems a natural choice within accountability systems that prioritize merit-respecting, equitable treatment, because predictive accuracy is constructed as inherent to fairness. The relative merits of these metrics notwithstanding, a decision to use bias preserving metrics is in any case one that effectively takes existing social inequality across compared groups as justified. In other words, bias preserving metrics construct structural injustice as a causal influence on learners’ prospects that agents in the educational domain, or other relevant public entities, have no responsibility to mitigate. Socio-material conditions, such as environmental stressors related to poverty, and structural injustices against historically oppressed groups are ‘disentangled from student performance’ (Fontaine, 2016, 8) in the “fair” model. Bias transforming metrics, with their grounding in substantive equality, on the other hand, will indicate the presence of unfairness even when predictions match structurally biased historical distributions in data. As substantive equality is required by, for example, EU non-discrimination legislation in certain jurisdictions, these metrics may, notably, even be required for compliance (Wachter, Mittelstadt, and Russell, forthcoming).
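The contrast can be illustrated numerically with a minimal sketch in which a perfectly accurate predictor simply reproduces historical outcomes marked by a hypothetical “achievement gap”: error-rate parity (a bias preserving criterion) is satisfied, while statistical parity (a bias transforming criterion) is not. The base rates and records are invented for illustration.

# A minimal numeric sketch of the point above: when a predictor reproduces
# outcomes in data marked by an existing "achievement gap", bias preserving
# criteria (here, error-rate parity) are satisfied while a bias transforming
# criterion (statistical parity) is not. Base rates are hypothetical.

# (group, actual outcome Y); the predictor is assumed perfectly accurate, so D = Y.
data = [("a", 1)] * 7 + [("a", 0)] * 3 + [("b", 1)] * 4 + [("b", 0)] * 6

for g in ("a", "b"):
    outcomes = [y for grp, y in data if grp == g]
    predictions = outcomes  # a perfectly accurate model reproduces the historical outcomes
    positive_rate = sum(predictions) / len(predictions)
    errors = sum(1 for y, d in zip(outcomes, predictions) if y != d)
    print(f"group {g}: P(D=1)={positive_rate:.2f}, errors={errors}")

# Output: P(D=1) is 0.70 for group a and 0.40 for group b, so statistical parity is
# violated, even though both groups have zero false positives and false negatives.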

1.5 Concluding remarks

I have discussed algorithmic fairness as part of algorithmic accountability in education, critically examining formal methodologies for detecting and mitigating algorithmic bias. In the current market for algorithmic accountability instruments, the alleged transportability, universality, comparability, and socially produced objectivity of algorithmic fairness analyses allow them to check many of the boxes that make them attractive to educational institutions keen on demonstrating the effectiveness and equitable impact of data-driven educational technologies and on legitimating their use in the eyes of the public.

This study has aimed to show that specific meanings of ‘(in)equality’, ‘(non-)discrimination’, and ‘(un)fairness’ in education are constructed through algorithmic fairness analyses, reflecting and actively configuring existing meanings and performances of accountability. Specifically, three claims were made. Firstly, fairness analyses aim to uncover and disclose the impact algorithms have on equity, but much discretionary, ethically and politically significant decision-making precedes, underlies, and follows such analyses. Consequently, transparency regarding the mere presence (or absence) of certain biases does not necessarily disclose meaningful aspects of a given system to the public, assuming such transparency can be achieved in the first place given private vendors’ discretion in this regard. Secondly, given the formalistic epistemology of fair ML methodologies, there is a risk that even fairness-conscious adoption of algorithmic systems may result in misguided or misinformed interventions. Lastly, if constrained by, and harnessed for satisfying, accountability policies that favour formal equality in conditions of structural injustice, fair ML may disincentivize systemic reforms for substantive equality in education.

Importantly, my claim has not been that algorithmic fairness analyses are unhelpful or useless; they can surely support risk management and transparency. Rather, the aim has been to critically examine the political, epistemological, and ethical dimensions and implications of such analyses.

Nonetheless, this study’s conclusions support previous work suggesting that, even with their language of excellence and equity and their emphasis on informed, data-driven decision-making, certain accountability systems and incentive structures may paradoxically reproduce and legitimize existing inequality. This study has contributed by showing how explicit measures taken to govern and account for data-driven technologies may themselves suffer from a similar second-order tendency. Accordingly, as methods, tools, and policies for governing algorithms in education begin to emerge, scholars are called upon to critically examine the ethics and politics that underlie them.

2. METHODOLOGY, THEORETICAL FRAMEWORK, AND REFLECTION

In this section, I expound upon the methodological, theoretical, and ethical foundations and implications of my study. The study, presented in the form of an article manuscript in the previous section, could not go into these issues in depth due to space constraints. I shall undertake this effort here so as to make certain methodological and ethical choices pertaining to my research explicit, and to further clarify certain positions I have assumed and advanced in the manuscript.

The discussion is structured as follows. In section 2.1, I restate the topic of my thesis. I will contextualize my research, explicate my research questions, and briefly describe my findings. I will also describe the theoretical and practical implications of my research and findings. In section 2.2, I discuss the theoretical framework and methodology of my research. Specifically, I shall explicate the ontological and epistemological positions from which I have conducted my research. To demonstrate the robustness of the applied framework, I will also provide a brief defense of these positions. This also serves to provide indirect support for the findings of the study. The section further includes a discussion of research ethics, where I aim to explicate central ethical questions that pertain to my study. Possibilities for future research are discussed in section 3.

2.1. Aim, background, central claims, and contributions

The topic of my transdisciplinary study lies at the intersection of critical educational sociology, education policy, philosophy, and technology ethics. As noted, the study is motivated by the observation that educational administrators and policymakers are increasingly searching for ways to demonstrate the effectiveness, trustworthiness, and equitable impact of implemented data-driven or algorithmic decision-making systems. In other words, they are searching for what can be called algorithmic accountability instruments that serve the assessment and mitigation of risks related to the use of automated pedagogical and administrative tools; that is, instruments and practices that can help legitimize the use of implemented technological tools.

This study can be understood as a critical examination of one set of algorithmic accountability instruments, namely, algorithmic fairness frameworks and tools. These include methods for evincing that implemented algorithmic systems are fair and non-discriminatory, and methods for improving their fairness. The former comprise metrics and adjacent documentation methods, while the latter comprise so-called bias mitigation methods used for “eliminating” unwanted bias in algorithmic systems.4 Both metrics and mitigation methods have been developed to address prominent issues with algorithmic bias in ML applications.
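
To give a concrete sense of what a bias mitigation method can amount to in practice, the sketch below implements a simplified version of one widely used pre-processing strategy, commonly referred to as reweighing, in which training instances are weighted so that group membership and the target label become statistically independent. The data, group names, and variables are invented for this illustration, and the sketch does not reproduce the internals of any particular toolkit discussed in this study.

```python
# Illustrative sketch only: a simplified "reweighing" pre-processing step on
# invented data; not the procedure of any specific fairness toolkit.
from collections import Counter

# Hypothetical training records: (group, label)
data = [("A", 1)] * 8 + [("A", 0)] * 2 + [("B", 1)] * 4 + [("B", 0)] * 6

n = len(data)
group_counts = Counter(g for g, _ in data)
label_counts = Counter(y for _, y in data)
pair_counts = Counter(data)

# Weight each (group, label) combination so that, after weighting, group
# membership and the label are statistically independent:
# w(g, y) = P(g) * P(y) / P(g, y)
weights = {
    (g, y): (group_counts[g] / n) * (label_counts[y] / n) / (pair_counts[(g, y)] / n)
    for (g, y) in pair_counts
}

# Under-represented combinations (here, positive outcomes in group B) are
# weighted up; over-represented ones are weighted down.
for (g, y), w in sorted(weights.items()):
    print(f"group={g} label={y} weight={w:.2f}")
```

The resulting weights would typically be passed to a learning algorithm that accepts instance weights. Even this seemingly technical step involves a contestable judgement about which statistical relationships in the data count as ‘bias’ to be eliminated, which is precisely the kind of discretionary choice the study foregrounds.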

In the study, it is argued that, as an accountability instrument, algorithmic fairness is political. I do not mean to imply that measurements and interventions related to algorithmic fairness are categorically and de facto subject to political decision-making processes in the traditional sense of the term.5 Rather, I refer to what might be considered a broader, sociological understanding of the political. As I state on page 13, I understand accountability as political “qua (re)produced in multiple national and global contexts among competing interests, relationships of power, discourses, and social contingencies (Ozga, 2019; Lascoumes & Le Galès, 2007)”. Thus, I am generally interested in how assessments of algorithmic impacts on equality intertwine with power and discourses, and in what kinds of ethical and epistemological assumptions underlie them. Indeed, I claim that analyses conducted for the assessment of algorithmic fairness, and interventions taken to promote it, construct meanings of educational (in)equality, configuring accountability relationships between educational institutions and the public. This echoes recent work in AI ethics arguing that “impacts”, as they occur in algorithmic impact assessments, are evaluative constructs that “enable actors to identify and ameliorate harms experienced because of a policy decision or a system” (Metcalf, 2021, 735) – namely, an algorithmic system in the present context.

4 Note that by “algorithmic systems” I refer to ML applications in particular, although most of what is argued applies equally to more traditional software.

5 For a normative argument along these lines, see Wong (2019), who argues that algorithmic fairness ought to be a question of democratic deliberation.
