Open machine intelligence in education


PUBLICATIONS OF THE UNIVERSITY OF EASTERN FINLAND
Dissertations in Forestry and Natural Sciences

ISBN 978-952-61-3477-2 ISSN 1798-5668

TAPANI TOIVONEN

Advances in machine intelligence during this decade have created a demand for citizens outside academia and data science to understand and use machine intelligence, and even basic algorithms or models.

This thesis studies how the usually opaque and complex machine intelligence algorithms and models can be used by data science novices in education once their complexity and opaqueness have been transformed into explainable, interpretable and transparent counterparts.


PUBLICATIONS OF THE UNIVERSITY OF EASTERN FINLAND DISSERTATIONS IN FORESTRY AND NATURAL SCIENCES

N:o 388

Tapani Toivonen

OPEN MACHINE INTELLIGENCE IN EDUCATION

ACADEMIC DISSERTATION

To be presented by the permission of the Faculty of Science and Forestry for public examination online and at the University of Eastern Finland, Joensuu, on October 16th, 2020, at 15 o’clock.

University of Eastern Finland School of Computing

Joensuu 2020


Grano Oy Jyväskylä, 2020

Editors: Pertti Pasanen, Matti Tedre, Jukka Tuomela, and Matti Vornanen

Distribution:

University of Eastern Finland Library / Sales of publications julkaisumyynti@uef.fi

http://www.uef.fi/kirjasto

ISBN: 978-952-61-3477-2 (print)
ISSNL: 1798-5668
ISSN: 1798-5668

ISBN: 978-952-61-3478-9 (pdf)
ISSNL: 1798-5668
ISSN: 1798-5676


Author’s address: University of Eastern Finland
School of Computing
P.O.Box 111
80110 JOENSUU
FINLAND
email: firstname.lastname@uef.fi

Supervisors: Professor Markku Tukiainen
University of Eastern Finland
School of Computing
P.O.Box 111
80110 Joensuu
FINLAND
email: firstname.lastname@uef.fi

Senior researcher Ilkka Jormanainen
University of Eastern Finland
School of Computing
P.O.Box 111
80110 Joensuu
FINLAND
email: firstname.lastname@uef.fi

Reviewers: Associate Professor Miguel Angel Conde Gonzalez
University of Leon, Spain
Department of Mechanic Engineering
Leon
SPAIN
email: mcong@unileon.es

Chair Professor Nian-Shing Chen
National Yunlin University of Science and Technology
Department of Applied Foreign Languages
TAIWAN
email: nianshing@gmail.com

Opponent: Associate Professor Ryan Baker
University of Pennsylvania
Graduate School of Education
Pennsylvania
USA
email: ryanshaunbaker@gmail.com


Tapani Toivonen

Open Machine Intelligence in Education
Joensuu: University of Eastern Finland, 2020
Publications of the University of Eastern Finland
Dissertations in Forestry and Natural Sciences

ABSTRACT

Intelligent systems are everywhere. The rapid changes and advances in intelligent computing affect the way people perceive business, health, social relationships and education. The more intelligent computing affects these domains, the more concern intelligent computing, or Machine Intelligence (MI), raises. People are required to be literate in MI to trust and to use the systems that surround them. This thesis studies methods and technical solutions for introducing MI concepts such as deep learning and clustering to what we call data science novices in educational contexts.

We study the algorithms that can be interpreted by data science novices and the practices through which data science novices can interact with interpretable MI models to learn, understand and discover knowledge. The results of this thesis indicate that the transparency of MI models and algorithms supports novices greatly and is sometimes even required for novice users to harness the power of MI.

Universal Decimal Classification: 004.421, 004.65, 004.8, 004.85, 37.011.22, 37.091.3

Library of Congress Subject Headings: Artificial intelligence; Computational intelligence; Machine learning; Data mining; Education; Educational technology; Robotics; Algorithms; Computers and literacy

Yleinen suomalainen ontologia: tekoäly; koneoppiminen; tiedonlouhinta; kasvatusala; opetusteknologia; robotiikka; algoritmit


ACKNOWLEDGEMENTS

I wish to thank my supervisors, Markku Tukiainen and Ilkka Jormanainen, who taught, mentored and supported me before and during the writing of this thesis. Words alone cannot describe the gratitude I feel for their tireless efforts.

I wish to thank my colleagues at the School of Computing, UEF, for discussions, possibilities and support.

Most of all, I wish to thank the most beautiful woman in the world, my wife, who supported, nurtured and loved me during this process. She surely was and is a teacher, a mentor and sometimes even the ultimate opponent.

Finally, to fulfil a promise I made...

I wish to thank Retkitukku for providing me back my binoculars of interest to finish this thesis.

Joensuu, August, 2020 Tapani Toivonen


LIST OF PUBLICATIONS

This thesis consists of the present review of the author’s work in the field of computer science and the following selection of the author’s publications:

I Toivonen, Tapani, and Ilkka Jormanainen. "Using JS-Eden to introduce the concepts of reinforcement learning and artificial neural networks." In Proceedings of the 16th Koli Calling International Conference on Computing Education Research, pp. 165-169. ACM, 2016. DOI: 10.1145/2999541

II Toivonen, Tapani, Ilkka Jormanainen, and Markku Tukiainen. "An open robotics environment motivates students to learn the key concepts of artificial neural networks and reinforcement learning." In International Conference on Robotics and Education RiE 2017, pp. 317-328. Springer, Cham, 2017. DOI: 10.1007/978-3-319-62875-2_29

III Toivonen, Tapani, Ilkka Jormanainen, Calkin Suero Montero, and Andrea Alessandrini. "Innovative maker movement platform for K-12 education as a smart learning environment." In Challenges and Solutions in Smart Learning, pp. 61-66. Springer, Singapore, 2018. DOI: 10.1007/978-981-10-8743-1_9

IV Toivonen, Tapani, Ilkka Jormanainen, and Markku Tukiainen. "Augmented intelligence in educational data mining." Smart Learning Environments 6, no. 1 (2019): 10. Springer, Singapore. DOI: 10.1186/s40561-019-0086-1

V Toivonen, Tapani, and Ilkka Jormanainen. "Evolution of Decision Tree Classifiers in Open Ended Educational Data Mining." In Proceedings of the Seventh International Conference on Technological Ecosystems for Enhancing Multiculturality, pp. 290-296. ACM, 2019. DOI: 10.1145/3362789

Throughout the overview, these papers will be referred to by Roman numerals.

AUTHOR’S CONTRIBUTION

1. For Paper I, the system introduced was implemented by the author of this thesis. The literature review was conducted in collaboration with the other author. Both authors were engaged in the writing process and the evaluation of the writing.

2. For Paper II, the research settings and the questionnaires were designed in collaboration between all of the authors. The system used in the research was implemented by the author of this thesis. The literature review was conducted in collaboration with all of the authors. All of the authors were engaged in the writing process and the evaluation of the writing.

3. For Paper III, the technical details of the system introduced in the paper were designed and implemented by the author of this thesis. The algorithm introduced in the paper was also designed by the author of this thesis. All of the authors of Paper III collaborated in the literature review and the writing process of Paper III.

4. For Paper IV, the study settings and the questionnaires were designed in collaboration between all of the authors of Paper IV. The technical implementations for the study were designed by the author of this thesis. All of the authors of Paper IV collaborated in the literature review and the writing process of Paper IV.

5. For Paper V, the study settings were designed in collaboration between all of the authors. Also, all of the authors of Paper V collaborated in the literature review and the writing process of Paper V.


1 INTRODUCTION

Intelligent algorithms are everywhere. They are constantly analyzing the digital trace of Internet users, supporting people’s decision making and helping domain experts to understand the underlying issues in the big data [1]. Intelligent algorithms guide traffic [2] and support medical professionals in diagnosing illnesses [3]. Intelligent algorithms enable vehicles to become autonomous [4] and robots to learn to handle specific tasks that would require human-level intelligence [5]. An intelligent algorithm is a set of computer instructions that learns, plans or otherwise performs tasks in which the use of deterministic, exact and rule-based algorithms is not feasible due to the massive size of the search space rising from the problem definition.

For instance, planning a set of actions from an initial state to a possible goal is well known to be a difficult problem to solve exactly. The problem is actually PSPACE-complete; hence, a feasible and computationally inexpensive calculation of the solution for all instances of the problem might not even exist [6] unless P = PSPACE. An intelligent algorithm, however, can learn or predict, with a feasible amount of resources, a solution that is close to the exact and, thus, optimal solution [7].

Intelligent algorithms are studied and used in machine intelligence (MI) [8], which covers a wide range of applications. Such applications include, for instance, machine learning (ML) [9] and evolutionary computing (EC) [10]. The algorithms used in ML can learn to generalize from a given dataset. The generalizations can lead to the prediction of future events or to the clustering of dataset instances into similar groups. EC is used to handle difficult problems in which exact algorithms would require a massive amount of resources. EC algorithms resemble mechanisms from evolution theory and natural selection. They can be used, for instance, to solve intractable problems (or problems in which the best possible solution is unknown), such as the Travelling Salesman Problem, with great accuracy and speed when deriving the exact solution would not be possible due to the input size and the complexity of the problem instance [11]. Figure 1.1 illustrates the taxonomy of intelligent algorithms in the MI domain; Figure 1.2 illustrates the taxonomy of ML, which is a widely used MI approach.
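To make the EC idea concrete, the following minimal sketch evolves a tour for a small Travelling Salesman instance using only mutation and selection. It is an illustration under stated assumptions, not the method of [11]: the function names, the parameter values and the eight-city instance are all hypothetical, and crossover is deliberately left out to keep the example short.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of the closed tour that visits the cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def mutate(tour, rng):
    """Return a copy of the tour with one randomly chosen segment reversed."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def evolve_tour(cities, population_size=30, generations=200, seed=0):
    """Evolve a short tour by mutation and truncation selection (no crossover)."""
    rng = random.Random(seed)
    base = list(range(len(cities)))
    population = [rng.sample(base, len(base)) for _ in range(population_size)]
    for _ in range(generations):
        offspring = [mutate(parent, rng) for parent in population]
        # Selection: keep the shortest tours among parents and offspring.
        population = sorted(
            population + offspring, key=lambda t: tour_length(t, cities)
        )[:population_size]
    return population[0]


# Hypothetical instance: eight cities on the border of a 2 x 2 square.
cities = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
best = evolve_tour(cities)
```

Because parents compete with their offspring, the best tour in the population never gets longer from one generation to the next. The sketch gives no guarantee of reaching the optimal tour; it only trades exactness for a bounded amount of work, which is the trade-off described above.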

MI methods can be used to analyze educational data in an educational context to deepen the stakeholders’ understanding of the dataset’s context. This understanding can lead to better teaching and learning practices and interventions or even to deeper perceptions of the learning processes [12]. The terms learning analytics (LA) [13] and educational data mining (EDM) [14] are used to describe the analysis and mining of educational data. LA and EDM use these MI methods to predict [15], cluster [16], detect outliers [17] or novelties, and find association rules [18] from the educational data. The process leads, at its best, to knowledge discovery when something new is learned from the datasets’ context. The platforms used in general data mining are well suited for EDM and LA purposes [19]. However, specific platforms for LA or EDM have also been developed [14].

MI methods are also taught at different education levels. MI has been an integral part of the curriculum in higher education institutes such as universities for decades, but introductory courses to MI or related subjects have also been taught lately in primary and secondary schools in some locations [20]. Multiple tools exist that can be used to demonstrate and teach MI. The tools have been tailored for different user groups; some of the tools focus on advanced MI topics, while others aim to contextualize the very basics [21].

Figure 1.1: Taxonomy of intelligence in computing (retrieved from https://www.sharper.ai/taxonomy-ai/)

MI methods are usually complex to understand, and the models generated by the algorithms usually work as black boxes, of which only the input - output pair can be comprehended by a human [22]. Even then, a domain expert such as a data scientist is usually required to interpret the results given by the model. Research in EDM or LA is focused more on studying how to analyze educational data efficiently and accurately when the end user is a data science specialist, hence omitting the teacher or other end user from the data mining process [23].

We present open-ended (understandable, interpretable, explained and transparent) practices in the research conducted for this thesis that help end users who are not specialized in MI or a related subject to cope with the complex algorithms and practices used in MI for education. We refer to these end users as data science novices throughout this thesis. We argue that the selection of tools and simple, interpretable representations of the algorithms and models can lead data science novices to analyze educational data effectively and discover new knowledge from the context during the analysis process. We also argue that students can efficiently learn the MI concepts by using open-ended tool sets and practices that concretize the black box MI algorithm. To support these arguments, we studied how students and teachers interact and learn when they face the difficult and complex ideas of MI while using tool sets that were designed to be more open in nature. We used algorithms that generated interpretable models to predict and cluster data and extended the black-boxed models to the physical world using mobile robots with the support of real-time visualizations. The findings support our arguments that, in the educational context, the stakeholders benefit from the open-ended approach that allows them to gain deep insight from the complex algorithms and datasets.

Figure 1.2: Taxonomy of machine learning (retrieved from https://www.sharper.ai/taxonomy-ai/)

We present methods of Open Machine Intelligence in Education (OMIE) in this thesis; OMIE refers to the methodology in which, instead of a black box, the tools and predictive models are transparent, and a subjective understanding of the context is preferred instead of the predictive models’ high objective accuracy. Instead of aiming to generalize in several settings, OMIE aims to deepen the end user’s understanding in a specific context and through that to build a knowledge base that supports the end user to carry out tasks in more general settings in a bottom-up manner. We also argue that evaluating the predictive model’s objective accuracy when the subjective understanding from the context is preferred does not fully capture the power of the tools and algorithms. OMIE places value on the knowledge discovery but lets the end user define how meaningful the knowledge discovery is.


The main objective of this research is to analyze the feasibility of OMIE and its advantages and disadvantages.

This thesis is organized as follows. First, we present the research methodology and the research questions. Second, we show how OMIE can be used to present and teach abstract MI concepts to data science novices. Third, we present research on how OMIE can be used in EDM and LA. Finally, we conclude the research and discuss future directions for OMIE.

1.1 RESEARCH QUESTIONS

The research we conducted for this thesis in Papers I - V, introduced in the next subsection, answers three research questions. This thesis introduces how machine intelligence (MI) can be used in educational contexts, either for learning or teaching, when the platforms and methods used in MI are transparent and open-ended for the end users. By transparent we mean that the outcomes of the predictive models are justified and that the justification is interpretable. By open-ended we mean that the end user is able to understand, modify and adjust the predictive models. The focus is especially on data science novices. We address the challenges in MI for those who have no formal education in MI-related subjects and no prior experience in the usage of MI methods. The complex nature of the MI algorithms and models can cause difficulties even for data science specialists and researchers who study these methods, not to mention students and teachers who wish to understand the modern digital world and cope with the different technological solutions that collect and analyze the users’ digital traces.

The research conducted for this thesis and for Papers I - V presents alternative approaches that can be used to deepen the understanding of those who wish to dig deeper into the modern MI-controlled world and understand the complex and abstract concepts of MI. White boxing MI is the key concept in Open Machine Intelligence in Education (OMIE). In white boxing, complex and opaque models and calculations are simplified and explained through visualizations and practice.

Nature        OMIE                             Traditional MI
Stakeholders  Inclusive                        Exclusive
Algorithms    White box                        Black box
Models        If judging then also justifying  Judging without justifying
Process       Transparent                      Opaque

Table 1.1: OMIE vs traditional MI

Table 1.1 compares OMIE to the traditional view of MI. The research presented in this thesis and in Papers I - V answers the following three research questions, on which the research was ultimately based.

• RQ1: Is it possible to white box intelligent algorithms that are usually viewed as black boxes for education?

• RQ2: What is the benefit of white boxing intelligent algorithms for education?

• RQ3: What are the challenges of white boxing intelligent algorithms for non-experts?


The objective of this thesis is to study open-ended MI practices to find alternatives to traditional black box models, algorithms and exclusive user groups (data scientists, for instance).

The following sections will introduce how MI and especially machine learning can be efficiently used in the context of education in a way that enables even data science novices without prior experience in such concepts to use and understand state-of-the-art digital artefacts and algorithms.

1.2 RESEARCH DESIGN AND METHODOLOGY

The research conducted for this thesis in Papers I - V aimed to study how complex MI concepts could be used by data science novices in an educational context. The hypothesis in the beginning was that the algorithms and the models generated by algorithms should be interpretable and adjustable by a non-expert. Our research findings support our hypothesis: MI can lead to knowledge discovery when used by non-experts in schools if the users can interpret the models. That is, the model can be visualized and explained intuitively to a non-expert. Such models include, for instance, decision tree classifiers and some clustering algorithms such as Neural N-Tree, which are presented as a part of the research for this thesis.

We used a design research framework to frame the research presented in this thesis [24]. Design research aims to develop research artefacts that refine practices in iterative processes in which the artefacts are evaluated through interventions at each iteration cycle. Moreover, developing the research artefacts in the design research should be based on the theoretical grounds of previous research and practices. The evaluation of the research artefacts should lead to a better understanding of the research context and, optimally, to more robust research artefacts in the following iteration cycles of the design research. Design research is a widely used research framework in the fields of computer science and education [25] but is also applied in other design sciences such as architecture [26]. Figure 1.3 illustrates the process of design research.

Figure 1.3: Design research process (adapted from [135])

The interventions in our research were targeted at two different stakeholders. The first part of the research aimed the interventions at the students; the second part aimed the interventions at education practitioners, such as teachers. Based on the design research iteration cycles in both parts, we developed two different research artefacts. The first artefact is a platform for mobile robots that helps and motivates the students to learn complex ideas of reinforcement learning and artificial neural networks by using real-time visualizations and concrete physical world objects. The artefact from the second part of the research is an open-ended analysis tool for educational data. The artefact was designed to use white box machine learning algorithms that allow powerful visualizations and predictive model adjusting to deepen the end user’s domain knowledge. Figure 1.4 illustrates how we applied design research for this thesis. Figure 1.5 illustrates how the first and the second part of the research relate to OMIE.

Figure 1.4: Design research process applied when developing OMIE

The artefacts developed and designed in the design research process were based on the known theory but were also improved after the interventions. We conducted a literature review in which we studied the gaps in bringing MI to data science novices in the schooling realm, so that we could capture the theoretical foundations of the current state-of-the-art practices in the uses of MI in education. The following list gives an overview of how each of Papers I - V relates to OMIE and what was done for each part of the research for this thesis.

1. In Paper I, we conducted a literature review and implemented a system using the Empirical Modelling (EM) platform that would allow artificial neural networks (ANNs) to control a mobile robot. The aim was to implement the system in a way that it would allow visualizations of the networks’ state and control the robot in real time. The aim was that the university students who use the system would gain a deeper insight into the ANNs and machine learning (ML) concepts.

Figure 1.5: Venn diagram of the concepts in this thesis

2. Paper II continued the work conducted in Paper I. Besides a literature review in which the focus was to understand how ANNs were used in a learning context, we evaluated the digital artefact implemented in Paper I. The evaluation was based on computer science majors using the system for some hours. We gave the students the same questionnaire before and after the evaluation, and the analysis results indicated that the intervention actually helped the students to understand the abstract ML and ANN concepts.

3. The focus in Paper III was on the artefact design and implementation. The artefact implemented in Paper III was an open-ended learning analytics (LA) and educational data mining (EDM) tool. The basis of the tool was to use white box MI algorithms that were adjustable and easy to visualize. The tool was intended to be used by educators, not only by data science specialists. The literature shows that such tools benefit non-expert users when the transparency of the tool is high.

4. Papers IV and V were both based on the evaluation of the digital artefact developed for Paper III. In Paper IV, users from different backgrounds used the tool to cluster data collected from educational settings with a clustering algorithm introduced in Paper IV. The algorithm was implemented on the system developed for Paper III. The research aimed to show that even non-expert users can discover new knowledge with white box clustering algorithms and that the algorithm’s accuracy increases during the use cases. The research in Paper V evaluated the same system but a different algorithm. In Paper V we let a secondary school teacher without a background in LA or EDM analyze his students with a visualized decision tree classifier that the teacher could modify at any time. In Paper V we aimed to see how long it takes for someone who is not a data science specialist to build a classifier that would generalize the dataset and whether such a process would lead to knowledge discovery.
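The idea of a classifier that justifies its judgement and can be modified by the end user can be sketched in a few lines. The sketch below is only an illustration and not the tool developed for Papers III - V: the class names, the feature names (`exercises_submitted`, `hours_online`), the thresholds and the labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Split:
    feature: str
    threshold: float
    below: Union["Split", Leaf]   # branch taken when value <= threshold
    above: Union["Split", Leaf]   # branch taken when value > threshold


def classify(tree, sample):
    """Return (label, justification); the justification lists every test applied."""
    justification = []
    node = tree
    while isinstance(node, Split):
        value = sample[node.feature]
        if value <= node.threshold:
            justification.append(f"{node.feature} = {value} <= {node.threshold}")
            node = node.below
        else:
            justification.append(f"{node.feature} = {value} > {node.threshold}")
            node = node.above
    return node.label, justification


# Hypothetical tree over made-up course activity features.
tree = Split(
    "exercises_submitted", 5,
    below=Split("hours_online", 10, below=Leaf("at risk"), above=Leaf("on track")),
    above=Leaf("on track"),
)
student = {"exercises_submitted": 3, "hours_online": 4}
label, why = classify(tree, student)

# The end user can adjust the model directly, here by moving one threshold.
tree.below.threshold = 3
relabel, _ = classify(tree, student)
```

Every prediction comes with the list of tests that produced it, and a changed threshold takes effect immediately on the next prediction. This is the sense in which a white box model is both judging and justifying.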

Table 1.2 shows how design research was applied in each of Papers I - V and how Papers I - V answer the research questions RQ1 - RQ3.

Article    Methodology                                                 RQ
Paper I    Literature review, Artefact design                          1
Paper II   Literature review, Artefact evaluation                      1, 2
Paper III  Literature review, Artefact design                          1
Paper IV   Artefact evaluation, Qualitative study, Quantitative study  2, 3
Paper V    Artefact evaluation, Qualitative study                      2, 3

Table 1.2: Research questions answered in Papers I - V


2 BACKGROUND

2.1 MACHINE INTELLIGENCE IN EDUCATION

Machine intelligence, MI, is an umbrella term used to describe a computer-based system that exhibits intelligence somewhat comparable to human intelligence. Intelligence exhibited by a system may be general, whereby the system can handle multiple, different tasks that require decision making or learning. Conversely, the system’s ability to perform certain tasks can be very narrow and focused on a small subset of tasks requiring intelligent behavior. The intelligence exhibited by the computer leads to planning [27] or learning [28], for instance, and is used in the domains in which the use of traditional computing methods to derive exact solutions is not feasible. Such domains contain problems in which the space of the possible solution candidates cannot be evaluated by deterministic algorithms or even by using heuristics to support the deterministic algorithms.

The applications of MI include image and speech recognition [29], autonomous vehicles, document classification and soft-computing methods used to obtain superhuman performance in several different board games [30]. Various MI algorithms are also used to cope with big data collected from social networks, network traffic or people’s shopping preferences when the usage of the exact algorithms is not feasible due to the complex nature of the problems arising from such settings.

The demand for sophisticated and powerful analysis methods has increased as the amount of available data increases. The industry copes with the demand by using domain experts and intelligent algorithms. The digitalized world and web-based systems also enable the school sector, from elementary schools to academia, to collect vast amounts of data from the learning contexts. As the use of intelligent algorithms and the volume of the data collected from everyday applications increase, younger students have to understand how the data are used and collected and, furthermore, what can be done through MI.

One approach that uses MI in the schooling sector is learning analytics (LA) or educational data mining (EDM). Their aim is to analyze the data collected from educational settings and to provide information for educators about the learning, the teaching and the students. Ideally, the information leads to better practices through interventions. The educators are involved in LA or EDM processes in some cases [31], but usually the practice and the research rely on data science experts to analyze the collected data. LA is more focused on understanding educational settings through existing MI methods [32], while EDM’s focus is more theoretical. That is, EDM focuses on developing MI algorithms that would lead to generalizable predictive models across the different educational settings that have common features in the feature space [33].

The LA and EDM approach to analyzing educational data benefits different stakeholders, but LA and EDM come in different shapes. LA and EDM can manifest themselves in social network analysis, in which the focus is on graph analysis [34], and in dataset visualizations, in which the hyperdimensional feature space can be effectively rendered in a humanly comprehensible form [35]. Social network analysis, for instance, can be used to find cliques (complete sub-graphs) to address the collaboration level on the digital learning platforms [36]. Predictive LA (or EDM) [37] is a field of study in which educational data are analyzed through cluster analysis, classification, regression or anomaly detection algorithms, and the focus is to predict learning and teaching outcomes. We focus in this thesis on predictive LA and EDM and the research supporting our further arguments.
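As an illustration of the clique search mentioned above, the following sketch lists the maximal cliques of a small undirected graph with the basic Bron-Kerbosch algorithm. It is a generic textbook technique and is not claimed to be the method used in [36]; the graph, in which an edge is assumed to mean that two students replied to each other on a learning platform, is hypothetical.

```python
def maximal_cliques(graph):
    """Yield every maximal clique of an undirected graph (basic Bron-Kerbosch).

    `graph` maps each node to the set of its neighbours.
    """

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield clique          # nothing can extend the clique: it is maximal
            return
        for v in list(candidates):
            yield from expand(clique | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    yield from expand(frozenset(), set(graph), set())


# Hypothetical interaction graph between four students.
graph = {
    "ann": {"ben", "cai"},
    "ben": {"ann", "cai"},
    "cai": {"ann", "ben", "dee"},
    "dee": {"cai"},
}
groups = set(maximal_cliques(graph))
```

In this made-up example the search returns one group of three students who all interacted with each other and one pair, which is the kind of structure a teacher could inspect when looking at collaboration.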

Educational robots are widely used to teach computational thinking concepts such as programming in almost every school level from first grades to university [38], [39], [40] and [41]. Multiple research studies have used educational robots to teach students the complex MI concepts, such as machine learning and artificial neural networks (ANNs), especially at the university levels [42]. Educational robots add meaning and practice to abstract and theoretical MI concepts. Multiple studies have been conducted to show the value of physical computing in MI teaching in which the most-used physical computing platforms are Lego Mindstorms [43] and Arduino [44]. Some of the studies let the students build MI algorithms from scratch and test the predictive models in the real world through physical computing [45], while some studies scaffold the black box of ANNs and let the students test different hyperparameters that influence the model’s performance and, thus, the physical computing artefact [46].

Besides educational robots, multiple soft computing platforms exist to introduce MI concepts to students from first grades to universities [47], [48] and [49]. Such platforms include, for instance, Snap!, Scratch, Wolfram Alpha, and several platforms developed for the Python programming language. These platforms usually require some level of understanding of programming concepts to harness the power of MI, but the programming usually plays a minor part in the MI solution to be developed in those cases. Some professional programming languages also exist that ease the use of MI for different problem domains. These programming languages, such as Python (extended with MI libraries such as TensorFlow [50], NumPy [51], SciPy [52]), Matlab [53] and R [54], can be used to build MI models with a certain level of scaffolding and abstraction. The algorithms implemented for these programming languages are highly customizable, and the usage requires more understanding of abstract MI concepts than the solutions that are only focused on MI and its education.

Snap!, Scratch and Wolfram Alpha, for instance, take a traditional approach to MI in which programming concepts such as conditions and loops are commonly used as a part of MI implementations. More unorthodox methods were used by [55], [56] and [57] to effectively visualize complex MI models in different scenarios.

We, however, used empirical modelling (EM) platforms that derive their philosophy from William James’ radical empiricism [58]. The platforms mix the roles of software developers and end users and tend to be more transparent than traditional software development tools. The focus in these solutions is to deepen the end users’ understanding of MI by involving the actual end user in the development process and by stressing the importance of experience and subjective understanding. In the next subsection, we briefly discuss EM and how we adopted EM tools in our research to open complex MI processes for educational use.


2.2 EMPIRICAL MODELLING

Empirical Modelling (EM) [59] is a computer-based modelling paradigm that aims to create models - referred to as construals - that reflect the subjective experiences of the modeller on the modelled phenomena [60]. EM stresses the value of sense making and the subjective understanding of the referent, that is, the modelled phenomenon, in order to capture the state-as-experienced in-situ [61], in contrast to stressing rigorous accuracy in modelling the referent (Figure 2.1). The construals in EM are built with definitive scripts in which the state-as-experienced is represented far more subjectively than the state of a traditional computer-based model. EM philosophy argues that the state of a construal cannot be described solely through formal mathematical notations, like the state of a traditional computer-based model (for instance, the state of a Turing machine or of lambda calculus), but can instead be described through the subjective understanding and experience of the modeller, that is, the state-as-experienced. One uses observables, dependencies and agents as the building blocks of definitive scripts [62].

Figure 2.1: EM explained (adapted from [136])

The observables are spreadsheet-like variables that represent the entities in definitive scripts. Such entities might, for instance, be objects of primitive data types such as numbers or strings. We use the term data type to indicate the kind of entity an observable can hold; we stress, however, that the term data type is not used in EM. Observables can also be more complex in nature, such as circles or even 3D objects (Figure [?]). The dependencies are defined as the real-life relationships between two or more observables. Through the dependencies, the state-change of a single observable bootstraps the state-change of the dependent observables; thus, the values of the dependent observables change as the value of the observable they depend on changes. The agents describe the parts of the definitive scripts that can initiate the state-change of the observables. Such parts might include not only functions and procedures but also single notations within the definitive scripts (Figure 2.3).
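The interplay of observables, dependencies and agents can be illustrated with a minimal Python sketch. It is not the Eden notation or the JS-Eden API; the `Script` class and the observable names `radius`, `diameter` and `area` are hypothetical and only mimic the spreadsheet-like behaviour described above.

```python
import math


class Script:
    """Hypothetical stand-in for a definitive script (not the JS-Eden API)."""

    def __init__(self):
        self.values = {}    # plain observables: name -> value
        self.formulas = {}  # dependent observables: name -> (function, input names)

    def set(self, name, value):
        """Agent-style action: assign a value to a plain observable."""
        self.formulas.pop(name, None)
        self.values[name] = value

    def define(self, name, func, *inputs):
        """Dependency: `name` always equals func(inputs), spreadsheet-style."""
        self.values.pop(name, None)
        self.formulas[name] = (func, inputs)

    def get(self, name):
        """Read an observable, re-evaluating its dependencies on demand."""
        if name in self.formulas:
            func, inputs = self.formulas[name]
            return func(*(self.get(i) for i in inputs))
        return self.values[name]


script = Script()
script.set("radius", 2.0)
script.define("diameter", lambda r: 2 * r, "radius")
script.define("area", lambda r: math.pi * r * r, "radius")

before = script.get("diameter")
script.set("radius", 5.0)        # one observable changes state...
after = script.get("diameter")   # ...and the dependent observables follow
```

Re-evaluating on read keeps the sketch short; a real environment would more likely push changes to dependants as soon as an observable is redefined.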

Figure 2.2: An example of a construal in JS-Eden

Figure 2.3: EM syntax with observables, dependencies and agents

EM has offered multiple environments for definitive scripts in which users can build construals. The environments have been built as desktop applications, such as tkeden [63], and, lately, as web-based environments, such as JS-Eden [64]. All of the environments include an interpreter for the definitive scripts and an editor in which the definitive scripts are written. More importantly for the EM philosophy, the environments have been developed so that creating and using the construals occur on the same platform. The difference between the developer and the end user is vague in EM philosophy; these terms are often used interchangeably in EM.

EM has been connected to machine intelligence (MI) in several research studies. Whereas [65] used JS-Eden to model artificial neural networks (ANNs) to predict traffic accidents in Thailand, we used JS-Eden to control a Q-learning agent to drive mobile robots such as Arduinos and Lego Mindstorm robots. ANNs are generally viewed as black boxes, and the input-output pair is usually the only thing that a human user can perceive efficiently [66]. However, our research in Papers I and II shows that EM tools are capable of opening the black box to some extent through real-time visualizations, real-time state-changes and physical computing platforms that are all implemented in the core EM tools.

We also introduced an analytics platform for education in our research, in which the end users can analyze student data that has been imported to the tool from some educational context. The platform follows the EM philosophy in which the construal end user and the developer work in the same workspace: the teacher who analyzes the student data and the data scientist who builds and evaluates the predictive model merge into a single actor who analyzes the student data while understanding the context more deeply and builds predictive models that are adjustable in real time.

We argue that the state-as-experienced is also present in analytics tools in which the users both develop and use the predictive MI models. A predictive model generated and used in an open-ended analytics platform, in which the end user is both the user and the developer, is bound to its creator. It may not generalize to other contexts or users, because the state-as-experienced is used to differentiate the state of a construal from the more rigorous state of a Turing machine [67] and stresses the subjective experience of the construal [68].

2.3 PREDICTIVE EDUCATIONAL DATA MINING AND LEARNING ANALYTICS

Educational Data Mining (EDM) is a field of study in which data collected from educational settings is analyzed with machine learning and statistical methods. The aim of EDM is to deepen the understanding of different stakeholders in learning and education. The definition of learning analytics (LA) is quite similar to that of EDM, and the two terms are often used interchangeably. A major difference is that EDM includes more stakeholders, such as the administration, in the analysis process, while LA narrows the stakeholders down to include mainly the first-hand parties, such as the students [142]. Research in EDM also aims to build predictive models that would be generalizable across different learning contexts [69]. Building such models includes developing different types of predictive algorithms or modifying the existing solutions by tuning the hyperparameters and combining several existing approaches. The different stakeholders of EDM and LA include the learners, the educators and governmental actors, for instance [70].

The methods used to analyze educational data vary in the same way as the methods in data mining in general: cluster analysis, classification, anomaly detection, dimensionality reduction, graph mining and association rule learning are often used to understand the educational process more deeply. Figure 2.4 shows EDM and LA and their relation to other fields, such as computer science and education.

Figure 2.4: Learning analytics and educational data mining [137]

In this thesis, we will focus on predictive LA and EDM. Predictive LA or EDM refers to the processes in which the learning and teaching outcomes are predicted with different types of MI algorithms. Predictive LA and EDM do not solely focus on classification or regression, which are understood in data mining as predictive algorithms. Instead, predictive LA and EDM also include, for instance, cluster analysis, anomaly detection, association rule learning and dimensionality reduction, which can all be used to predict learning outcomes.

Educational data can vary in size and in dimensionality, and it can contain noise and be sparse [71]. The methods used to analyze data from massive open online courses (MOOCs) with thousands of students are not necessarily well suited to analyzing data from traditional classroom settings that may only contain tens of data points [72]. For instance, neural networks usually require a high volume of data in comparison to support vector machines, which can cope with small datasets [73]. Similarly, a widely used clustering algorithm, k-means, cannot usually be used with small datasets to cluster data at state-of-the-art accuracy, whereas hierarchical clustering algorithms perform quite well with small datasets [74].

The platforms used for EDM and LA include tools whose main focus is to analyze educational data [75]. The platforms used in general data mining tasks are also widely used in the field of EDM and LA. RapidMiner [76] (Figure 2.5) and Weka [77] are well-known examples of those. The Orange data mining tool is also often used in LA and EDM (Figure 2.6). The platforms used in EDM and LA often offer an extensive collection of different algorithms to analyze the data and the possibilities of adjusting multiple parameters, viewing visualizations, or, in some cases, also adjusting the predictive models. Hence, the current state-of-the-art tools used in EDM and LA are often viewed as being transparent [78] so that the end user’s engagement can occur during the data mining processes. Multiple studies have shown that the EDM and LA end users benefit from the open-ended approach, which leads to a deeper understanding of the context of the datasets [79], [80] and [81]. However, we question the transparency of the state-of-the-art EDM and LA tools: the end user may have access to the models’ parameters, visualizations and even the model itself, but the model itself is usually far from a white box; there is no justification or explanation behind the model’s reasoning. There might be an answer to the question what but not to why.

Figure 2.5: An interface for RapidMiner

Figure 2.6: Orange data mining tool interface

An open-ended approach to EDM was implemented in a system named Open Monitoring Environment (OME) for inclusion in, understanding of and intervention in the EDM process [82]. The system was built on top of the Empirical Modelling (EM) platform tkeden and used Weka as a back-end service. OME collected data from educational robotics classroom activities, and the teachers who were using OME labelled the collected data according to their domain knowledge until they decided to run Weka’s J48 algorithm to build a decision tree classifier from the labelled data. After the decision tree was rendered to them, the teachers were able to adjust the model’s nodes and branches to better correspond to their perceptions. The study shows that the teachers’ active role in EDM was beneficial and led to knowledge discovery that would not be possible in a more closed-ended EDM environment [82].

Wolf et al. [83] also used decision tree classifiers as part of their system, in which the platform allowed EDM or LA end users to build decision tree classifiers from scratch by letting them intervene in the algorithm’s building process. The platform offered a simple interface through which the users could choose the dataset splits; that is, they could select the pivot attributes and values for the decision tree nodes. Wolf et al. concluded that the process was beneficial for the end users. However, the accuracy differences between their system and systems that would not allow any interference were unremarkable in the experimented educational datasets.

A major problem in EDM or LA is the generalizability of the predictive models [84]. To date, the generated predictive models, while accurately analyzing the educational datasets they were built on, cannot be generalized to other educational contexts. Generalization here means multi-level cross validation or model replication across educational datasets [84]. Much of the research in LA and EDM focuses on building predictive models for certain learning scenarios, using methods that do not attempt to offer a model that could be effectively used to analyze other learning scenarios. The importance of the context in EDM and LA has been discussed in [85] and [86]. Better generalizability of the EDM and LA models would change the field and lead to a better understanding of learning and teaching. Furthermore, finding a method to generalize models across the education domains would require merely one accurate model for predictions. Using the concept of Occam’s razor [87], one would need only to find the smallest generalizable model for a dataset, and such a model would predict all contexts for the same feature space. It is highly probable that models cannot generalize across contexts when the model size is kept polynomially small in comparison to the input size, due to the implications of such a finding and the hardness of finding the smallest model consistent with a given dataset (NP-hard).

We take a different approach in our study of inclusive and white boxed EDM (or LA). We do not try to build models that would generalize across the contexts of the same feature spaces. Instead, we prefer subjective understanding in a certain and limited context. The subjective understanding leads to knowledge discovery but may not be important for different stakeholders unless the subjective understanding leads to better practices. The subjective understanding refers to a teacher’s understanding; hence, we trust the decision making and intuition of single users who could actually change practices through their own increased knowledge base.

We will go through the development of the methods that help us to reach this situation in the following chapters.

3 NEURAL NETWORKS AND ROBOTICS IN OPEN DIGITAL ENVIRONMENT

Having described the research background, we now explore the application of ANNs and robotics in open digital environments. Artificial neural networks (ANNs) are widely used in machine intelligence (MI): ANNs are powerful prediction and regression techniques, but they are opaque by nature. We studied how using physical computing artefacts to shift the complex and formal presentations of ANNs into practical presentations affects the motivation and learning of students who wish to understand ANNs. This section covers research conducted with computer science students; its aim was to white box ANNs by using educational robotics and the EM platform JS-Eden.

3.1 NEURAL NETWORKS

Figure 3.1: MLP architecture with an input layer (I1, ..., In), a hidden layer (H1, ..., Hn) and an output layer (O1, ..., On).

Artificial neural networks (ANNs) are soft computing-based models. ANN development is inspired by biological neural networks [88]. The architecture and purpose of ANNs vary: some ANNs are used for supervised learning tasks, while others are used for unsupervised or reinforcement learning tasks. Typically, ANNs are built by stacking layers (Figure 3.1) that are connected to each other. The ANN’s first layer is usually called the input layer, the successive layers are called hidden layers, and the last layer is called the output layer. The layers consist of nodes called neurons, and the neurons in the different layers are typically connected by real-valued weights. Inside each neuron is an activation function $f(w \times x)$ that outputs the activation value based on the previous activation values and the weights leading to the neuron. Typical activation functions used in ANNs are the hyperbolic tangent [89], the sigmoid [90] and the rectified linear unit (ReLU [91]) and its variants. The non-linearity of the activation functions (together with at least one hidden layer) enables ANNs to model non-linear functions, that is, functions whose decision boundaries cannot be expressed as hyperplanes in the standard Euclidean space. Figure 3.2 shows a linear decision boundary for dimensions k = 2. Compare this to Figure 3.3, in which the decision boundary for the same dimensionality is non-linear.
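To make the role of the non-linear activation concrete, the following sketch propagates inputs through one hidden layer and one output layer. The weights are hand-picked for illustration (they are not learned and do not come from the studies), and the bias term `b` is an assumption added on top of the $f(w \times x)$ form used in the text. With these values the network reproduces XOR, a function that no single hyperplane can separate.

```python
import math


def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases, activation):
    """One fully connected layer: every neuron computes f(w . x + b)."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def mlp_forward(x, layers):
    """Feed the input vector through all layers in order."""
    for weights, biases, activation in layers:
        x = layer_forward(x, weights, biases, activation)
    return x


# Hand-picked (hypothetical) weights: hidden neurons act as OR and NAND,
# and the output neuron combines them with AND, which yields XOR.
xor_network = [
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0], sigmoid),  # hidden layer
    ([[20.0, 20.0]], [-30.0], sigmoid),                        # output layer
]

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
predictions = [round(mlp_forward(list(x), xor_network)[0]) for x in inputs]
```

Removing the hidden layer, or replacing `sigmoid` with the identity function, leaves only a linear decision boundary, which cannot fit the XOR pattern.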

Figure 3.2: Linear decision boundary

Figure 3.3: Non-linear decision boundary

All of the neurons in layer $k$ can be fully connected to all of the neurons in layer $k+1$. The layers are fully connected in that case. This is the case in the ANN called multilayer perceptron (MLP). The layers may also form a loop in which layer $k$ is connected to both layer $k+1$ and layer $k-n$, where $k > n$; recurrent neural networks (RNNs) are a type of ANN that contains a loop. It has been shown that RNNs can, in fact, simulate any Turing machine due to the loops. Hence, RNNs are Turing complete [143]. The training of an ANN is related to its architecture. The typical training algorithm used for MLPs is the backpropagation algorithm [92], which, together with non-linear activation functions, leads the ANN to learn non-linear decision boundaries for classification and regression. For recurrent neural networks such as Long Short-Term Memory (LSTM), the backpropagation algorithm is extended to the backpropagation through time algorithm [93], in which the loops of the ANN are unfolded based on the time steps of the states in the ANN. Multiple parameters influence the weight changes during backpropagation. Such parameters include the learning rate $\alpha$, which scales the weight changes, and momentum, which is used to keep backpropagation from becoming stuck in a local minimum in the gradient descent-based weight optimization.
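The effect of the learning rate and momentum on a weight update can be sketched as follows. This is a generic gradient descent step with classical momentum, not the update code used in the studies; the function name and the toy objective $(w - 3)^2$ are illustrative assumptions.

```python
def momentum_step(weights, grads, velocity, learning_rate=0.1, momentum=0.9):
    """One gradient descent update with classical momentum.

    The velocity keeps a decaying sum of past gradients, so an update can
    carry the weights through flat regions and shallow local minima.
    """
    new_velocity = [momentum * v - learning_rate * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy objective (assumption): minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = [0.0], [0.0]
for _ in range(300):
    gradient = [2.0 * (w[0] - 3.0)]
    w, v = momentum_step(w, gradient, v)
```

A larger `learning_rate` makes each step bigger, and a larger `momentum` lets earlier gradients influence the current step for longer.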

Training ANNs is computationally hard: training even a simple ANN with 2 hidden nodes and 1 output node and a linear threshold activation function is NP-complete [149]. Moreover, finding the smallest ANN consistent with a given dataset, or even accurately approximating the size of the smallest consistent ANN, is beyond polynomial computation unless P = NP [150].

3.1.1 Q-learning

Q-learning was introduced in 1989 by Watkins [96] as a type of model-free reinforcement learning method. A Q-learning agent learns the optimal policy in the environment $E$ by taking action $a$ in the state $s$, $s \in E$. For each action in the state, the agent is awarded a reward $r$, and through the rewards, the Q-value is calculated as $Q(s, a) \leftarrow r + \gamma \times \max_{a'} Q(s', a')$, where $\gamma$ is the discount factor for future rewards and $s'$ and $a'$ are the next state and an action available in it, respectively. Lower $\gamma$ values imply an opportunistic agent that values instant high rewards over long-term planning, and higher $\gamma$ values imply an agent that values long-term planning over instant high rewards.

Figure 3.4:Q-learning [138]
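The update rule can be tried out on a small table of Q-values. The corridor environment below (five states, goal on the right, reward 1 at the goal) is a hypothetical toy example and not one of the robot tasks of this thesis. The update overwrites the old Q-value directly, as in the rule above; many implementations blend the old and new values with a learning rate instead.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)    # 0 = step left, 1 = step right
GAMMA = 0.9         # discount factor for future rewards


def step(state, action):
    """Deterministic toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=50, seed=0):
    """Fill the Q-table by exploring the corridor with random actions."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # purely random exploration
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(q[next_state])
            # Q(s, a) <- r + gamma * max over a' of Q(s', a')
            q[state][action] = reward + GAMMA * future
            state = next_state
    return q


q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

After training, the greedy policy steps right in every non-goal state, and the Q-values shrink by a factor of `GAMMA` for each additional step away from the goal.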

Q-learning can be implemented with a simple look-up table [97] that contains the updated Q-values for each state. However, the size of the look-up table increases exponentially as the number of possible states increases; hence, the table implementation is only feasible for a small $E$. For a large $E$, a universal function approximator such as a neural network is commonly used [98]. Two options are generally available for ANN implementations of Q-learning. First, the input layer of the ANN accepts the current state as an input vector, and the output layer of the ANN contains one neuron for each possible action. The neuron with the highest value after propagating the state is chosen, and with it the corresponding action. Afterwards, the reward is given to the agent, and the calculated Q-value is backpropagated from that output neuron. Second, the input layer accepts the current state and an action, and the output layer contains only a single neuron that represents the Q-value for the state-action pair. When a Q-value has been received for each action, the state-action pair with the highest Q-value is chosen, and the updated Q-value with the reward is backpropagated through the ANN for that state-action pair.
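The first option can be sketched as follows. To stay short, a single linear layer stands in for the ANN (an assumption; the studies used a recurrent network), and all weights, states and rewards are made-up illustration values. The sketch shows how the greedy action is read from the output neurons and how the training target differs from the current output only at the chosen action.

```python
def q_values(state, weights):
    """One output per action; a single linear layer stands in for the ANN."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def greedy_action(state, weights):
    """Pick the action whose output neuron has the highest value."""
    outputs = q_values(state, weights)
    return max(range(len(outputs)), key=outputs.__getitem__)


def training_target(state, action, reward, next_state, weights, gamma=0.9):
    """Target vector for backpropagation: only the chosen action's output moves."""
    target = q_values(state, weights)
    target[action] = reward + gamma * max(q_values(next_state, weights))
    return target


# Hypothetical values: two state features, two actions.
weights = [[1.0, 0.0], [0.0, 1.0]]
state, next_state = [0.2, 0.7], [0.5, 0.1]

action = greedy_action(state, weights)
target = training_target(state, action, 1.0, next_state, weights)
```

In the second option the action would instead be appended to the input vector, and the network would be evaluated once per candidate action.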

Q-learning has a wide range of applications, from self-driving cars to agents that master board and computer games at super-human levels [99]. Recently, Q-learning agents have successfully competed in complex board games such as Go and chess against top human competitors. Interestingly, the Q-learning agents master these board games that seem to be intractable to brute force approaches. For instance, the generalized chess game is known to be EXPTIME-complete, that is, no polynomial time algorithm exists to decide the best move for the given state, and generalized Go is known to be as complex under certain rule sets [151]. Q-learning agents have been shown to be superior to traditional algorithmic approaches that do not use brute force search techniques for the game search trees. For instance, Q-learning was successfully used against a top chess engine, Stockfish, that uses the minimax algorithm [100].

Q-learning is also widely used in the context of robotics. Q-learning is especially popular in the introductory courses for robotics at the higher education level [101].

3.2 TECHNICAL CORE OF THE OPEN DIGITAL PLATFORM

The platform for Open Robotics Learning Environment (ORLE) was developed on top of the latest Empirical Modelling (EM) platform, JS-Eden. Figure 3.5 shows the ORLE architecture. JS-Eden is a web-based tool that is mainly written in the JavaScript programming language. JS-Eden contains an interpreter for definitive scripts and a tool set that enables end users to write, use and modify the scripts in a unified digital environment. The syntax of the Eden language, which is the core language for writing definitive scripts in JS-Eden, resembles modern high-level programming languages with support for EM concepts such as dependencies, observables and agents. The syntaxes of Eden and modern high-level programming languages such as Python and JavaScript have common features, but Eden should not be considered a programming language per se due to the different objectives of Eden and general-purpose programming languages.

The physical computing platforms we used in ORLE were EV3 robots [102], which are part of Lego’s Mindstorm series, and Arduino microcontrollers [103] (Figure 3.6). Because JS-Eden is web based, we developed a back-end server that communicates with the EV3 robot and JS-Eden. We also modified JS-Eden’s source code so that the back-end server had access to the observable changes through the dependencies defined within the JS-Eden scripts. JS-Eden ran an implementation of a simple recurrent neural network (RNN) (Elman network) [104] with a Q-learning implementation. Parameters such as the learning rate $\alpha$ and the epsilon greediness $\epsilon$ (the chance of a random action) could be bound to observables in JS-Eden; thus, updates to the parameters took effect on the Q-learning agents in real time.

Figure 3.5: ORLE architecture
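How a greediness parameter that changes at run time affects action selection can be sketched as follows. This is a generic epsilon-greedy rule in Python, not ORLE or JS-Eden code; the dictionary `params` is a hypothetical stand-in for an observable that a user edits while the agent is running.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon take a random action, otherwise the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


rng = random.Random(0)
q = [0.1, 0.9, 0.3]           # hypothetical Q-values for three actions
params = {"epsilon": 0.0}     # stand-in for an observable bound to the agent

# With epsilon = 0 the agent always exploits the best-known action.
greedy_actions = {epsilon_greedy(q, params["epsilon"], rng) for _ in range(100)}

# The "observable" is changed while the agent runs: behaviour changes at once.
params["epsilon"] = 1.0
random_actions = {epsilon_greedy(q, params["epsilon"], rng) for _ in range(1000)}
```

Because the parameter is read on every call, a change to it is reflected in the very next action, which mirrors the real-time behaviour described above.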

The EV3 robot’s sensors and motors were also available as observables in JS-Eden. ORLE supported touch, distance, color and sound sensors and was able to control four motors attached to an EV3 robot. The observables controlling the motors could be updated in real time, but the observables for the sensors were read-only; hence, manually changing the values of the sensor observables had no influence on the EV3 robot. Furthermore, ORLE could only handle communication between one robot and one JS-Eden instance.

The rewarding of the Q-learning agent occurred within the JS-Eden scripts. The end users could change the value of the reward observable at any stage of the process, and the state-change of the reward observable triggered the backpropagation through time algorithm written for the recurrent neural network on the RNN model’s current state. The states of the Q-learning agent and the action were also available in the form of observables. A change in the state observables of the Q-learning agent automatically propagated the states forward through the RNN model, and the value of the action observable was updated. The action observable was read-only.

3.3 OPENING THE BLACK BOX

We used the developed Open Robotics Learning Environment (ORLE) and a set of EV3 robots (Figure 3.7) from Lego’s Mindstorm series for this study. Four (N = 4) computer science major students from the University of Eastern Finland participated in the study, in which the students used ORLE to teach the Q-learning agents to drive the EV3 robots while avoiding the obstacles in a classroom environment. All of the robots used in the study contained one distance sensor and two large motors. Furthermore, the robots were assembled in advance for the students. All of the students had experience in the Empirical Modelling (EM) platform JS-Eden.

Figure 3.6: Arduino robot

Figure 3.7: EV3 robot

We gave a brief introduction to neural networks and reinforcement learning during the first stage of the study. After the introduction, the students completed a questionnaire that contained questions about neural networks, reinforcement learning and machine learning in general. The set of questions can be found in PII. After the questionnaire, the students worked in pairs and tried to use JS-Eden notations to teach the Q-learning agent that controlled the robot to avoid the obstacles in the classroom where the study was conducted. The students were able to adjust parameters such as $\alpha$ and $\epsilon$ and change the rewarding of the agent. The layers of the RNN used by the Q-learning agent were also visible in JS-Eden; hence, the students were able to view how the backpropagation through time algorithm changed the weights between the layer nodes. All of the changes made to the JS-Eden notations were updated for the Q-learning agent in real time. The students were also able to change the weights of the RNN if they wished to see how value changes can drop the prediction accuracy.

The learning of the Q-learning agents was made concrete for the students with the physical robots, and the real-time changes to the parameters could be observed as the robots’ behavior changed. For instance, if $\epsilon$ was increased, the students were able to see how the robot took more random actions. Increasing the rewards given to the Q-learning agent also had an influence on the driving of the robot, since the size of the given RNNs was small.

We argue that, while artificial neural networks (ANNs) in general are viewed as black boxes, using a physical robot with values that change in real time and influence the ANNs can help the students to understand ANNs. The black box can be opened to some extent, and while the complex operations within the ANN cannot be understood efficiently without a vast knowledge of the subject, the basic understanding can be deepened with concrete examples, and the effects of parameter adjustments are visible to the users immediately.

Of course, learning an optimal policy for large ANNs can take a large amount of time. Hence, immediate changes in ANN performance can be viewed when the size of the ANN is relatively small and $\alpha$ is relatively large. For small ANNs, a weight change can also drastically drop the agent’s performance and, in some cases, a weight adjustment can also increase the performance. Such changes made by the students reflected their understanding of how the weights actually work in the context of ANNs.

3.4 PERCEPTIONS OF END-USERS

After using the robots and the Open Robotics Learning Environment (ORLE) for 45 minutes, the students completed the same questionnaire that had been given to them before the experiment. The aim was to compare the pre- and post-questionnaire answers to see if there are indicators that the approach of using the open-ended digital environment with robots helps the students to better understand the basic ANN concepts.

The questions tested the students’ general knowledge of artificial neural networks and Q-learning. Some of the questions were general, such as:

What is an ANN?

However, the questionnaire had more focused questions on specific topics in ANNs and Q-learning:

In the context of reinforcement learning, what are the inputs and outputs of ANN?

The level of the participating students’ answers varied in the pre-questionnaire. Some students seemed to know the basics of ANNs and even Q-learning, whereas some students lacked even basic knowledge on the subject. Based on their answers, the students’ comprehension about the topics increased during the experiment. The increase was largest for those who had no prior or only a little experience with ANNs or Q-learning. We concluded in PII that the open-ended approach to introduce the abstract concepts of ANNs and Q-learning is most beneficial for those who possess little to no knowledge about and experience of MI concepts. For instance, before the study, a student answered the question

What different parts do artificial neural networks have?

with:

Nodes and leaf nodes

The student understood the basics of the architecture of ANNs after the study. The learned knowledge can be seen in the answer from the same student to the same question after the study:

Input, hidden and output layer which all have neurons

The same student thought before the study that ANNs are used to sort data (What is an ANN?):

An algorithm that sorts data

After the study the same student knew that ANNs can be implemented by using a multidimensional array to store the weights of the connections between the neurons:

Multidimensional array that has values

The study’s remarkable finding was that, in only 45 minutes, the students who lacked basic knowledge about ANNs and Q-learning learned at a basic level how ANNs could be implemented and what ANNs can do. The students also understood the basics of Q-learning, such as rewarding and punishing. The answers were related to the actions of the robot, which was the practical extension of the underlying predictive model.

We learned in the study that the open-ended approach helped the students to transform their theoretical knowledge into more practical knowledge through the physical computing extension. We do not argue that the understanding of MI concepts should be merely practical; it is important to understand the theoretical background behind MI algorithms. However, practical knowledge is also required to fully understand and utilize MI.

4 AUGMENTED INTELLIGENCE IN DATA MINING FOR EDUCATION

The interest in learning analytics and educational data mining has increased quickly in educational institutions. However, most of the research and practice focus on data mining experts and researchers performing the data mining and knowledge discovery processes. We study algorithms and methods in this section and how data science novices, such as teachers, could actively participate in data mining and analytics processes. We conducted a series of studies in which white box machine learning algorithms and models were used by data science novices to actively participate in the analysis. The results support our hypothesis that knowledge discovery occurs through understanding the model and its justifications if the novice user’s role in the process is active rather than passively consuming the results.

4.1 WHITE BOX PLATFORMS AND ALGORITHMS

Several open-ended educational data mining (EDM) and learning analytics (LA) platforms exist. Some are designed for only EDM or LA tasks, while other more general purpose data mining tools are also used in the EDM and LA context. The most popular platforms are Weka and RapidMiner. The current state-of-the-art platforms for EDM and LA processes are generally viewed as white box platforms. This means that the end users are actively involved in the data mining process and are able to change multiple parameters that influence the results of the process. The end users can view multiple visualizations of even hyperdimensional data and, finally, the end users can adjust the models generated by the algorithms. An example of such model adjusting is presented by Al-Barrak and Al-Razgan [105] and Jormanainen and Hartfield [106], in which the EDM end users could adjust the model’s rules generated by a decision tree algorithm; hence, the model’s classification evolved while the rules were changed in real time.

RapidMiner collects hundreds of different algorithms that include lightweight analysis algorithms but also highly efficient, state-of-the-art machine learning algorithms such as deep neural networks. The tool lets the users select a suitable method for the data mining task. RapidMiner is used with a block-based notation language that translates to XML (Figure 4.1). The block language used in RapidMiner allows users to visualize, adjust, change and build predictive models that analyze the given datasets in the host computer locally or in RapidMiner’s cloud service. RapidMiner is written in Java and allows users to write macros or import different algorithms to the algorithm suite as long as the implementations use RapidMiner’s XML format or Java.

Figure 4.1: RapidMiner’s block programming

Weka is an open-source suite for data mining tasks, also written in Java. Weka can be extended with user-written algorithms and macros through Weka’s package manager system. Weka offers a command line interface for analysis but can also be used with a GUI. Weka lacks the block-based customization but has options for efficient data mining tasks, and it is well documented and widely used in EDM and LA research. Weka also allows users to visualize, model and change multiple parameters during the analysis process, and the generated predictive models can be exported from Weka in Java. Weka allows users to visualize the predictive models and the datasets in hyperdimensions (Figures 4.2 and 4.3). The models that can be visualized in Weka include dendrograms for hierarchical clustering and binary trees for decision trees and ensemble methods such as random forests.

Figure 4.2: Interface of Weka

Figure 4.3: Dendrogram in Weka

Decision trees are a widely known and used set of machine learning algorithms [107]. Decision tree algorithms generally aim to ﬁnd a feature and a value that split the dataset to maximize the information gain [108]. A binary tree is constructed during the process when the dataset splits occur conditionally and the previous split affects the next split. A terminal (or a leaf) node is created that represents a class in the binary tree when the split divides the remaining dataset exactly into different labels from the given training set (Figure 4.4).

Figure 4.4: A decision tree classifier
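The split selection described above can be illustrated with a minimal sketch. It assumes Shannon entropy as the impurity measure and thresholds of the form "feature value ≤ t"; the function names and the toy dataset (hours studied, exercises completed) are hypothetical and are not taken from the cited works.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, feature, threshold):
    """Entropy reduction obtained by splitting on rows[i][feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    weighted = (len(left) * entropy(left)
                + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

def best_split(rows, labels):
    """Try every feature and every observed value; return (gain, feature, threshold)."""
    candidates = (
        (information_gain(rows, labels, f, x[f]), f, x[f])
        for f in range(len(rows[0]))
        for x in rows
    )
    return max(candidates)

# Hypothetical toy data: (hours studied, exercises completed) -> outcome
rows = [(1, 7), (2, 3), (3, 8), (6, 2), (7, 9), (8, 4)]
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]
gain, feature, threshold = best_split(rows, labels)
```

In this toy dataset only the first feature separates the two labels exactly, so the selected split produces two pure subsets, each of which would become a leaf node. A full decision tree algorithm would apply the same search recursively to every subset that is not yet pure.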

Linear regression, which can be used for both classification and regression tasks, is usually viewed as a white box algorithm [109]. Linear regression aims to find the parameters α and β for a linear equation (Figure 4.5)

y = α + βx

which can be generalized for datasets as

y_i = α + βx_i + e_i

where e_i is an error variable that accounts for the noise in the linear relationships of the data.

The fitted parameters α and β define a hyperplane in feature space, and linear regression divides the dataset along this hyperplane.

Figure 4.5: Linear regression (retrieved from http://www.sthda.com/english/articles/40-regression-analysis/165-linear-regression-essentials-in-r/)
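As a minimal sketch of how α and β can be found, the following example fits the line with ordinary least squares. The estimation method is an assumption here, since the text does not name one, and the data points and function name are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates of alpha and beta in y = alpha + beta * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Hypothetical noisy observations of a roughly linear relationship
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

alpha, beta = fit_line(xs, ys)

# The error terms e_i are what remains after the line has been subtracted
residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
```

Because the whole model consists of the two numbers α and β, a user can read the fitted relationship directly from the parameters, which is one reason why linear regression is commonly treated as a white box.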

For datasets with a small number of dimensions (k ≤ 2), the k-nearest neighbour (KNN) algorithm [110] is intuitive to explain (Figure 4.6). Hierarchical clustering algorithms can be visualized efficiently through dendrograms even for hyperdimensional datasets [111] (Figure 4.3), in contrast to another state-of-the-art clustering algorithm, k-means (Figure 4.7), which cannot be visualized in hyperdimensions.

Figure 4.6: KNN algorithm (retrieved from https://www.alldatasheet.net/view_image.jsp?components=KNN)

Figure 4.7: K-means (retrieved from https://github.com/MihailaDumitru/K-Means_Clustering)
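The intuition behind KNN in two dimensions can be sketched in a few lines. The example assumes Euclidean distance and a simple majority vote; the training points, labels and function name are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query.

    train is a list of (point, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional training set with two well-separated groups
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

prediction_near_a = knn_predict(train, (2.0, 2.0))
prediction_near_b = knn_predict(train, (6.0, 7.0))
```

In two dimensions, the neighbours that decide each prediction can be drawn on a scatter plot, which makes the reasoning behind the result visible to the user. The same code works for more dimensions, but the neighbourhood can then no longer be drawn directly.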

The separation between white box and black box algorithms in machine learning can be vague. For some algorithms, the model and its decision boundaries can be visualized efficiently when the number of dimensions is small (k ≤ 2) and the decision boundaries are linear. This is the case for support vector machines (SVM) ?? (Figure 4.8).

SVMs can be viewed as black boxes in the case of hyperdimensional data and nonlinear decision boundaries obtained through the kernel trick. For artificial neural networks, such as multilayer perceptrons, the models are always black boxes, due to the complex matrix operations and the changes that the model undergoes during the prediction and learning processes.

Figure 4.8: Support vector machine [139]
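The kernel trick mentioned above can be illustrated with a small numerical check. The sketch assumes a degree-2 polynomial kernel on two-dimensional inputs, chosen only for illustration; the function names and input vectors are hypothetical.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit map phi into 3-D space such that phi(x) . phi(z) == (x . z)^2."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    """Plain dot product of two equally long tuples."""
    return sum(p * q for p, q in zip(a, b))

x, z = (1.0, 2.0), (3.0, -1.0)

# Computed in the original 2-D space, without ever building phi
implicit = poly_kernel(x, z)

# Computed in the mapped 3-D space
explicit = dot(feature_map(x), feature_map(z))
```

Both computations give the same value up to floating-point rounding, so the classifier effectively operates in the mapped space without constructing it. A boundary that is linear in the mapped space, such as the sum of the first and third coordinates being equal to a constant, corresponds to a circle in the original space, which is why the resulting model is harder to inspect than a linear one.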

Ensemble methods such as random forests [113] (Figure 4.9) are usually viewed as closer to black boxes than individual decision trees, because they are harder to interpret. Kernel random forests, however, have been shown to be easier to analyze and to interpret than traditional random forests and are, hence, closer to white box algorithms [114].

Figure 4.9: Random forest [141]
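Why an ensemble is harder to interpret than a single tree can be seen in a strongly simplified sketch. It assumes one-split trees (stumps), one randomly chosen feature per tree and bootstrap sampling, whereas random forests in practice grow deeper trees and sample features at every split. The data, seed and function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(sample, feature):
    """Fit a one-split tree on one feature by choosing the threshold with fewest errors."""
    best = None
    for threshold in sorted({x[feature] for x, _ in sample}):
        left = Counter(y for x, y in sample if x[feature] <= threshold)
        right = Counter(y for x, y in sample if x[feature] > threshold)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0] if right else left_label
        errors = sum(
            1 for x, y in sample
            if y != (left_label if x[feature] <= threshold else right_label)
        )
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x[feature] <= threshold else right_label

def train_forest(data, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = rng.choices(data, k=len(data))  # sampling with replacement
        feature = rng.randrange(n_features)
        forest.append(train_stump(sample, feature))
    return forest

def forest_votes(forest, x):
    """Collect one vote per tree; the forest predicts the most common vote."""
    return Counter(tree(x) for tree in forest)

# Hypothetical two-class data in which both features separate the classes
data = [((1, 2), "A"), ((2, 1), "A"), ((2, 3), "A"), ((3, 2), "A"),
        ((7, 8), "B"), ((8, 7), "B"), ((8, 9), "B"), ((9, 8), "B")]

forest = train_forest(data)
votes_low = forest_votes(forest, (0, 0))
votes_high = forest_votes(forest, (10, 10))
```

Each individual stump is a readable rule, but the final prediction is the outcome of 25 such rules trained on different samples and features. Explaining a single prediction therefore requires inspecting the whole collection rather than following one path in one tree.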

Table 4.1 gives an overview of how the different machine learning algorithms for classification and cluster analysis relate to white box concepts. In terms of the white box, we distinguish between the algorithm and the actual predictive model that the algorithm generates. For instance, decision tree classifiers can be viewed as white boxes, while an algorithm to build such a classifier can be very complex and requires

TABLE OF CONTENTS

1 INTRODUCTION

2 BACKGROUND

3 NEURAL NETWORKS AND ROBOTICS IN OPEN DIGITAL ENVIRONMENT

4 AUGMENTED INTELLIGENCE IN DATA MINING FOR EDUCATION