
Challenges of Open Machine Intelligence

The main challenge we encountered in our study of open-ended Machine Intelligence (MI) was the steep learning curve of the systems that implement the Augmented Intelligence method (AUI). We noticed that as the level of transparency in decision making for the end user increases, the level of automation decreases, and the number of meaningful parameters, models and visualizations increases. That is, the more open the system is, the more moving parts it contains.

Multiple teachers who participated in the study noted that AUI would be preferable over traditional educational data mining (EDM) and learning analytics (LA) systems if the systems were easier to learn and use. Different choices in the user interface could ease learning the system to some extent. However, the models in AUI are adjustable; hence, the parameters and values in the model architecture cannot simply be hidden.

Our research shows no difference between AUI and other types of data mining when the predictive model’s accuracy is measured. It should be noted, however, that our research contained only one dataset, a few algorithms and a single data mining technique; hence, conclusions on the accuracy of AUI in comparison to other data mining methods require more research. Nevertheless, because the learning curve is steeper in AUI than in other data mining methods, and the accuracy of the different methods seems roughly equal to that of AUI, other methods may be preferable for small data mining tasks.

However, our study indicates that over time, with the same datasets, the AUI models tend to generalize better by avoiding overfitting. The problem of overfitting is present in some forms of decision tree classifiers and is usually mitigated by heuristics such as pruning the tree produced by the classifier algorithm.
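As an illustration of such a pruning heuristic, the following minimal sketch uses scikit-learn's cost-complexity pruning; this is an assumed, generic setup (the Iris dataset and the `ccp_alpha` value are invented for illustration), not the system developed in this thesis.

```python
# Sketch: cost-complexity pruning as one heuristic against decision-tree
# overfitting. Illustrative only; dataset and alpha are arbitrary choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree tends to memorize the training data.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# ccp_alpha > 0 removes subtrees whose complexity outweighs their gain.
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_train, y_train)

print(full.tree_.node_count, pruned.tree_.node_count)
```

The pruned tree contains fewer nodes, trading some training-set fit for better generalization.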

Overfitting in AUI can be avoided by adjusting the attributes and pivot values within the decision tree model nodes and changing the labels in the terminal nodes to better correspond to the generalization as perceived by the end user.
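A toy sketch of this adjustment step on a hand-rolled decision stump: the end user moves a pivot value and relabels a terminal node. The node structure, feature indices and labels below are all invented for illustration and do not reflect the thesis implementation.

```python
# Toy sketch of the AUI adjustment step on a hand-rolled decision stump.
# (Illustrative only; names, thresholds and labels are invented.)

def predict(node, x):
    """Follow split nodes down to a leaf and return its label."""
    if "label" in node:
        return node["label"]
    branch = "left" if x[node["feature"]] <= node["pivot"] else "right"
    return predict(node[branch], x)

# A learned stump: split on feature 0 at pivot 5.0.
tree = {
    "feature": 0, "pivot": 5.0,
    "left": {"label": "at-risk"},
    "right": {"label": "passing"},
}

student = {0: 5.4}
print(predict(tree, student))      # → passing

# The end user adjusts the pivot value within the node ...
tree["pivot"] = 5.5
print(predict(tree, student))      # → at-risk

# ... and relabels the terminal node to match their own generalization.
tree["left"]["label"] = "needs support"
print(predict(tree, student))      # → needs support
```

The prediction for the same student changes purely through the end user's edits, which is the knowledge-discovery mechanism AUI relies on.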

However, generalization is subjective to the end user; thus, a model that avoids overfitting for some end users might not be generalizable for other end users.

The Empirical Modelling (EM) platform used to teach the basics of neural networks and reinforcement learning can also be difficult for a beginner to use. All of the participants in our study had prior experience with EM and the JS-Eden platform.

It can be asked whether the level of understanding and the motivation shown by the test subjects can be achieved with students who also have to learn the basics of EM and JS-Eden. EM differs from traditional computer programming because construals in EM aim to reflect the end user’s subjective understanding of the referent, whereas a traditional computer program can be described as a state of a Turing machine. In this sense, a construal that is extended with a physical computing artefact and a machine learning agent is a reflection of the end user’s understanding of the machine’s learning. That understanding is subjective and might not reflect the rigorous mathematical model that actually makes the machine learn: the students’ misconceptions are instead present in the construal, regardless of whether the intelligent agent that extends the construal learns or not.

We conclude, as the result of our studies, that the major drawback of open-ended machine intelligence is that the approach is not as easy to learn, and its implementations are not as easy to use, as their more black-boxed counterparts. We further state that while involving the EM principles in open-ended machine intelligence benefits the end user by increasing the transparency of the MI process, the predictive models, whether they are Q-learning agents, cluster analysis models or classification models, are always bound to the current time and space and reflect only the current state-as-experienced between the referent and the end user. Hence, open-ended machine intelligence in education does not solve the problem of generalizable models in EDM or LA: it does the exact opposite. It stresses the value of experience and subjective understanding in situ, in which end users form their own interpretations of the educational context.

6 SUMMARY OF CONTRIBUTIONS

We used the design research paradigm to conduct the research for this thesis. The focus in design research is to develop a digital artefact, evaluate it and, based on the evaluation, develop a theory around the artefact and its influence on its context. We developed two different digital artefacts for this thesis.

First, we designed and implemented an open-ended robotics platform to support students learning the abstract concepts of artificial neural networks (ANNs) and reinforcement learning (RL). Second, we designed and implemented an educational data mining (EDM) system that facilitated the augmented intelligence method (AUI) to analyze data collected from learning contexts.

As the contribution of the first round of design research cycles (the digital artefact for RL and ANNs), we developed the understanding that Open Machine Intelligence in Education (OMIE) motivates and supports students in learning abstract concepts of machine intelligence (MI), such as ANNs and RL. We discovered that the students who participated in our study managed to deepen their understanding of ANNs and RL because of the open-ended approach used in the learning environment.

As the contribution of the second round of design research cycles (the digital artefact for EDM), we developed and evaluated a new clustering algorithm called Neural N-Tree (NNT). According to our evaluations, NNT can be used in EDM to cluster educational data with an accuracy comparable to state-of-the-art clustering algorithms.

We also developed a theory about AUI and discovered that, through AUI, even data science novices can mine educational data and discover new knowledge from the learning context, as long as the predictive models are adjustable and interpretable by a human. Such predictive models include NNT and decision tree classifiers, for instance.

The following list summarizes the scientific contributions presented in this thesis.

Paper I: Empirical Modelling (EM) can be used to implement the complex and abstract concepts of ANNs: EM can model the structure of an ANN, and changes to that structure, through its built-in mechanisms such as observables and dependencies. Furthermore, EM implementations of ANNs allow real-time adjustment of parameters and even of the internal structures of ANNs.

Paper II: An ANN and Q-learning solution, in which the theory was turned into practical exercises through EM and physical computing, supports students with no background in ANNs or RL in comprehending the basics of machine learning.

Paper III: The Neural N-Tree algorithm and the introduction of an open-ended analysis tool that implements Neural N-Tree.

Paper IV: The Neural N-Tree algorithm can be used in EDM and performs comparably to state-of-the-art clustering algorithms. The research introduces and evaluates the feasibility of AUI, in which the EDM end user interacts with a white box ML algorithm for knowledge discovery. The evaluation of AUI supports the hypothesis that AUI indeed leads to knowledge discovery.

Paper V: AUI can be used by data science novices to build classifiers, where the building process itself leads to knowledge discoveries from the dataset and its context.

Also, AUI leads to more robust classifiers even when the adjustments and changes have been performed by data science novices.

7 DISCUSSION

We have presented several methods in this thesis to white box complex machine intelligence (MI) models and algorithms. First, we used educational mobile robots to introduce the concepts of artificial neural networks (ANNs) and reinforcement learning (RL) to computer science major students. We implemented a system in which the ANN models and RL agents could be visualized, while physical artefacts extended the abstract models into the real world. The studies conducted showed that the students did benefit from the white box approach and were able to deepen their understanding of MI methods. However, while Benítez et al. [132] argued that ANNs can actually be presented as white boxes through Bayesian methods [133], the complex operations between the layers of ANNs cannot be fully explained even with the devices and visualizations. We argue that this is also the case with several other MI methods when the feature space is high-dimensional and the decision boundaries between the labels, i.e., classes or clusters, are not linear.

Educational data mining (EDM) and learning analytics (LA) research is currently focused on finding generalizable models that would ultimately lead to a deeper understanding of learning. Practical interventions occur, but the EDM and LA community is still largely formed by researchers and data scientists, and educators are excluded from the first steps of the process. We wish to narrow the gap between MI experts and educators and also to allow data science novices to analyze educational data with the methods introduced in this thesis. The systems developed for this purpose require transparency and interpretability. Even computer science majors failed to understand clustering in our research, so we raise the question: how feasible is it to expect that secondary school teachers can master current state-of-the-art EDM and LA systems that require them to create complex macros and adjust hyperparameters in deep learning methods?

The teachers involved in our studies who used the transparent MI systems to analyze their students’ data struggled even with simple and small decision trees. The teachers noted that even with very interpretable models, they sometimes failed to comprehend how and why the system predicted the class labels or clustered certain students into the same group. The teachers valued the knowledge discovery that emerged through the use of the system and stressed that the system was on the right track, but as explainable AI (XAI) research asks: to what extent can we white box an algorithm and still use it?

Many studies have focused on introducing MI concepts to students, but most of them focus on the university level. Some of the systems use mobile robots to familiarize students with ANNs and related algorithms; the approach to MI can be reinforcement learning, for instance. Ken Kahn [134] implemented a system in which, with the visual programming language Snap!, students from almost all schooling levels can use and implement systems that recognize speech and detect different postures. However, the system is still black boxed in the sense that the underlying models are not, and cannot be, explained to the students. Of course, the purpose of such systems is to showcase what can be done with sophisticated algorithms, not to explain the logic behind the complex calculations.

In the future, we will study how a wider selection of machine intelligence (MI) algorithms could be used to introduce MI concepts. When only a few simple features (two or fewer) are selected, almost any algorithm that produces linear decision boundaries can be white boxed with visualizations. As a first attempt, we will study k-nearest neighbour (KNN) with a feature space of two dimensions. The algorithm is quite easy to explain, implement and test, although KNN naturally produces non-linear decision boundaries. We already have a prototype of the system that recognizes images by two selected features using KNN.
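A minimal sketch of such a setup, assuming scikit-learn's KNeighborsClassifier over a two-dimensional feature space; the data points and feature values below are invented for illustration and are not the thesis prototype.

```python
# Sketch: KNN over two features, where the decision boundary can be
# drawn directly in the plane. Illustrative toy data only.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two hand-picked features per instance, e.g. two image descriptors.
X = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[1.2, 1.5], [5.5, 5.0]]))  # → [0 1]
```

Because the feature space is two-dimensional, the majority vote over the three nearest neighbours can be shown to students point by point on a scatter plot.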

Our future studies include developing the learning analytics (LA) or educational data mining (EDM) system further and testing its interpretability with a different set of white box algorithms while preserving the prediction or clustering accuracy.

However, there is a saying in the machine learning community, “there is no free lunch”, meaning that no algorithm exists that is superior to all other algorithms for all datasets and settings. Hence, the scope of different algorithms used in such systems should be quite large while preserving the system’s ease of use.

7.1 GENERAL DISCUSSION

The evaluation of predictive models can be conducted with several measurement methods. Some measurement methods calculate values for the predictions; the highest or lowest value usually implies superiority. Such validation methods include the silhouette index, Dunn index and Davies-Bouldin index for cluster analysis, and logarithmic loss, the confusion matrix and mean squared error for classification tasks.
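Two of these internal indices can be computed in a few lines; the sketch below assumes scikit-learn and synthetic two-blob data invented for illustration (silhouette: higher is better; Davies-Bouldin: lower is better).

```python
# Sketch: internal validation indices for a clustering result.
# Illustrative synthetic data; not a dataset from the thesis.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

rng = np.random.default_rng(0)
# Two well-separated blobs of 50 points each.
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(round(silhouette_score(X, labels), 2))      # near 1.0 for clean blobs
print(round(davies_bouldin_score(X, labels), 2))  # near 0.0 for clean blobs
```

Note that the two indices point in opposite directions, which is exactly why the text above stresses knowing whether the highest or the lowest value implies superiority.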

External evaluation for cluster analysis can also be conducted. In external evaluation, the class labels are known before the cluster analysis, and the class labels given by the clustering algorithm are compared to the actual class labels. If the labels given by the algorithm for the dataset instances are similar to the labels known before the cluster analysis, then the algorithm usually managed to cluster the dataset well. However, cluster analysis is a part of unsupervised learning and ultimately aims for knowledge discovery, so labelling the data instances the same way an expert evaluator would could lead to a low level of knowledge discovery.
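One common way to perform such a comparison (an assumed choice here, not one named in the text) is the adjusted Rand index, which is invariant to the arbitrary numbering of cluster ids:

```python
# Sketch: external evaluation of a clustering against known labels.
# The adjusted Rand index ignores the arbitrary permutation of cluster ids.
from sklearn.metrics import adjusted_rand_score

known = [0, 0, 0, 1, 1, 1]   # expert labels, fixed before clustering
found = [1, 1, 1, 0, 0, 0]   # clustering output; ids are arbitrary

print(adjusted_rand_score(known, found))  # → 1.0 (perfect agreement)
```

The two label vectors disagree on every raw id yet score perfectly, because the grouping structure, not the naming, is what external evaluation compares.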

The purpose of unsupervised machine learning is to label the dataset, reduce its dimensionality or visualize a high-dimensional dataset without the interference of a human. The focus in unsupervised learning, in contrast to supervised learning, is not on predictions but on interpreting the data differently. Such interpretations can be different groupings of data instances or a decreased number of dimensions in the visualizations, for instance. Such methods aim for knowledge discovery, and if the result of an algorithm is intuitive instead of innovative, the algorithm itself can be very accurate in terms of objective evaluation, but the level of knowledge discovery can be low.
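As a brief sketch of the dimensionality-reduction case, principal component analysis (an assumed, generic choice; the thesis does not prescribe it) projects a dataset into two dimensions so it can be plotted:

```python
# Sketch: PCA as one unsupervised route from a high-dimensional dataset
# to a visualizable one. Illustrative example on the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)          # 150 instances, 4 features each
X2 = PCA(n_components=2).fit_transform(X)  # projected to 2-D for plotting
print(X.shape, X2.shape)  # → (150, 4) (150, 2)
```

Whether the resulting two-dimensional picture yields any knowledge discovery still depends on the end user's interpretation, as argued above.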

A similar criticism can be directed at any evaluation metric for cluster analysis. Even when a clustering algorithm clusters the data poorly according to some metric, the meaningfulness of knowledge discovery is ultimately based on the end user’s subjective understanding. An optimal clustering does not necessarily lead to knowledge discovery; instead, if the results of a clustering can be anticipated, learning new knowledge might be difficult.

Using the Augmented Intelligence Method (AUI), the educational data mining (EDM) end user can adjust, build and change predictive models in an iterative process. The predictive models can be cluster analysis models or classification models. The aim is not accurate predictions but the knowledge discovered in the process of adjusting, changing and building. The models generated in this way are meaningful to the end user, but they might fail to generalize to different datasets from the same context, or they might even be non-meaningful to a different end user with the same dataset and context.

Again, should AUI be evaluated through the objective accuracy of the predictive models or through how meaningful the knowledge discovery from building the models was? We argue that even meaningful models can be very inaccurate, and knowledge discovery does not necessarily correlate with accurate predictions.