5. FRAMEWORK INTRODUCTION

5.3 New frameworks

This section presents the frameworks that are new to this research, in the same manner as the previously introduced frameworks.

5.3.1 Autokeras

Keras is a deep learning API written in Python that runs on top of the machine learning platform TensorFlow. It is designed to enable fast experimentation with deep neural networks and focuses on being user-friendly, modular, and extensible. Keras contains numerous implementations of commonly used neural network building blocks, such as layers, objectives, activation functions, and optimizers, as well as a host of tools that make working with image and text data easier and simplify the code needed to write deep neural networks [53]. Using TensorFlow as its backend allows Keras to run computations on the computer's GPU for faster processing.

AutoKeras is an AutoML system based on Keras, developed at Texas A&M University. Like AutoML in general, AutoKeras aims to bring machine learning closer to the user and make it easier to apply to practical tasks. It also gives the user access to the powerful TensorFlow backend in a very simple way. Keras is one of the best-known and most widely used machine learning libraries, but AutoKeras has not found similar success, at least for now. [53]

5.3.2 MLBox

MLBox describes itself as a powerful automated machine learning library for Python. It claims to offer fast reading and distributed data preprocessing, cleaning, and formatting; highly robust feature selection and leak detection; accurate hyperparameter optimization in high-dimensional spaces; state-of-the-art predictive models for classification and regression; and prediction with model interpretation [54]. To stand out, MLBox focuses on drift identification, entity embedding, and hyperparameter optimization. MLBox does not support unsupervised learning, but this does not limit our study, since we will be testing classification, which MLBox supports well [54].
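Drift identification means checking whether a feature is distributed differently in the training data and in the data the model will later be applied to. The sketch below illustrates only that general idea, in plain Python: it flags a column as drifting when the two-sample Kolmogorov-Smirnov statistic between its training and test values exceeds a threshold. The choice of the KS statistic, the helper names `ks_statistic` and `drifting_features`, and the threshold of 0.5 are assumptions made for illustration; this is not MLBox's own drift measure or API.

```python
from bisect import bisect_right


def ks_statistic(train_values, test_values):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between
    the empirical CDFs of the two samples (0 = identical, 1 = fully separated)."""
    a, b = sorted(train_values), sorted(test_values)
    # The supremum of |F_a(x) - F_b(x)| is attained at one of the observed values.
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)) for x in a + b)


def drifting_features(train, test, threshold=0.5):
    """Names of the columns whose train/test KS statistic exceeds `threshold`.
    The default threshold is an arbitrary illustrative value."""
    return [name for name in train if ks_statistic(train[name], test[name]) > threshold]


train = {"age": [21, 35, 42, 58], "income": [1.0, 1.2, 0.9, 1.1]}
test = {"age": [23, 37, 40, 60], "income": [3.1, 2.8, 3.3, 3.0]}
print(drifting_features(train, test))  # prints ['income']
```

A column flagged this way is a candidate for removal or closer inspection before model training, since a model fitted on the training distribution may not transfer to the shifted one.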

5.3.3 Lightautoml

LightAutoML is a project from the AutoML group of Sberbank AI Lab. It is a framework for automatic classification and model creation, which makes it well suited to our research setup. At the moment, LightAutoML enables the creation of pipelines that perform automatic hyperparameter tuning, data processing, and feature selection, and it offers some easy-to-use graphical interfaces [55]. LightAutoML has also been added to the benchmark since the original benchmark study was published, so it will be interesting to document its performance.
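To make the feature selection step of such a pipeline concrete, the sketch below uses a simple filter method: it ranks features by the absolute Pearson correlation between each feature and the target and keeps the top k. The method and the helper names `pearson` and `select_features` are assumptions chosen for illustration; this is not how LightAutoML itself selects features.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(columns, target, k):
    """Keep the k features most strongly (linearly) correlated with the target."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], target)), reverse=True)
    return ranked[:k]


columns = {"signal": [1, 2, 3, 4], "noise": [1, -1, 1, -1], "constant": [5, 5, 5, 5]}
target = [2, 4, 6, 8]
print(select_features(columns, target, k=2))  # prints ['signal', 'noise']
```

A correlation filter only captures linear relationships with the target; AutoML frameworks typically use model-based importance instead, but the principle of discarding weakly informative features before tuning is the same.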

5.3.4 Autogluon

AutoGluon is another AutoML tool for Python that automates machine learning tasks, enabling the user to easily achieve strong predictive performance in their applications. It describes itself as very easy to use, and its example indeed needs just five lines of code, including the data input. AutoGluon leverages automatic hyperparameter tuning, model selection, architecture search, and data processing. [https://auto.gluon.ai/stable/index.html] AutoGluon was originally created by Amazon for Amazon Web Services, but it has since been open-sourced. [56]

5.3.5 Oboe

Oboe and TensorOboe are automated model selection systems that use collaborative filtering to find good models for supervised learning tasks within a user-specified time limit. Further hyperparameter tuning can be performed afterwards [57]. We will be using the regular Oboe version: Oboe does not support installation as a pip package, but the TensorOboe package is even more inconvenient to install.

The authors of Oboe describe how it works in their paper OBOE: Collaborative Filtering for AutoML Model Selection: “Oboe is a collaborative filtering method for time-constrained model selection and hyperparameter tuning. Oboe forms a matrix of the cross-validated errors of a large number of supervised learning models (algorithms together with hyperparameters) on a large number of datasets, and fits a low rank model to learn the low-dimensional feature vectors for the models and datasets that best predict the cross-validated errors. To find promising models for a new dataset, Oboe runs a set of fast but informative algorithms on the new dataset and uses their cross-validated errors to infer the feature vector for the new dataset. Oboe can find good models under constraints on the number of models fit or the total time budget”.
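The procedure in this quotation can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not Oboe's implementation: a truncated SVD stands in for the low-rank model, the indices of the models run on the new dataset are simply given rather than chosen by Oboe, and the function names `fit_model_embeddings` and `predict_errors` are hypothetical.

```python
import numpy as np


def fit_model_embeddings(errors, rank):
    """Factor the (datasets x models) matrix of cross-validated errors and
    return a (rank x models) matrix whose columns are model feature vectors."""
    _, s, vt = np.linalg.svd(errors, full_matrices=False)
    # Keep the top `rank` components: errors ~= U_k @ (S_k @ Vt_k).
    return s[:rank, None] * vt[:rank]


def predict_errors(model_emb, observed_idx, observed_err):
    """Infer a new dataset's feature vector from the errors of the few models
    actually run on it, then predict the error of every model."""
    y_obs = model_emb[:, observed_idx]
    # Least-squares solve for x such that x @ y_obs ~= observed_err.
    x, *_ = np.linalg.lstsq(y_obs.T, observed_err, rcond=None)
    return x @ model_emb


# Toy data: 20 known datasets x 8 models with an exactly rank-2 error matrix.
rng = np.random.default_rng(0)
model_factors = rng.random((2, 8))
errors = rng.random((20, 2)) @ model_factors
new_errors = rng.random(2) @ model_factors  # true errors, normally unknown

emb = fit_model_embeddings(errors, rank=2)
observed = [0, 3, 5]  # hypothetical "fast but informative" models that were run
predicted = predict_errors(emb, observed, new_errors[observed])
best_model = int(np.argmin(predicted))  # most promising model for the new dataset
```

Because the toy error matrix is exactly rank 2, three observed errors are enough to recover the whole row; with real, noisy error matrices the prediction is only an estimate that guides which models to try next.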

In essence, Oboe searches only for promising estimators and does not construct preprocessing steps itself. This leads to its biggest weakness: it needs an already preprocessed dataset, and all features must be standardized to have zero mean and unit variance. Oboe is still largely under development, particularly its documentation. Oboe has also already been added to the benchmark, although it was not included in the original benchmark paper.
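Because Oboe expects standardized input, features have to be transformed before they are passed to it. A minimal plain-Python sketch of this z-score standardization follows, assuming the data is held as a dict of numeric columns (a hypothetical layout chosen for illustration; the helper names are also hypothetical). The statistics are fitted on the training data and then reused for any new data.

```python
from statistics import fmean, pstdev


def fit_standardizer(columns):
    """Compute each column's mean and population standard deviation."""
    return {name: (fmean(vals), pstdev(vals)) for name, vals in columns.items()}


def standardize(columns, stats):
    """Shift each column to zero mean and scale it to unit variance using
    statistics fitted on the training data."""
    out = {}
    for name, vals in columns.items():
        mean, std = stats[name]
        # A constant column has zero spread; only centre it to avoid dividing by zero.
        out[name] = [(v - mean) / std if std > 0 else v - mean for v in vals]
    return out


train = {"a": [1.0, 2.0, 3.0, 4.0], "c": [5.0, 5.0, 5.0]}
stats = fit_standardizer(train)
train_z = standardize(train, stats)
test_z = standardize({"a": [2.5]}, stats)  # new data reuses the training statistics
```

Reusing the training statistics for test data keeps information from the test set out of preprocessing, which matters when comparing frameworks fairly.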

5.3.6 Mlplan

ML-Plan is a Java-based AutoML framework that builds on WEKA and scikit-learn to provide automated machine learning for Java users through the Eclipse IDE.

It has been integrated into the larger AILibs project. It has also been added to the benchmark later, so it should be usable from Python. This framework will largely test the same things that have already been tested with Auto-WEKA and Auto Scikit-Learn, so including it might prove redundant if its results do not differ significantly. [58]

5.3.7 GAMA

GAMA, or General Automated Machine learning Assistant, is another AutoML tool that has already been added to the benchmark repository. GAMA's approach is to automatically find a good machine learning pipeline, which GAMA defines in terms of data preprocessing steps, machine learning algorithms, and their possible hyperparameter configurations. GAMA also provides a command-line tool with which you can load your dataset directly, but it supports only part of the functionality of the full Python package. In addition, GAMA has a dashboard, which is also still under development.
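As a rough illustration of what searching over preprocessing steps, algorithms, and hyperparameter configurations means, the sketch below samples candidate pipelines at random from a small search space and keeps the best-scoring one. The search space and every name in it are hypothetical, the `score` callback stands in for what would in practice be a cross-validated evaluation, and plain random search is used only for simplicity; it is not GAMA's own search algorithm.

```python
import random

# Hypothetical search space: an optional preprocessing step, a learning
# algorithm, and the candidate values of that algorithm's hyperparameters.
SEARCH_SPACE = {
    "preprocessing": [None, "standard_scaler", "pca"],
    "learners": {
        "decision_tree": {"max_depth": [2, 4, 8, None]},
        "k_nearest_neighbors": {"n_neighbors": [1, 5, 15]},
        "logistic_regression": {"C": [0.01, 0.1, 1.0, 10.0]},
    },
}


def sample_pipeline(rng):
    """Draw one pipeline configuration uniformly from the search space."""
    learner = rng.choice(sorted(SEARCH_SPACE["learners"]))
    grid = SEARCH_SPACE["learners"][learner]
    return {
        "preprocessing": rng.choice(SEARCH_SPACE["preprocessing"]),
        "learner": learner,
        "hyperparameters": {name: rng.choice(values) for name, values in grid.items()},
    }


def random_search(score, n_candidates, seed=0):
    """Return the sampled pipeline with the highest score."""
    rng = random.Random(seed)
    candidates = [sample_pipeline(rng) for _ in range(n_candidates)]
    return max(candidates, key=score)
```

For example, `random_search(my_score, 50)` with a user-supplied `my_score(pipeline)` returns the best of 50 sampled configurations; smarter strategies such as evolutionary search differ mainly in how the next candidates are proposed.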

GAMA has evidently drawn a lot from other AutoML frameworks, and we have high hopes for it, as it has been developed by Pieter Gijsbers, one of the authors of the AutoML benchmark. [59]

5.3.8 Ludwig

Ludwig is a “code-free” deep learning toolbox developed by Uber that also offers AutoML functionality. Ludwig is built on top of TensorFlow, and its goal is to make it very easy for users to train and test deep learning models. It is written entirely in Python and therefore also provides an API for more code-oriented users like us to get research done. [60]

Ludwig draws inspiration from other machine learning tools such as WEKA and scikit-learn, and its developers openly acknowledge this, as they did not want to “re-invent the wheel”. Ludwig provides three main functionalities: training models, using them for prediction, and evaluating them. It is based on datatype abstraction: the same data preprocessing and postprocessing are performed on different datasets that share datatypes, and the encoding and decoding models developed for a datatype can be reused across several tasks. Of course, Ludwig faces the same potential issue as ML-Plan, because it is built on top of other systems: does it provide any additional knowledge? That will be shown during the research. [60]
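The datatype abstraction can be illustrated with a small plain-Python sketch covering only the preprocessing side. Ludwig's own configuration declares features as lists of entries with a `name` and a `type`; the feature dicts below mimic that shape, but the registry `PREPROCESSORS`, the type names used as keys, and the preprocessing functions are hypothetical stand-ins, not Ludwig's internals.

```python
# Hypothetical registry: one preprocessing function per datatype. Any feature
# declared with that type reuses it, whichever dataset the feature comes from.
def preprocess_number(values):
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]  # min-max scale to [0, 1]


def preprocess_category(values):
    vocab = {v: i for i, v in enumerate(sorted(set(values)))}
    return [vocab[v] for v in values]  # map each category to an integer id


PREPROCESSORS = {"number": preprocess_number, "category": preprocess_category}


def preprocess(dataset, features):
    """Apply the preprocessing registered for each declared feature type."""
    return {f["name"]: PREPROCESSORS[f["type"]](dataset[f["name"]]) for f in features}


# Two unrelated datasets share datatypes, so the same code handles both.
housing = preprocess(
    {"rooms": [2, 4, 6], "city": ["b", "a", "b"]},
    [{"name": "rooms", "type": "number"}, {"name": "city", "type": "category"}],
)
patients = preprocess(
    {"age": [30, 50], "ward": ["x", "y"]},
    [{"name": "age", "type": "number"}, {"name": "ward", "type": "category"}],
)
```

Adding support for a new datatype then means registering one more function, after which every dataset with features of that type can use it without further code.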

In document Automated machine learning: Evaluating AutoML frameworks (pages 34-38)