mcfly Documentation
Release 3.0.0

Netherlands eScience Center

Jun 18, 2020

Contents

1 Installation
  1.1 Installing on Windows
  1.2 Visualization

2 User Manual
  2.1 The function findBestArchitecture
  2.2 Visualize the training process
  2.3 FAQ

3 Technical documentation
  3.1 Hyperparameter search
  3.2 Architectures
  3.3 Other choices
  3.4 Comparison with non-deep models

4 Data preprocessing
  4.1 Eligible data sets
  4.2 Data format
  4.3 Data preprocessing

5 Indices and tables

Index


The goal of mcfly is to ease the use of deep learning technology for time series classification. The advantages of deep learning algorithms are that they can handle raw data directly with no need to compute signal features, they do not require expert domain knowledge about the data, and they have been shown to be competitive with conventional machine learning techniques. As an example, you can apply mcfly to accelerometer data for activity classification, as shown in the mcfly tutorial.

If you use mcfly in your research, please cite the following software paper:

D. van Kuppevelt, C. Meijer, F. Huber, A. van der Ploeg, S. Georgievska, V.T. van Hees. Mcfly: Automated deep learning on time series. SoftwareX, Volume 12, 2020. doi: 10.1016/j.softx.2020.100548


CHAPTER 1

Installation

Prerequisites:

• Python 2.7, 3.5 or 3.6

• pip

To install all dependencies in a separate conda environment:

conda env create -f environment.yml

source activate mcfly

To install the package, run in the project directory:

pip install .

1.1 Installing on Windows

When installing on Windows, there are a few things to take into consideration. The preferred (in other words: easiest) way to install Keras and mcfly is as follows:

• Use Anaconda

• Use Python 3.x, because tensorflow is not available on Windows for Python 2.7

• Install numpy and scipy through the conda package manager (and not with pip)

• To install mcfly, run pip install mcfly in the cmd prompt.

• Loading and saving models can cause problems on Windows, see https://github.com/NLeSC/mcfly-tutorial/issues/17


1.2 Visualization

We built a tool to visualize the configuration and performance of the models. The tool can be found on http://nlesc.github.io/mcfly/. To run the model visualization on your own computer, cd to the html directory and start up a Python web server:

python -m http.server 8888 &

Navigate to http://localhost:8888/ in your browser to open the visualization. For a more elaborate description of the visualization, see the user manual.


CHAPTER 2

User Manual

On this page, we describe what you should know when you use mcfly. This manual should be understandable without too much knowledge of deep learning, although it expects familiarity with the concepts of dense hidden layers, convolutional layers and recurrent layers. However, if mcfly doesn't give you a satisfactory model, a deeper knowledge of deep learning really helps in debugging the models.

We provide a quick description of the layers used in mcfly:

• dense layer, also known as a fully connected layer, is a layer of nodes that each have connections to all outputs of the previous layer.

• convolutional layer convolves the output of the previous layer with one or more sets of weights and outputs one or more feature maps.

• LSTM layer is a recurrent layer with some special features to help store information over multiple time steps in time series.

Some recommended reading to make you familiar with deep learning: http://scarlet.stanford.edu/teach/index.php/An_Introduction_to_Convolutional_Neural_Networks

Or follow a complete course on deep learning: http://cs231n.stanford.edu/

2.1 The function findBestArchitecture

The function find_best_architecture() generates a variety of architectures and hyperparameters, and returns the best performing model on a subset of the data. The following two types of architectures are possible (for more information, see the Technical documentation):

CNN: [Conv - Relu]*N - Dense - Relu - Dense - Relu - Softmax

DeepConvLSTM: [Conv - Relu]*N - [LSTM]*M - Dropout - TimeDistributedDense - Softmax - TakeLast

The hyperparameters to be optimized are the following:

• learning rate


• regularization rate

• model_type: CNN or DeepConvLSTM

• if model_type=CNN:

– number of Conv layers

– for each Conv layer: number of filters

– number of hidden nodes for the hidden Dense layer

• if model_type=DeepConvLSTM:

– number of Conv layers

– for each Conv layer: number of filters

– number of LSTM layers

– for each LSTM layer: number of hidden nodes

We designed mcfly to have sensible default values and ranges for each setting. However, you can influence the behavior of the function through the arguments you pass to it. These are the options (see also the documentation of generate_models()); a usage sketch follows the list:

• number_of_models: the number of models that should be generated and tested

• nr_epochs: The models are tested after only a small number of epochs, to limit the time. Setting this number higher will give a better estimate of the performance of the model, but it will take longer.

• model_type: Specifies which type of model ('CNN' or 'DeepConvLSTM') to generate. With the default value None, it will generate both CNN and DeepConvLSTM models.

• Ranges for all of the hyperparameters: The hyperparameters (as described above) are sampled from a uniform or log-uniform distribution. The boundaries of these distributions can be set, and are defined by the following arguments:

– low_lr and high_lr: the learning rate will be sampled from a log-uniform distribution between 10^low_lr and 10^high_lr

– low_reg and high_reg: the regularization rate will be sampled from a log-uniform distribution between 10^low_reg and 10^high_reg

– cnn_min_layers and cnn_max_layers: range for the number of Conv layers in the CNN model

– cnn_min_filters and cnn_max_filters: range for the number of filters per Conv layer in the CNN model

– cnn_min_fc_nodes and cnn_max_fc_nodes: range for the number of hidden nodes per Dense layer in the CNN model

– deepconvlstm_min_conv_layers and deepconvlstm_max_conv_layers: range for the number of Conv layers in the DeepConvLSTM model

– deepconvlstm_min_conv_filters and deepconvlstm_max_conv_filters: range for the number of filters per Conv layer in the DeepConvLSTM model

– deepconvlstm_min_lstm_layers and deepconvlstm_max_lstm_layers: range for the number of LSTM layers in the DeepConvLSTM model

– deepconvlstm_min_lstm_dims and deepconvlstm_max_lstm_dims: range for the number of hidden nodes per LSTM layer in the DeepConvLSTM model
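As announced above, a minimal usage sketch. The import path and return values follow the mcfly tutorial; treat the exact signature as an assumption and check the API reference:

from mcfly import find_architecture

# Sketch only: argument names follow the options described above;
# verify the exact signature against the mcfly API reference.
best_model, best_params, best_model_type, knn_acc = \
    find_architecture.find_best_architecture(
        X_train, y_train, X_val, y_val,
        number_of_models=10,  # how many random architectures to generate and test
        nr_epochs=5,          # train only briefly, just enough to compare models
        model_type=None)      # None: generate both CNN and DeepConvLSTM models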


2.2 Visualize the training process

To gain more insight into the training process of the models and the influence of the hyperparameters, you can explore the visualization.

1. Save the model results by defining outputpath in find_best_architecture.

2. Start a Python web server (see Installation) and navigate to the visualization page in your browser.

3. Open the JSON file generated in step 1.

In this visualization, the accuracies on the train and validation sets are plotted for all models. You can filter the graphs by selecting specific models, or filter on hyperparameter values.
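A sketch of step 1, assuming outputpath works as this manual describes (treat the parameter name and signature as assumptions and verify them against the API reference):

from mcfly import find_architecture

# Step 1: write the model comparison results to a JSON file that the
# visualization page can load.
best_model, best_params, best_model_type, knn_acc = \
    find_architecture.find_best_architecture(
        X_train, y_train, X_val, y_val,
        outputpath='./modelcomparison.json')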

2.3 FAQ

2.3.1 None of the models tested in findBestArchitecture performs satisfactorily

Note that find_best_architecture() doesn't give you a fully trained model yet: it still needs to be trained on the complete dataset with sufficient iterations. However, if none of the models in find_best_architecture() have a better accuracy than a random model, it might be worth trying one of the following things:

• Train more models: the number of models tested needs to be sufficient to cover a large enough part of the hyperparameter space

• More epochs: it could be that the model needs more epochs to learn (for example when the learning rate is small). Sometimes this is visible from the learning curve plot

• Larger subset size: it could be that the subset of the train data is too small to contain enough information for learning

• Extend hyperparameter range


CHAPTER 3

Technical documentation

This page describes the technical implementation of mcfly and the choices that have been made.

3.1 Hyperparameter search

Mcfly performs a random search over the hyperparameter space (see the section in the user manual about which hyperparameters are tuned). We chose to implement random search because it's simple and fairly effective; a small sketch of the idea follows the list below. We considered some alternatives:

• Bayesian optimization with Gaussian processes, such as spearmint: is not usable for a mix of discrete (e.g. number of layers) and continuous hyperparameters

• Tree of Parzen Estimator (TPE, implemented in hyperopt) is a Bayesian optimization method that can be used for discrete and conditional hyperparameters. Unfortunately, hyperopt is not actively maintained and the latest release is not Python 3 compatible. (NB: the package hyperas provides a wrapper around hyperopt, specifically for Keras)

• SMAC is a hyperparameter optimization method that uses Random Forests to sample the new distribution. We don't use SMAC because the Python package depends on a Java program (for which we can't find the source code).
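To make the random-search idea concrete, here is a minimal, self-contained sketch of how such sampling could look. The ranges and names are illustrative, not mcfly's internals:

import numpy as np

rng = np.random.default_rng(seed=42)

def sample_hyperparameters():
    # Continuous parameters from a log-uniform distribution:
    # sample the exponent uniformly, then exponentiate.
    learning_rate = 10 ** rng.uniform(-4, -1)
    regularization_rate = 10 ** rng.uniform(-4, -1)
    # Discrete and conditional parameters are straightforward as well,
    # which is part of what makes random search attractive here.
    model_type = rng.choice(['CNN', 'DeepConvLSTM'])
    if model_type == 'CNN':
        n_layers = int(rng.integers(1, 11))  # number of Conv layers
    else:
        n_layers = int(rng.integers(1, 6))   # number of Conv/LSTM layers
    return learning_rate, regularization_rate, model_type, n_layers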

If you are interested in the different optimization methods, we recommend the following readings:

• Bergstra, James S., et al. "Algorithms for hyper-parameter optimization." Advances in Neural Information Processing Systems. 2011. (link)

• Hutter, Frank, Holger H. Hoos, and Kevin Leyton-Brown. "Sequential model-based optimization for general algorithm configuration." International Conference on Learning and Intelligent Optimization. Springer Berlin Heidelberg, 2011. (link)

• Eggensperger, Katharina, et al. "Towards an empirical foundation for assessing Bayesian optimization of hyperparameters." NIPS Workshop on Bayesian Optimization in Theory and Practice. 2013. (link)

• Blogpost by Ben Recht

• Blogpost by Alice Zheng


3.2 Architectures

There are two types of architectures available in mcfly: CNN and DeepConvLSTM. The first layer in both architectures is a Batchnorm layer (not shown below), so that the user doesn't have to normalize the data during data preparation.

3.2.1 CNN

The model type CNN is a 'regular' Convolutional Neural Network, with N convolutional layers with Relu activation and one hidden dense layer. So the architecture looks like:

Batchnorm - [Conv - Batchnorm - Relu]*N - Dense - Relu - Dense - Batchnorm - Softmax

The number of Conv layers, as well as the number of filters in each Conv layer and the number of neurons in the hidden Dense layer, are hyperparameters of this model. We decided not to add Pool layers because reducing the spatial size of the sequence is usually not necessary if you have enough convolutional layers.
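A minimal Keras sketch of this architecture. The filter counts, kernel size and node counts are illustrative defaults; this is a sketch, not mcfly's exact implementation:

from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_cnn(input_shape, num_classes, filters=(32, 64), fc_nodes=100, reg=0.01):
    # Batchnorm - [Conv - Batchnorm - Relu]*N - Dense - Relu - Dense - Batchnorm - Softmax
    model = keras.Sequential()
    model.add(layers.BatchNormalization(input_shape=input_shape))
    for f in filters:
        model.add(layers.Conv1D(f, kernel_size=3, padding='same',
                                kernel_regularizer=regularizers.l2(reg)))
        model.add(layers.BatchNormalization())
        model.add(layers.Activation('relu'))
    model.add(layers.Flatten())  # flatten before the hidden Dense layer
    model.add(layers.Dense(fc_nodes, activation='relu',
                           kernel_regularizer=regularizers.l2(reg)))
    model.add(layers.Dense(num_classes, kernel_regularizer=regularizers.l2(reg)))
    model.add(layers.BatchNormalization())
    model.add(layers.Activation('softmax'))
    return model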

3.2.2 DeepConvLSTM

The architecture of the model type DeepConvLSTM is based on the paper by Ordóñez et al. (2016). The architecture looks like this:

Batchnorm - [Conv - Relu]*N - [LSTM]*M - Dropout - TimeDistributedDense - Softmax - TakeLast

The Softmax layer outputs a sequence of predictions, so we need a final TakeLast layer (not part of Keras) to pick the last element from the sequence as the final prediction. In contrast to the CNN model, the convolutional layers in the DeepConvLSTM model are applied per channel, and the channels are only connected in the first LSTM layer. The hyperparameters are the number of Conv layers, the number of LSTM layers, the number of filters for each Conv layer and the hidden layer dimension for each LSTM layer. Note that in the paper of Ordóñez et al., the specific architecture has 4 Conv layers and 2 LSTM layers.
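A simplified Keras sketch of this architecture. Unlike the per-channel convolutions described above, this version applies the convolutions across all channels at once; the Lambda layer plays the role of TakeLast. Layer sizes are illustrative:

from tensorflow import keras
from tensorflow.keras import layers

def build_deep_conv_lstm(input_shape, num_classes,
                         filters=(32, 32), lstm_dims=(64, 32)):
    # Batchnorm - [Conv - Relu]*N - [LSTM]*M - Dropout -
    # TimeDistributedDense - Softmax - TakeLast
    model = keras.Sequential()
    model.add(layers.BatchNormalization(input_shape=input_shape))
    for f in filters:
        model.add(layers.Conv1D(f, kernel_size=3, padding='same',
                                activation='relu'))
    for d in lstm_dims:
        model.add(layers.LSTM(d, return_sequences=True))
    model.add(layers.Dropout(0.5))
    model.add(layers.TimeDistributed(layers.Dense(num_classes,
                                                  activation='softmax')))
    # "TakeLast": keep only the prediction for the final time step
    model.add(layers.Lambda(lambda x: x[:, -1, :]))
    return model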

3.3 Other choices

We have made the following choices for all models (a Keras sketch follows the list):

• We use LeCun Uniform weight initialization (LeCun 1998)

• We use L2 regularization on all convolutional and dense layers

• We use categorical cross-entropy loss

• We output accuracy and use it as the measure to choose the best-performing model
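In Keras terms these choices amount to roughly the following, continuing the build_cnn sketch from section 3.2. The optimizer shown is an assumption; LeCun Uniform initialization would be passed as kernel_initializer='lecun_uniform' on the Conv and Dense layers:

from tensorflow import keras

model = build_cnn(input_shape=(512, 3), num_classes=4)  # sketch from section 3.2
model.compile(loss='categorical_crossentropy',          # categorical cross-entropy loss
              optimizer=keras.optimizers.Adam(learning_rate=0.001),
              metrics=['accuracy'])                      # accuracy used to rank models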

3.4 Comparison with non-deep models

To check the value of the data, a 1-Nearest Neighbors model is applied as a benchmark for the deep learning model. We chose 1-NN because it's a very simple, hyperparameter-free model that often works quite well on time series data. For large train sets, 1-NN can be quite slow: the test-time performance scales linearly with the size of the training set. However, we perform the check only on a small subset of the training data. The related Dynamic Time Warping (DTW) algorithm has a better track record for classifying time series, but we decided not to use it because it's too slow (it scales quadratically with the length of the time series).
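A minimal sketch of such a baseline with scikit-learn, flattening each series into one feature vector. This is an illustration under the data format from the Data preprocessing chapter, not necessarily how mcfly implements the check:

from sklearn.neighbors import KNeighborsClassifier

# Flatten each (num_timesteps, num_channels) series into a single vector;
# y arrays are binary (one-hot), so argmax recovers the class index.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train.reshape(len(X_train), -1), y_train.argmax(axis=1))
knn_accuracy = knn.score(X_val.reshape(len(X_val), -1), y_val.argmax(axis=1))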


CHAPTER 4

Data preprocessing

The input for the mcfly functions is data that has already been preprocessed so that it can be handled by the deep learning module Keras. On this page, we describe which data format is expected and what to consider when preprocessing.

4.1 Eligible data sets

Mcfly is a tool for classification of single- or multi-channel time series data. One (real-valued) multi-channel time series is associated with one class label. All sequences in the data set should be of equal length.

4.2 Data format

The data should be split into train, validation and test sets. For each of the splits, the input X and the output y are both numpy arrays.

The input data X should be of shape (num_samples, num_timesteps, num_channels). The output data y is of shape (num_samples, num_classes), as a binary array for each sample.

We recommend storing the numpy arrays as binary files with the numpy function np.save.
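For illustration, generating and saving random data with the expected shapes (the sizes are arbitrary):

import numpy as np

num_samples, num_timesteps, num_channels, num_classes = 100, 512, 3, 4

X_train = np.random.rand(num_samples, num_timesteps, num_channels)
# Binary (one-hot) labels: one row per sample, one column per class
y_train = np.zeros((num_samples, num_classes))
y_train[np.arange(num_samples),
        np.random.randint(num_classes, size=num_samples)] = 1

np.save('X_train.npy', X_train)
np.save('y_train.npy', y_train)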

4.3 Data preprocessing

Here are some tips for preprocessing the data:

• For longer, multi-label sequences, we recommend creating subsequences with a sliding window (see the sketch after this list). The length and step of the window can be based on domain knowledge.

• One label should be associated with a complete time series. In case of multiple labels, often the last label is taken as the label for the complete sequence. Another possibility is to take the majority label.


• In splitting the data into training, validation and test sets, it might be necessary to make sure that sample subjects (such as test persons) for which multiple sequences are available are not present in both the train and validation/test sets. The same holds for subsequences that originate from the same original sequence (in case of sliding windows).

• The Keras function keras.utils.np_utils.to_categorical can be used to transform an array of class labels to binary class labels.

• Data doesn't need to be normalized. Every model mcfly produces starts by normalizing data through a Batch Normalization layer. This means that training data is used to learn the mean and standard deviation of each channel and timestep.
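A minimal sketch of the sliding-window and label-conversion tips above. The window length and step are illustrative, and the to_categorical import path depends on your Keras version:

import numpy as np
from tensorflow.keras.utils import to_categorical  # keras.utils.np_utils in older Keras

def sliding_window(sequence, labels, window_length=128, step=64):
    # sequence: (num_timesteps, num_channels); labels: (num_timesteps,)
    starts = range(0, len(sequence) - window_length + 1, step)
    X = np.stack([sequence[s:s + window_length] for s in starts])
    # Take the last label in each window as the window's label
    y = np.array([labels[s + window_length - 1] for s in starts])
    return X, to_categorical(y)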


CHAPTER 5

Indices and tables

• genindex

• modindex

• search


Index

M
mcfly (module)