

  • Faculty of Science and Bio-Engineering Sciences

    Department of Computer Science

    Unsupervised Feature Extraction for

    Reinforcement Learning

    Thesis submitted in partial fulfillment of the requirements for the degree of

    Master of Science in Engineering Sciences: Computer Science

    Yoni Pervolarakis

    Promotor: Prof. Dr. Peter Vrancx

    Prof. Dr. Ann Nowé

    June 2016

  • Abstract

    When working with high-dimensional data, chances are that most of the
    features are not important to a specific problem. Several techniques exist
    to eliminate those features and potentially find better ones: feature
    extraction, which transforms the original input features into a new,
    lower-dimensional feature set, or feature selection, which keeps only the
    features that are more important than the others. Both can be done in a
    supervised or unsupervised manner. In this thesis, we investigate whether
    autoencoders can be used as an unsupervised feature extraction method on
    data that is not necessarily interpretable. The resulting features are then
    tested in a Reinforcement Learning environment. The data is represented as
    RAM states, which are a black box since we cannot interpret them directly.
    The autoencoders receive a high-dimensional feature set and transform it
    into a lower-dimensional one; these new features are given to an agent,
    which uses them to learn (a minimal sketch of this pipeline follows the
    abstract). The results are compared to a manual feature selection method
    and to using no feature selection at all.
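
    A minimal sketch of the pipeline described in the abstract, assuming a
    Keras-style single-hidden-layer autoencoder and 128-byte Atari RAM states;
    the code size, activations, optimizer, loss and training data below are
    illustrative assumptions, not the exact setup used in the thesis.

        import numpy as np
        from keras.models import Model
        from keras.layers import Input, Dense

        RAM_SIZE = 128   # an Atari 2600 RAM state is 128 bytes
        CODE_SIZE = 16   # illustrative size of the compressed feature vector

        # Encoder: high-dimensional RAM state -> low-dimensional code
        ram_in = Input(shape=(RAM_SIZE,))
        code = Dense(CODE_SIZE, activation='relu')(ram_in)
        # Decoder: reconstruct the original RAM state from the code
        ram_out = Dense(RAM_SIZE, activation='sigmoid')(code)

        autoencoder = Model(ram_in, ram_out)
        encoder = Model(ram_in, code)
        autoencoder.compile(optimizer='adam', loss='mse')

        # Train unsupervised on collected RAM states (random stand-in data here),
        # scaled to [0, 1] so the sigmoid output can reconstruct them.
        ram_states = np.random.randint(0, 256, size=(1000, RAM_SIZE)) / 255.0
        autoencoder.fit(ram_states, ram_states, epochs=10, batch_size=32, verbose=0)

        # The encoder output is the lower-dimensional feature vector that would
        # be handed to the reinforcement learning agent instead of the raw RAM.
        features = encoder.predict(ram_states[:1])

    Handing the encoder output to the agent in place of the raw RAM state is
    the setting that the experiments compare against manual feature selection
    and against using no feature selection at all.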

  • Acknowledgements

    First and foremost, I would like to thank Prof. Dr. Peter Vrancx for
    helping me find a subject I am passionate about, for taking the time for
    weekly updates, and for all his suggestions and numerous conversations on
    how this subject could be tackled.

    Secondly, I would like to thank Prof. Dr. Ann Nowé for piquing my interest
    in the Artificial Intelligence master when I took her course in my first
    year at the Vrije Universiteit Brussel.

    Finally, I would like to thank my mother for supporting me in pursuing my
    studies at university level, and my girlfriend for her endless support.

  • Contents

    1 Introduction
    1.1 Research Question
    2 Machine Learning
    2.1 Supervised learning
    2.1.1 Classification
    2.1.2 Regression
    2.2 Unsupervised learning
    2.3 Underfitting and overfitting
    2.4 Bias-Variance
    2.5 Ensemble methods
    2.5.1 Bagging
    2.5.2 Boosting
    2.6 Curse of dimensionality
    2.7 Evaluating models
    2.7.1 Cross validation
    3 Artificial Neural Networks
    3.1 Perceptrons
    3.2 Training perceptrons
    3.3 Multilayer perceptron
    3.4 Activation functions
    3.4.1 Sigmoid
    3.4.2 Hyperbolic tangent
    3.4.3 Rectified Linear Unit
    3.4.4 Which is better?
    3.5 Tips and tricks
    3.6 Backpropagation
    3.7 Autoencoders
    3.8 Conclusion
    4 Reinforcement Learning
    4.1 The setting
    4.2 Rewards
    4.3 Markov Decision Process
    4.4 Value functions
    4.5 Action Selection
    4.6 Incrementing Q-values
    4.7 Monte Carlo & Dynamic Programming
    4.8 Temporal Difference
    4.8.1 Q-Learning
    4.8.2 SARSA
    4.9 Eligibility traces
    4.10 Function approximation
    5 Experiments and results
    5.1 ALE
    5.2 Space Invaders
    5.3 Reconstruction
    5.4 Flow of experiments
    5.5 Manual features and basic RAM
    5.6 Difference between bits and bytes
    5.7 Comparing different activation functions
    5.8 Initializing Q-values
    5.9 Pretraining and extracting other layers
    5.10 Combination of RAM and layer
    5.11 Visualizing high dimensional data
    6 Conclusions
    6.1 Future work
    Appendices
    A Extended graphs and tables
    7 Bibliography

  • List of Figures

    1 Architecture of data processing
    2 Example of a decision tree
    3 Classification
    4 Regression
    5 Data of two features
    6 k-means clustering
    7 Unsupervised learning: reduction of dimensions
    7a MNIST example of the number 2
    7b MNIST reduction of dimensions
    8 Difference between underfitting and overfitting
    9 Dartboard analogy from (Sammut & Webb, 2011)
    10 Bias-Variance trade-off
    11 Random Forest
    12 Searching in different dimensions
    12a 1D space
    12b 2D space
    12c 3D space
    13 Example of a perceptron
    14 Bitwise operations
    14a AND operator
    14b OR operator
    14c XOR operator
    15 XOR with decision boundaries learnt by an MLP
    16 Multilayer perceptron
    17 Other activation functions: linear and step function
    18 Sigmoid activation function
    19 Hyperbolic tangent activation function
    20 ReLU activation function
    21 Example of an autoencoder
    22 A Skinner box from (Skinner, 1938)
    23 Agent-environment setting
    24 Another view of the agent-environment setting
    25 Mountain car; image from (RL-Library, n.d.)
    26 Pole balancing; image from (Anji, n.d.)