Principal Components & Neural Networks
How I finished second in the Mapping Dark Matter Challenge
Sergey Yurgenson, Harvard University
Pasadena, 2011
Dec 14, 2015
e1 = -0.13889, e2 = 0.090147
Training set: 40,000 examples
Test set: 60,000 examples
e1 = ?, e2 = ?
g: P -> e
• The regression function g does not need to be justified in any scientific way!
• Supervised learning is used to find g
Data mining view
Neural Network
Galaxy image => neural network => e1 = -0.13889, e2 = 0.090147
RMSE=0.01779
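For readers who want to reproduce the score, here is a minimal sketch of the RMSE, assuming the errors on e1 and e2 are pooled into one mean. The pooling convention and the sample values are assumptions for illustration, not taken from the slides.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error pooled over every (e1, e2) component."""
    squared_errors = [
        (p - a) ** 2
        for pred_pair, true_pair in zip(predicted, actual)
        for p, a in zip(pred_pair, true_pair)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values: every prediction is off by exactly 0.01.
true_e = [(-0.13889, 0.090147), (0.05, -0.02)]
pred_e = [(-0.12889, 0.100147), (0.04, -0.03)]
score = rmse(pred_e, true_e)
```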
• Too many input parameters
• Many parameters are nothing more than noise
• Slow training
• Result is not very good

• Reduce the number of parameters
• Make parameters “more meaningful”
Matlab
Principal components to reduce the number of input parameters
Neural network with PCs as inputs: RMSE ~0.0155
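The reduction step can be sketched as follows. The talk used Matlab; this Python/NumPy version is only an illustration. It assumes PCA is computed from an SVD of the mean-subtracted, flattened images. The 16x16 random stamps and the helper names `fit_pca` and `project` are hypothetical.

```python
import numpy as np

def fit_pca(images, n_components):
    """Return the mean vector, leading principal directions (rows), singular values."""
    X = images.reshape(len(images), -1).astype(float)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal directions, sorted by decreasing variance.
    _, singular_values, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components], singular_values[:n_components]

def project(images, mean, components):
    """Replace each image by its scores on the retained components."""
    X = images.reshape(len(images), -1).astype(float)
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
images = rng.normal(size=(200, 16, 16))   # stand-in for the galaxy stamps
mean, components, sv = fit_pca(images, n_components=38)
scores = project(images, mean, components)  # network inputs: 38 numbers per image
```

Each image shrinks from 256 pixel values to 38 scores, which is what lets the network train faster on less noisy inputs.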
Implicit use of additional information about the data set:
• 2D matrices are images of objects
• Objects have a meaningful center

• Calculate the center of mass with a threshold
• Center the pictures using spline interpolation
• Recalculate the principal components
• Fine-tune the center position using the amplitude of the antisymmetric components
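A rough sketch of the first two centering steps, under stated assumptions: pixels at or below the threshold get zero weight, and the shift is done in whole pixels with `np.roll`. That is a deliberate simplification and not the sub-pixel spline interpolation used in the talk. The function names and the toy image are hypothetical.

```python
import numpy as np

def thresholded_center_of_mass(image, threshold):
    """Intensity-weighted centroid (row, col) of pixels above the threshold.

    Assumes at least one pixel exceeds the threshold.
    """
    weights = np.where(image > threshold, image, 0.0)
    total = weights.sum()
    rows, cols = np.indices(image.shape)
    return (rows * weights).sum() / total, (cols * weights).sum() / total

def center_by_whole_pixels(image, threshold):
    """Roll the image so the centroid lands on the central pixel (no interpolation)."""
    r, c = thresholded_center_of_mass(image, threshold)
    shift_r = image.shape[0] // 2 - int(round(r))
    shift_c = image.shape[1] // 2 - int(round(c))
    return np.roll(image, (shift_r, shift_c), axis=(0, 1))

image = np.full((9, 9), 0.1)      # faint background, below the threshold
image[1:4, 5:8] = 1.0             # bright off-center blob
centroid = thresholded_center_of_mass(image, threshold=0.5)
centered = center_by_whole_pixels(image, threshold=0.5)
```

The threshold keeps background noise from pulling the centroid toward the middle of the frame.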
Original Centered
Components # 2 and # 3
Linear regression using only components 2,3 => RMSE~0.02
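A linear baseline of this kind might look like the sketch below, assuming an ordinary least-squares fit with an intercept. The synthetic scores and the weights in `true_w` are made up purely to exercise the code.

```python
import numpy as np

def fit_linear(features, targets):
    """Least-squares fit of targets ~ features @ W + b; the last row of the result is b."""
    design = np.column_stack([features, np.ones(len(features))])
    coeffs, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coeffs

def predict_linear(features, coeffs):
    design = np.column_stack([features, np.ones(len(features))])
    return design @ coeffs

rng = np.random.default_rng(1)
pc23 = rng.normal(size=(500, 2))                  # stand-in for scores on components 2 and 3
true_w = np.array([[0.03, -0.01], [0.02, 0.04]])  # hypothetical weights
e = pc23 @ true_w + 0.001                         # synthetic, exactly linear (e1, e2) targets
coeffs = fit_linear(pc23, e)
pred = predict_linear(pc23, coeffs)
```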
Color – 2theta
Color – (a-b)/(a+b)
e1 = [(a-b)/(a+b)] cos(2theta)
e2 = [(a-b)/(a+b)] sin(2theta)
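The two formulas translate directly into code. In this small sketch, a and b are taken to be the semi-major and semi-minor axes and theta the orientation angle in radians. The inverse helper `magnitude_and_angle` is an added convenience, not something from the slides.

```python
import math

def ellipticity(a, b, theta):
    """(e1, e2) from the axes a, b and the orientation angle theta (radians)."""
    magnitude = (a - b) / (a + b)
    return magnitude * math.cos(2 * theta), magnitude * math.sin(2 * theta)

def magnitude_and_angle(e1, e2):
    """Invert the formulas: recover (a-b)/(a+b) and theta from (e1, e2)."""
    return math.hypot(e1, e2), 0.5 * math.atan2(e2, e1)

e1, e2 = ellipticity(a=3.0, b=1.0, theta=math.pi / 8)
magnitude, theta = magnitude_and_angle(e1, e2)
```

The factor of 2 in the angle reflects that an ellipse rotated by 180 degrees is the same ellipse.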
• Neural network:
    • 38 (galaxy PCs) + 8 (star PCs) inputs
    • 2 hidden layers: 12 neurons (linear transfer function) and 8 neurons (sigmoid transfer function)
    • 2 outputs, with e1 and e2 as targets
    • 80% random training subset, 20% validation subset
•Multiple trainings with numerous networks achieving training RMSE<0.015
• Typical test RMSE = 0.01517-0.0152
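To make the architecture concrete, here is a sketch of the forward pass and the 80/20 split in NumPy. Several details are assumptions: the output layer is taken to be linear, the sigmoid is taken to be the logistic function, and the weights are random and untrained. The original networks were trained in Matlab, and training is not shown here.

```python
import numpy as np

def init_network(rng, sizes=(46, 12, 8, 2)):
    """Random (weights, biases) for each layer: 46 inputs -> 12 -> 8 -> 2 outputs."""
    return [
        (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, x):
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = x @ w1 + b1                             # 12 neurons, linear transfer
    h2 = 1.0 / (1.0 + np.exp(-(h1 @ w2 + b2)))   # 8 neurons, sigmoid transfer
    return h2 @ w3 + b3                          # e1, e2 (linear output is an assumption)

def split_indices(rng, n_examples, train_fraction=0.8):
    """Random training / validation split of example indices."""
    order = rng.permutation(n_examples)
    n_train = int(train_fraction * n_examples)
    return order[:n_train], order[n_train:]

rng = np.random.default_rng(0)
params = init_network(rng)
inputs = rng.normal(size=(5, 46))                # 38 galaxy PCs + 8 star PCs per example
outputs = forward(params, inputs)
train_idx, val_idx = split_indices(rng, 40_000)
n_parameters = sum(w.size + b.size for w, b in params)
```

With these layer sizes the network has only 686 trainable parameters, which is small next to the 40,000 training examples.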
• Small score improvement from combining the predictions of many networks (simple mean):
    Combination of multiple networks: training RMSE ~0.0149
    Public RMSE ~0.01505-0.01509
    Private RMSE ~0.01512-0.01516
Benefit of network combination is ~0.00007-0.0001
•Best submission – mean of 35 NN predictions
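The combination step is a plain element-wise mean, sketched below. The 35 "networks" here are synthetic: the truth plus independent noise, an assumption that makes averaging look far more effective than it was in the competition. Real networks trained on the same data make strongly correlated errors, which is consistent with the small gain reported above.

```python
import numpy as np

def combine_predictions(predictions):
    """Simple mean over networks; input shape (n_networks, n_examples, 2)."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)

def rmse(pred, actual):
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

rng = np.random.default_rng(0)
truth = rng.normal(scale=0.2, size=(1000, 2))             # synthetic (e1, e2) targets
# 35 synthetic predictors: truth plus independent noise (hypothetical, for illustration).
predictions = truth + rng.normal(scale=0.015, size=(35, 1000, 2))
combined = combine_predictions(predictions)
single_scores = [rmse(p, truth) for p in predictions]
combined_score = rmse(combined, truth)
```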
                   Training RMSE   Test RMSE
Original           0.01499         0.01518
7 bit resolution   0.01503         0.01522
6 bit resolution   0.01513         0.01532
5 bit resolution   0.01551         0.01574
4 bit resolution   0.01696         0.01718
Pix size 2         0.01546         0.01571
+0.5 noise         0.01684         0.01706
+1.0 noise         0.02120         0.02152
+1.5 noise         0.02873         0.02916
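The slides do not say how the degraded inputs in the table were produced. The sketch below shows one plausible reading of each row, and every choice in it is an assumption: uniform quantization over the image's own range for "bit resolution", 2x2 block averaging for "Pix size 2", and additive Gaussian noise whose sigma is in arbitrary units for the noise rows.

```python
import numpy as np

def quantize(image, bits):
    """Round pixel values onto 2**bits evenly spaced levels spanning the image range."""
    lo, hi = image.min(), image.max()
    levels = 2 ** bits - 1
    return np.round((image - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

def bin_pixels(image, factor):
    """Average factor x factor blocks (image sides must be divisible by factor)."""
    h, w = image.shape
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def add_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return image + rng.normal(scale=sigma, size=image.shape)

rng = np.random.default_rng(0)
image = rng.random((8, 8))                  # stand-in for one galaxy stamp
coarse = quantize(image, bits=4)
binned = bin_pixels(image, factor=2)
noisy = add_noise(image, sigma=0.5, rng=rng)
```

Under any such reading, the table suggests the method tolerates a loss of a few bits of pixel depth much better than added noise.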