ML Paper Challenge Day 36, 37— Building high-level features using large scale unsupervised learning

Original article was published on Deep Learning on Medium

Day 36–37: 2020.05.17–18
Paper: Building high-level features using large scale unsupervised learning
Category: Model/Unsupervised Learning

This paper is a milestone. Its main contribution is to show that high-level, class-specific features can be learned without any labelled data.

Result First:

  • A single neuron learns to detect faces with 81.7% accuracy.
  • Other neurons learn to detect cat faces and human bodies with 74.8% and 76.7% accuracy, respectively.
  • Control experiments show that the learned detector is not only invariant to translation but also to out-of-plane rotation and scaling.

How

  • An extremely large number of trainable parameters: 1 billion, more than an order of magnitude larger than other large networks reported in the literature
    (although still tiny compared to the human visual cortex, which is 10^6 times larger in terms of the number of neurons and synapses).
  • A 9-layered locally connected sparse autoencoder built from local receptive fields (each feature connects only to a small region of the layer below), local L2 pooling (for invariance to local deformations), and local contrast normalization (which aids the learning of invariant features).
  • Trained using model parallelism and asynchronous SGD on a cluster of 1,000 machines (16,000 cores) for three days.
  • Each layer consists of three sublayers:
  • 1st sublayer: filtering (or "simple"); receptive fields of 18×18 pixels, with each neuron connecting to pixels in all input channels (or maps).
  • 2nd sublayer: L2 pooling (or "complex"); pools over 5×5 overlapping neighborhoods of features (i.e., the pooling size), with each neuron connecting to features of only one channel.
  • 3rd sublayer: local subtractive and divisive normalization (inspired by biological and computational models).
  • Local receptive fields != convolution: the parameters are not shared across different locations in the image.
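The three-sublayer stack above can be sketched in NumPy on a single-channel toy input. This is a minimal illustration, not the paper's implementation: the image size, stride, and random weights are assumptions for demonstration, and the real model operates on multi-channel 200×200 images with learned filters.

```python
import numpy as np

def local_filter(image, weights, rf, stride):
    """Locally connected filtering ("simple" sublayer): each output unit has
    its OWN rf x rf weight patch -- unlike a convolution, no weights are
    shared across image locations."""
    h = (image.shape[0] - rf) // stride + 1
    w = (image.shape[1] - rf) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = image[i*stride:i*stride+rf, j*stride:j*stride+rf]
            out[i, j] = np.dot(patch.ravel(), weights[i, j])
    return out

def l2_pool(features, size):
    """L2 pooling ("complex" sublayer): square root of the sum of squares
    over overlapping size x size neighborhoods, giving invariance to small
    local deformations."""
    h = features.shape[0] - size + 1
    w = features.shape[1] - size + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sqrt((features[i:i+size, j:j+size] ** 2).sum())
    return out

def local_contrast_normalize(features, size, eps=1e-3):
    """Local subtractive and divisive normalization: subtract the local
    mean, then divide by the local standard deviation."""
    h = features.shape[0] - size + 1
    w = features.shape[1] - size + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = features[i:i+size, j:j+size]
            center = features[i + size // 2, j + size // 2]
            out[i, j] = (center - patch.mean()) / (patch.std() + eps)
    return out

# One layer of the stack on a toy 40x40 single-channel input
# (sizes chosen for illustration; the stride is an assumption).
rng = np.random.default_rng(0)
image = rng.standard_normal((40, 40))
rf, stride, pool = 18, 2, 5
h = (40 - rf) // stride + 1                         # 12 positions per side
weights = rng.standard_normal((h, h, rf * rf)) * 0.01
simple = local_filter(image, weights, rf, stride)   # shape (12, 12)
complex_ = l2_pool(simple, pool)                    # shape (8, 8)
normed = local_contrast_normalize(complex_, pool)   # shape (4, 4)
```

Note how `weights` carries one independent filter per output location; a convolutional layer would instead store a single `rf * rf` filter reused everywhere, which is the distinction the last bullet draws.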