Extreme Learning Machines

Part III: Is it better?

Well, it depends.

As mentioned earlier, ELM's main advantages are very short training time, low training error, and good generalization performance. It also has the simplest training procedure, since we do not have to decide the number of hidden layers, the learning rate, or other hyperparameters. Despite this simplicity, ELM can match or outperform other algorithms in terms of accuracy, precision, and recall.
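To make the "simplest algorithm" point concrete, here is a minimal sketch of a basic ELM in Python (the function names, the tanh activation, and the 100-node default are illustrative choices, not taken from the article). The hidden layer is random and frozen, so the only training step is a single pseudoinverse:

```python
import numpy as np

def elm_train(X, T, n_hidden=100, seed=0):
    """Fit a single-hidden-layer ELM: random hidden weights, analytic output weights."""
    rng = np.random.default_rng(seed)
    # Hidden-layer weights and biases are drawn at random and never updated.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)        # hidden-layer activations
    # The only "training" step: output weights via the Moore-Penrose pseudoinverse.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because there is no iterative optimization, there is also no learning rate to tune and no risk of getting stuck in a local minimum, which is where the training-time advantage comes from.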

But ELM architectures often end up with a large number of hidden nodes in the first layer, which increases test time. If your application does not demand short training time but needs fast inference as its priority, then ELM should not be your first choice. For example, ELM has not performed well in real-time image classification.

Comparison from source [3]

Comparison with LSTM and HTM

The OR-ELM used here is a type of ELM algorithm for online recurrent time-series data. This experiment was done on predicting faults in a cloud environment.
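OR-ELM's details are in source [1]; as a rough illustration of the online idea it builds on, here is a minimal sketch of the OS-ELM-style recursive least-squares update of the output weights, processing one sample at a time (the class name and the constant c are illustrative assumptions, and the recurrent/normalization machinery of OR-ELM itself is omitted):

```python
import numpy as np

class OnlineELMOutput:
    """Sequential (per-sample) update of ELM output weights via recursive least squares."""

    def __init__(self, n_hidden, n_outputs, c=1000.0):
        self.P = np.eye(n_hidden) * c            # running inverse-covariance estimate
        self.beta = np.zeros((n_hidden, n_outputs))

    def update(self, h, t):
        h = np.asarray(h).reshape(-1, 1)         # hidden activations for one sample
        t = np.asarray(t).reshape(1, -1)         # target for that sample
        Ph = self.P @ h
        g = Ph / (1.0 + (h.T @ Ph).item())       # RLS gain vector
        self.P -= g @ Ph.T                       # shrink uncertainty along h
        self.beta += g @ (t - h.T @ self.beta)   # correct weights by prediction error
```

Each update costs only a few small matrix products, which is why this family of algorithms suits streaming time-series data.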

The metric used here is NRMSE, the Normalized Root Mean Squared Error. In Fig. 1(a), we can see that the ELM algorithm has better overall performance, as it has a lower NRMSE.
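For reference, here is one common way to compute NRMSE, normalizing the RMSE by the target range (NRMSE is sometimes normalized by the mean or standard deviation instead, so this particular variant is an assumption, not necessarily the one used in source [1]):

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Root mean squared error, normalized by the range of the targets."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (np.max(y_true) - np.min(y_true))
```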

Fig. 1(a) Prediction error for the 40 days, from source [1]
Fig. 1(b) Prediction error when rapid changes of inputs occurred, from source [1]

Time comparison: the ELM algorithm takes less than 10% of the time of the algorithms it is compared against, while also having lower overall error.

Fig. 2 Comparison of (I) NRMSE, (II) MAPE, (III) computational time (in sec), from source [1]

Comparison with Support Vector Machine and Random Forest

The dataset is randomized and divided into three parts: full samples, half samples, and 1/4th samples. The full dataset consists of 65,535 samples, the half dataset includes 32,767 samples, and the 1/4th dataset consists of 18,383 samples. Accuracy, precision, and recall are used as evaluation metrics.

Each subset is then split into 80% training and 20% testing, as sketched below.
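A minimal sketch of this setup, assuming a feature matrix X and a label vector y are already loaded (the subset sizes reported in the article differ slightly from exact fractions, so the fractions here are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# X, y: the full feature matrix and labels (65,535 samples in the article).
rng = np.random.default_rng(0)
idx = rng.permutation(len(X))                    # randomize the dataset first

for frac in (1.0, 0.5, 0.25):                    # full, half, and ~1/4 subsets
    sub = idx[: int(len(X) * frac)]
    X_train, X_test, y_train, y_test = train_test_split(
        X[sub], y[sub], test_size=0.2, random_state=0  # 80/20 split
    )
    # ...train and evaluate SVM / RF / ELM on this subset here...
```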

ELM performs better than SVM (Linear), SVM (RBF), and RF on the full data samples, whereas SVM (RBF) shows better accuracy than RF and ELM on the half data samples. SVM (Linear) outperforms the other techniques on the 1/4 data samples.

Fig. 3 (a) Accuracy of SVM, RF and ELM

The precision of ELM is better than that of SVM (Linear), SVM (RBF), and RF on the full data samples. On the half data samples, the precision of SVM (Linear) is higher than that of SVM (RBF), ELM, and RF. On the 1/4th data samples, the precision of SVM (Linear) is equal to that of SVM (RBF), and both SVM variants perform better than ELM and RF.

Fig. 3 (b) Precision of SVM, RF and ELM

The recall of SVM (Linear), SVM (RBF), RF, and ELM on the 20% testing and 80% training data samples is shown in Fig. 3(c). On the full data samples, the recall of ELM is better than those of SVM (Linear), SVM (RBF), and RF. On the half data samples, the recall of SVM (Linear) is greater than those of SVM (RBF), ELM, and RF. The ranking of recall on the 1/4 data samples is as follows: first SVM (RBF), second SVM (Linear), third RF, and fourth ELM. This discussion indicates that SVM performs better on small datasets, whereas ELM outperforms the other approaches on large datasets.

Fig. 3 (c) Recall of SVM, RF and ELM

Conclusion

We can conclude that ELM performs better than the other algorithms given a large amount of data, shortening training time from days (spent by deep learning) to several minutes on tasks such as the MNIST OCR dataset, traffic sign recognition, and 3D graphics applications.

ELM also performs better when there are rapid changes in the input data, as seen in Fig. 1(b).

The main drawbacks of the ELM algorithm are a somewhat longer testing time in some cases, and the requirement that the dataset be large enough.