Given betas (B=0.1 and C=0.9): B=0.1 B²=0.01 B³=0.001

Source: Deep Learning on Medium

Given betas (B=0.1 and C=0.9):
B=0.1
B²=0.01
B³=0.001

C=0.9
C²=0.81
C³=0.729
C³⁹≈0.0164
C⁴⁰≈0.0148

Considering a threshold T=0.015, we notice that B already falls below it at the second power (B² < T), whereas C only gets there at the fortieth power (C⁴⁰ < T).
So, the bigger the beta, the higher the power has to be before the result drops below the threshold. In other words, the higher we set beta, the more values (samples) we have to consider before we start “forgetting” them.
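
To check these numbers, here is a minimal sketch that searches for the smallest power at which a beta drops below the threshold. The function name `steps_to_forget` is a hypothetical helper introduced only for this illustration, and the sketch assumes 0 < beta < 1 and a positive threshold.

```python
def steps_to_forget(beta: float, threshold: float) -> int:
    """Return the smallest integer n >= 1 such that beta**n < threshold.

    Hypothetical helper for illustration; assumes 0 < beta < 1 and threshold > 0,
    otherwise the search below would never terminate.
    """
    if not (0.0 < beta < 1.0) or threshold <= 0.0:
        raise ValueError("expected 0 < beta < 1 and threshold > 0")
    n = 1
    # Keep raising the power until beta**n falls strictly below the threshold.
    while beta ** n >= threshold:
        n += 1
    return n


T = 0.015
for name, beta in (("B", 0.1), ("C", 0.9)):
    print(f"{name}={beta}: first power below T={T} is {steps_to_forget(beta, T)}")
```

With T=0.015 this reports power 2 for B=0.1 and power 40 for C=0.9, matching the values listed above.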