Source: Deep Learning on Medium
Given betas (B=0.1 and C=0.9):
With a threshold T = 0.015, we find that B² = 0.01 < T, whereas C only falls below the threshold at the 40th power: C⁴⁰ ≈ 0.0148 < T.
So the bigger the beta, the higher the power needed to fall below the threshold. In other words, the higher we set beta, the more past values (samples) are taken into account before we start “forgetting” them.
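To check these numbers, here is a minimal Python sketch that finds the smallest power at which a beta drops below the threshold. The function name `steps_to_forget` is hypothetical and chosen just for illustration; it assumes 0 < beta < 1 and a positive threshold.

```python
def steps_to_forget(beta: float, threshold: float) -> int:
    """Return the smallest positive integer n such that beta**n < threshold.

    Hypothetical helper for illustration; assumes 0 < beta < 1 and threshold > 0.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must be strictly between 0 and 1")
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")

    n = 1
    weight = beta  # weight == beta**n at every step of the loop
    while weight >= threshold:
        weight *= beta
        n += 1
    return n


T = 0.015
for name, beta in (("B", 0.1), ("C", 0.9)):
    n = steps_to_forget(beta, T)
    print(f"{name} = {beta}: power {n} gives {beta ** n:.4f} < {T}")
```

Running it for B = 0.1 and C = 0.9 gives the powers 2 and 40 used above. Trying other betas (for example 0.99) shows how quickly the number of "remembered" samples grows as beta approaches 1.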