Original article was published by Danyal Jamil on Artificial Intelligence on Medium

All of us, ML Engineers and Data Scientists alike, love **Data**. Whenever we hear that we are getting more *Data* to use, it sounds like heaven, but **not everything is as it seems.**

“What is the drawback here?” you might ask. Well, most of us train on our little CPUs, and the lucky ones among us have GPUs, but even then computing power is not unlimited. The main drawback I can think of is, obviously, **training time that is too long**.

# Exponentially Weighted Averages

Let me explain this the way Andrew Ng explained it. Suppose you have daily temperature data for London. It might look something like this:

| Day  | Temperature |
|------|-------------|
| X1   | 40 °F       |
| X2   | 40 °F       |
| X3   | 49 °F       |
| ...  | ...         |
| X180 | 63 °F       |
| X180 | 61 °F       |

Where ‘X1’ is the temperature on Day 1 and ‘*X180*’ is for the *180th* day. Now, if you plot the data, you’ll see a scatter of noisy points.

# So, what is the algorithm?

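The formula is:

$$V_t = \beta V_{t-1} + (1 - \beta)\,\theta_t$$

Where:

- $\theta_t$ is the current day’s temperature.
- $V_{t-1}$ is the weighted average computed up to the previous day (not the previous day’s raw temperature).
- $\beta$ is a hyperparameter between 0 and 1. It controls how smooth the resulting curve looks.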

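The larger the value of $\beta$, the smoother the curve. If $\beta$ is smaller, the curve is noisier but follows recent changes more closely. As a rule of thumb, $V_t$ averages over roughly the last $1/(1 - \beta)$ days.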

Here, the lines are:

- *Green*: $\beta = 0.98$
- *Red*: $\beta = 0.9$
- *Yellow*: $\beta = 0.5$

Or, more generally:

- $\beta \to 1$: *Green* line.
- $\beta$ not close to 1 or 0: *Red* line.
- $\beta \to 0$: *Yellow* line.

Where ‘$\to$’ means ‘*approaches*’.
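To get a feel for how the three values behave, here is a minimal sketch that runs the update rule on a made-up temperature series and measures how jagged each curve is. The zigzag data and the `roughness` helper are my own assumptions for illustration, not part of the original example.

```python
def ewa(values, beta):
    """Return the running exponentially weighted averages of `values`."""
    v = 0.0
    out = []
    for theta in values:
        v = beta * v + (1 - beta) * theta
        out.append(v)
    return out


def roughness(series):
    """Mean absolute day-to-day change; smaller means a smoother curve."""
    steps = [abs(b - a) for a, b in zip(series, series[1:])]
    return sum(steps) / len(steps)


# Synthetic, made-up temperatures: 50 °F with a +/-10 °F zigzag.
temps = [50 + (10 if day % 2 == 0 else -10) for day in range(180)]

for beta in (0.5, 0.9, 0.98):
    tail = ewa(temps, beta)[-50:]  # look at late days, after the warm-up from v = 0
    print(f"beta={beta}: roughness={roughness(tail):.3f}")
```

On this toy series the roughness shrinks as the smoothing factor grows, which matches the green, red and yellow ordering described above.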

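If we unroll the recursion, each $V_t$ turns out to be a weighted sum of all the temperatures seen so far, and the weights are *exponentially* *decreasing*: the temperature from $k$ days ago gets the weight $(1 - \beta)\beta^k$. These weights add up to 1, roughly. Hence, we can say that $V_{100}$ is a weighted average of **all the temperatures** up to the 100th day, with the most recent days counting the most.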
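A quick numerical check of the "weights add up to roughly 1" claim may help here. The sketch below assumes $\beta = 0.9$ and uses made-up temperatures; it compares the recursive update with the unrolled weighted sum.

```python
beta = 0.9
t = 100

# Weight placed on the temperature from k days before day t.
weights = [(1 - beta) * beta**k for k in range(t)]

# Made-up temperatures for days 1..t, purely for illustration.
temps = [40 + (day % 7) for day in range(1, t + 1)]

# Recursive form: V_t = beta * V_(t-1) + (1 - beta) * theta_t, with V_0 = 0.
v = 0.0
for theta in temps:
    v = beta * v + (1 - beta) * theta

# Unrolled form: the newest temperature gets weights[0], the oldest weights[t-1].
unrolled = sum(w * theta for w, theta in zip(weights, reversed(temps)))

print(sum(weights))  # close to 1 (it equals 1 - beta**t)
print(v, unrolled)   # the two forms agree up to rounding
```

The small gap between the weight sum and 1 is exactly what causes the slow start discussed in the bias correction section.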

# Implementing the algorithm

Pseudo code:

    Vθ = 0
    For loop {
        Get next θt
        Vθ = β * Vθ + (1 - β) * θt
    }

That’s it! Yes, the *biggest* **pro** of using this algorithm is that it uses *very little memory*. We just initialize $V_\theta$ and then keep on **updating** it. If you want to compute averages over many values, this is useful because it takes up so little space.
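The loop above could be sketched in Python roughly like this. The function name `running_ewa` and the default `beta=0.9` are my own choices for illustration. Note that the only state kept between readings is the single number `v`.

```python
from typing import Iterable, Iterator


def running_ewa(stream: Iterable[float], beta: float = 0.9) -> Iterator[float]:
    """Yield the exponentially weighted average after each new reading."""
    v = 0.0  # the only state we keep, no matter how long the stream is
    for theta in stream:
        v = beta * v + (1 - beta) * theta
        yield v


# First three temperatures from the table above (°F).
for day, v in enumerate(running_ewa([40, 40, 49]), start=1):
    print(f"day {day}: v = {v:.2f}")
```

Because it is a generator, it works just as well on a stream that never ends, such as sensor readings arriving one at a time.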

# What is Bias Correction?

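The *green line* is what we want, but the *purple line* is what we actually obtain using the equation.

What happens is that, because we *initialize* $V_\theta$ to *zero*, the first few averages are pulled toward zero. Hence, the graph starts **pretty low** and is **not** what we expect. For example, with $\beta = 0.98$ and a first-day temperature of 40 °F, we get $V_1 = 0.02 \times 40 = 0.8$.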

So, in order to *cope* with this, instead of using $V_t$ directly, we use

$$\frac{V_t}{1 - \beta^t}$$

and this pretty much *solves* the problem. Also, when *t* is *large enough*, $\beta^t$ is close to zero, so the denominator is close to 1 and almost no correction is applied.
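Here is a minimal sketch of the correction in Python, assuming $\beta = 0.98$ and the first three temperatures from the table. The function name `ewa_bias_corrected` is hypothetical, made up for this example.

```python
def ewa_bias_corrected(values, beta=0.98):
    """Return (raw, corrected) lists of exponentially weighted averages."""
    v = 0.0
    raw, corrected = [], []
    for t, theta in enumerate(values, start=1):
        v = beta * v + (1 - beta) * theta
        raw.append(v)
        corrected.append(v / (1 - beta**t))  # divide out the startup bias
    return raw, corrected


raw, corrected = ewa_bias_corrected([40, 40, 49])
for day, (r, c) in enumerate(zip(raw, corrected), start=1):
    print(f"day {day}: raw = {r:.2f}, corrected = {c:.2f}")
```

The raw values start far below the actual temperatures, while the corrected values land near them from the very first day.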