Calculus of Variations Demystified

Original article was published by NAOKI on Artificial Intelligence on Medium


Before finding the shortest path, we need to be able to calculate the length of a path from A to B.

As the famous Chinese proverb say, “a journey of a thousand miles begins with a single step”.

So, let’s calculate the length of a small passage and integrate it from A to B.

Image by author

As shown in the above diagram, a small passage can be approximated by a straight line.

So, the total length of a path can be calculated by the following integration:

Since A and B are fixed points, let’s define them as A = (x1, y1) and B = (x2, y2).

In the last step, y1 and y2 are dropped as they are determined by x1 and x2.

So, we are integrating the square root from x1 until x2.

Now we know how to calculate the length of a path from A to B.

We are going to find the y = f(x) that minimizes the path length S.

What is a functional?

We actually know that the y = f(x) is the straight line between A and B but let’s still forget that for the time being.

Instead, we think of many potential solution lines from A to B.

Image by author

The blue line is y = blue_line(x).

The red line is y = red_line(x).

And so on.

But having a function name for each line is too tedious, so we call them collectively y = y(x).

We think of y = y(x) as any line between A and B which could be the blue line, the red line or else.

We can calculate the length of any line (given a particular y(x)) using the formula S.

In other words, S takes an instance of y(x) and returns the length of the line.

So, S maps a function y(x) to a value.

We call S a functional which is a function of functions.

A functional is written with square brackets like S[y] which means the functional S takes a function y and returns a value.

If we talk about functions of functions like this general, there could be too many kinds of functionals, many of which may not be that useful.

Since the functional optimization originates from the Physics which deals with positions and velocities, the following general form is very common and has many applications (not only in Physics but also in machine learning).

J[y] is a functional that takes a function y.

J integrates L from x1 until x2.

L is a function of x, y, and y'.

y' is the derivative of y in terms of x.

The above may look too abstract.

Let’s apply it to our shortest path problem.

For the shortest path problem, L is defined as follows:

We integrate L between x1 and x2 to get the length of the path.

In the shortest path problem, L does not include x and y but only y'.

That is ok.

The point is that if we can solve the optimization problem for the general form of functionals, we can solve many problems including the shortest path problem.

Functional variations

Before attempting solving the general form of functionals, let’s think about how to solve the shortest path problem.

Do we generate many random functions and evaluate the path length to find out the shortest path?

Image by author

We can map each path to a path length value using the functional S.

The question is how to know if we actually find the shortest path or not.

We can take an alternative approach since we already know that the shortest path exists.

We start with the shortest path and slightly modify it to see what kind of condition is required for the shortest path.

Let’s call the shortest path function f(x) that minimize S.

Here, we write f(x) instead of y(x).

The function f(x) solves our problem.

y(x) is a candidate function that may or may not solve our problem like the blue line and the red line.

When we give y(x) or f(x) to S, we write like S[y] and S[f] respectively.

S[y] gives the path length following y(x) from A to B.

S[f] gives the path length following f(x) from A to B which is the shortest path length.

If we add an arbitrary function η(x)(eta of x) to f(x), the following relationship is asserted:

The term ϵη is called a variation of the function f, which is denoted as follows:

ϵ(epsilon) is a small number so that the variation is also small.

We analyze how the functional changes when we move ϵtowards zero.

To make it more concrete, let’s apply a variation to the shortest path.

In the below diagram, the blue straight line is the shortest path y = f(x).

A purple line is an example of a variation ϵη added to f as in y = f(x) + ϵη(x).

Image by author

S[f + ϵη] gives the length of the path given by the function f + ϵη.

Since S[f] is the shortest path length, the relationship S[f] <= S[f + ϵη] is true for any variation.

When ϵ goes to zero, the purple line becomes the same as the blue line as the variation vanishes.

Also, note that η(x) must be zero at the point A and B since any variation must include the point A and B. (Remember we are looking for the shortest line from A to B).

So, η(x1) = 0 and η(x2) = 0.

Why do we think about the variation?

The reason is that the variation makes the functional optimization into a function optimization which we know how to solve.

If we look at S[f + ϵη] very closely, we can see that it depends only on ϵ.

f(x) is the function that gives the minimum path line which is fixed.

η(x) is an arbitrary function that is fixed for a particular variation.

Therefore, the only variable in S[f + ϵη] is ϵ.

In short, S[f + ϵη] is a function of ϵ.

So, let’s write it as S(ϵ).

S(ϵ) is the function of ϵ that returns the path length for the function y that is the solution function f plus a variation ϵη.

We just need to solve the minimization problem for the function S(ϵ).

Conveniently, we also know that S becomes the minimum when ϵ goes to zero.

So, the derivative of S with ϵ should be zero when ϵ = 0.

Here, the apostrophe () is for the derivative of S by ϵ.

In case it’s not clear, an apostrophe is used when we calculate the derivative of a function by the only independent variable. Since ϵ is the only independent variable of S in this context, the apostrophe here means the derivative of S by ϵ.

Let’s go back to the general form to reiterate all the points we’ve discussed so far and derive the Euler-Lagrange equation to solve the optimization problem.

The Euler-Lagrange equation

The general form of functionals we are dealing with is as follows:

We say the function f minimizes J.

As such, J[f] is the minimum value of the functional J.

We define y = f + ϵη to mean we add a variation to the solution function.

y should still satisfy the boundary condition

As such, η(x1) = 0 and η(x2) = 0.

Since J[f] is the minimum (constant) and η is an arbitrary function of x which is fixed for a particular variation, we can define J[y] as a function of ϵ:

We’ve successfully converted the general form of functionals into a function of ϵ.

This function of ϵ is at minimum when ϵ = 0.

In other words, the first derivative of it becomes zero at ϵ = 0.

So far, we reinforced the same idea from the shortest path problem for the general form of functionals.

Now, we are going to solve it for the general form of functionals to derive the Euler-Lagrange equation.

By substituting J[y] into the above equation:

Let’s calculate the total derivative of L(x, y, y') by ϵ.

The first term is zero since x has no dependency on ϵ.

y = f + ϵη and y' = f' + ϵη'.

f, f', η and η' have no dependency on ϵ.

So, the total derivative of L by ϵ is as follows:

When ϵ=0, y = f and y' = f':

Substituting this into the equation ∅'(0)=0.

In the second term, we have η’.

As η is an arbitrary function, we have no way to calculate η’.

Let’s eliminate η' by applying integration by parts on the second term.

The first term is zero since η(x1) = 0 and η(x2) = 0.

Therefore,

As η(x) is an arbitrary function, the term in the brackets must be zero to satisfy the equation ∅'(0)=0.

This is called the Euler-Lagrange equation which is the condition that must be satisfied for the optimal function f of the general form of functionals.

Solving the shortest path problem

Let’s solve the shortest path problem using the Euler-Lagrange equation.

As mentioned before, y' is the derivative of y in terms of x.

Now, we say y = f(x) is the function that minimizes the path length from A to B.

So, we think of f instead of y in L.

This f must satisfy the Euler-Lagrange equation.

Let’s solve the Euler-Lagrange equation for the shortest path problem.

Since f does not appear in L, the first term is zero.

As such, the second term is also zero.

We integrate the above by x:

C is a constant value.

This means f' is also a constant value.

It can be shown by taking the square of both sides and re-arrange the equation:

Let f' = a and we get f(x) = ax + b which is a straight line.

Voilà!

If we put the boundary conditions f(x1) = y1 and f(x2) = y2, we get the values for a and b.

The last not the least…

The essential idea of the calculus of variations is to make a functional into a function of ϵ by adding the variation ϵη to the optimal function f so that the problem of functional optimization becomes a function optimization problem.

We derived the Euler-Lagrange equation for the general form of functionals which must be satisfied for solution functions.

Mathematically speaking, the Euler-Lagrange equation is not sufficient condition for the functional minimum or maximum. It just tells us the solution function makes the functional stationary.

However, in many problems, we know the solution would give the functional minimum or maximum.

In the shortest path problem, the maximum has no limit, so we know the solution to the Euler-Lagrange equation gives the minimum function.

In case we want to check if the solution function is for the maximum or minimum, we need to calculate the second derivative of ∅(ϵ) by ϵ for ϵ=0 where y = f + ϵη.

Let’s do this for the shortest path problem:

The first derivative of S(ϵ) with ϵ:

The second derivative of S(ϵ) with ϵ:

This is already a positive value, so we know the Euler-Lagrange equation for the shortest path problem gives the minimum function.

In theory, we should check the value of the second derivative when ϵ=0 because we are talking about the local minimum/maximum around the range expressed by ϵ.

When ϵ=0, y = f:

All in all, the solution function (the straight line) given by the Euler-Lagrange equation for the shortest path problem gives the minimum of the functional S.

Lastly, I assumed all the derivatives are possible in this article. For example, y(x) and η(x) must be differentiable by x.

I hope you have a clearer idea about the calculus of variations now.

References

Calculus of variations (Wikipedia)

The Calculus of Variations (Bounded Rationality)

Introduction to Calculus of Variations (Faculty of Khan)

https://youtu.be/6HeQc7CSkZs