# Infinite Steps CartPole Problem With Variable Reward

## Modify Step Method of CartPole OpenAI Gym Environment Using Inheritance

In the last blog post, we wrote our first reinforcement learning application, the CartPole problem, and trained it with a Deep Q-Network. As we saw there, a fixed reward of +1 was given for every stable state, and a reward of 0 was given when the CartPole lost its balance. At the end, we observed that the CartPole tends to lose balance as it approaches 200 steps. We closed with a remark: the maximum number of steps (which we set to 200) and the fixed reward may have led to this behavior. Today, let’s remove the step limit, modify the reward, and see how the CartPole behaves.

# CartPole Problem Definition

The CartPole problem is considered solved when the average reward is greater than or equal to 195.0 over 100 consecutive trials, assuming a fixed reward of 1.0 per step. Given this definition, it makes sense to keep a fixed reward of 1.0 for every balanced state and to limit the maximum number of steps to 200. Happily, the problem was solved in the previous blog post.
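As a minimal sketch of this criterion, the check below averages the most recent 100 episode rewards and compares the result with 195.0. The function name `is_solved` and its parameters are hypothetical, chosen only for illustration; they are not part of Gym.

```python
def is_solved(episode_rewards, window=100, threshold=195.0):
    """Return True if the mean of the last `window` episode rewards
    is at least `threshold` (hypothetical helper, not a Gym API)."""
    if len(episode_rewards) < window:
        # Not enough consecutive trials yet to evaluate the criterion.
        return False
    recent = list(episode_rewards)[-window:]
    return sum(recent) / window >= threshold


# Usage: 100 episodes that all reached the 200-step cap
print(is_solved([200.0] * 100))
```

Note that this criterion only makes sense with the fixed reward of 1.0 and the 200-step cap; once the reward varies per step, the threshold of 195.0 no longer has the same meaning.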

The CartPole problem has the following conditions for episode termination:

1. The pole angle exceeds ±12 degrees.
2. The cart position exceeds ±2.4, meaning the center of the cart reaches the edge of the display.
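The two termination conditions above can be sketched as a small predicate. This is an illustration only: the names `is_terminal`, `X_THRESHOLD`, and `THETA_THRESHOLD_DEG` are hypothetical, and the angle is assumed to be given in degrees. Gym's CartPole state stores the pole angle in radians, so a conversion with `math.degrees` is shown in the usage lines.

```python
import math

X_THRESHOLD = 2.4            # cart position limit (assumed symmetric around 0)
THETA_THRESHOLD_DEG = 12.0   # pole angle limit in degrees


def is_terminal(x, theta_deg):
    """Return True if either termination condition is met.

    x         -- cart position
    theta_deg -- pole angle in degrees (assumption of this sketch)
    """
    return abs(x) > X_THRESHOLD or abs(theta_deg) > THETA_THRESHOLD_DEG


# Usage: convert an angle given in radians before checking
theta_rad = 0.1
print(is_terminal(0.5, math.degrees(theta_rad)))
```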

# Variable Reward

Our goal here is to remove the step limit and give a variable reward to each state.

If x and θ represent the cart position and pole angle respectively, we define the reward as:

`reward = (1 - (x ** 2) / 11.52 - (θ ** 2) / 288)`
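A minimal sketch of this formula as a Python function follows. The name `variable_reward` is hypothetical, and the angle is assumed to be in degrees, since the divisor 288 equals 2 × 12². If the angle comes from Gym's state vector in radians, it would need converting first.

```python
def variable_reward(x, theta_deg):
    """Reward that decreases as the cart leaves the center or the pole tilts.

    x         -- cart position (termination limit: 2.4)
    theta_deg -- pole angle in degrees (assumption; termination limit: 12)
    """
    return 1.0 - (x ** 2) / 11.52 - (theta_deg ** 2) / 288.0


# Usage: the reward is highest for a centered cart with an upright pole
print(variable_reward(0.0, 0.0))   # 1.0
print(variable_reward(1.2, 6.0))   # lower: both penalties are non-zero
```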

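Here, the cart position and pole angle penalties are each scaled to the interval [0, 0.5] (2.4² = 5.76 and 12² = 144, which implies θ is measured in degrees), so they carry equal weight and the reward stays within [0, 1] for every non-terminal state. Let’s look at a screenshot of the 2D view of the 3D graph.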