Original article can be found here (source): Artificial Intelligence on Medium

*“Everyone probably knows what a conditioned reflex is. If any two stimuli are repeatedly carried out simultaneously with each other (for example, a bell rings at the same time as a meal), then after a while one of these stimuli (a bell) triggers a response from the body (salivation) to another stimulus (a meal). This adhesion is temporary and, if not reinforced, gradually disappears. A significant part of the cybernetic problems, which are now known as the mathematical theory of learning, covers such very simple schemes that do not exhaust even a small fraction of all the complex higher nervous activity of a person and in the analysis of the conditioned reflex activity itself represent only its initial stage.”*

**Andrey Kolmogorov**, the founder of modern probability theory, “Automata and Life”, 1961

I am still tormented by the problem of how to explain, simply and clearly, the difference between the scientific method of understanding and the method of reinforcement learning.

What does it mean to start with an observation, not a prior belief?

There is a kind of reasoning called Bayesian inference, so named because it uses Bayes’ theorem to show how our knowledge changes. It can be pictured as an electronic device with an input, an output, and a filter that modulates the signal between them.

In classic Bayesian inference, the input is our existing knowledge, and the output is that knowledge updated by an observation. The filter, accordingly, is the data obtained from the observation.
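For reference, the input, filter and output described above can be read off Bayes’ theorem in its standard form. The symbols are introduced here and do not come from the original text: H stands for a hypothesis (our knowledge) and D for the observed data.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
```

In the device picture, P(H) is the input (the existing knowledge), P(H | D) is the output (the updated knowledge), and the data D enters through the likelihood P(D | H), which plays the role of the filter.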

If an observation confirms our old knowledge, little or no updating occurs. If it does not confirm it, a partial update occurs. If a whole series of observations contradicts the old knowledge, it is updated radically.

This way we learn gradually, by repeating the same action many times. For example, we toss a coin and observe how it lands: heads or tails up.

The initial probability is 50/50. But if the coin suddenly gets bent, the probability could become 60/40. We will notice this, though not immediately: only after a series of tosses long enough to determine the new probability of heads or tails.
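The slow drift from 50/50 toward 60/40 can be sketched with a small simulation. This is only an illustration under stated assumptions: the prior is modeled as a Beta distribution with 50 pseudo-counts for each side (a hypothetical choice expressing a firm belief in a fair coin), the bent coin is assumed to land heads with probability 0.6, and the function names are invented for this example.

```python
import random


def update(alpha, beta, heads, tails):
    """Beta-Bernoulli update: add observed counts to the prior pseudo-counts."""
    return alpha + heads, beta + tails


def posterior_mean(alpha, beta):
    """Expected probability of heads under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


def simulate(true_p_heads=0.6, prior=(50, 50), checkpoints=(10, 100, 1000), seed=0):
    """Toss a bent coin and record the estimate at each checkpoint."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same tosses
    alpha, beta = prior
    tosses = 0
    history = []
    for target in checkpoints:
        while tosses < target:
            heads = 1 if rng.random() < true_p_heads else 0
            alpha, beta = update(alpha, beta, heads, 1 - heads)
            tosses += 1
        history.append((tosses, posterior_mean(alpha, beta)))
    return history


if __name__ == "__main__":
    for n, mean in simulate():
        print(f"after {n:4d} tosses: estimated P(heads) = {mean:.3f}")
```

With a prior this firm, ten tosses can barely move the estimate, which matches the point above: the change is noticed only after a sufficiently long series.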

Now imagine that we swapped the wires on the input and filter lines. Now observation is the input, and our existing knowledge is the filter.

Now we look at the coin and do not know with what probability it will land heads or tails up. We have no statistics on previous throws, but we know from past experience that flat objects tend to land on one of their flat sides. If the object rotates, the probabilities of landing with one or the other flat side up are approximately equal.

That is, we derive the probability by examining the coin and comparing it with other similar objects in similar conditions. We don’t need to toss the coin at all to understand that it is approximately equally likely to land with either side up.

We do not need to calculate the probability precisely or plot graphs from a series of throws. We understood it right away by examining the coin and comparing it with similar objects we already had experience with. This is the scientific method of cognition.

If the coin gets visibly bent, we can toss it several times to check what the probability of heads or tails has become.

If the coin looks unchanged but tails start coming up more often than heads, we will look for the reason for the change and will not rest until we find it.
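One possible way to judge whether tails are coming up “more often” than the examination of the coin led us to expect is to ask how likely such a run would be if both sides really were equally probable. The sketch below is an assumption of this example, not a method stated in the text: it computes an exact binomial tail probability, and the function name and the sample counts are hypothetical.

```python
from math import comb


def tail_probability(tails, tosses, p_tails=0.5):
    """P(at least `tails` tails in `tosses` tosses) when each toss shows tails with probability p_tails."""
    return sum(
        comb(tosses, k) * p_tails**k * (1 - p_tails) ** (tosses - k)
        for k in range(tails, tosses + 1)
    )


if __name__ == "__main__":
    # The same 60% share of tails, seen over a short and a longer series.
    for tails, tosses in [(6, 10), (60, 100)]:
        print(f"{tails}/{tosses} tails: tail probability = {tail_probability(tails, tosses):.4f}")
```

A small tail probability does not say why the coin has changed. It only signals that the expectation of equal sides no longer fits what we see, which is the moment to start looking for the reason.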

All kinds of computer modeling are very common in modern science. But there, as in model-based machine learning, a big question arises: where do we get the initial parameters of the model?

As a rule, they are set top-down: the knowledge at the input is hardwired, then passed through the observation filter and adjusted accordingly.

And you just need to swap the input and filter wires to enable the model itself to find (or create?) the initial parameters.