The Too-much-good-Information Paradox

Source: Deep Learning on Medium

To understand what is going on in technology today, we should look back at human history, as far back as the first hominids and the morphological changes that enabled us to communicate efficiently with one another. This may be surprising, but in an exponential race like the history of information processing, the starting point matters a great deal.

At the beginning, some three hundred thousand years ago, knowledge passed from individual to individual, based on positive or negative experiences; this way, others could benefit by avoiding danger or finding more food. From that moment, we began to gain a competitive advantage from the use of information, and in time the need arose to “record” knowledge for the following generations, as in the paintings made in the darkness of a European cave only thirty thousand years ago.

It took some time for humanity to make the next leap, from drawings on a wall showing animals and how to hunt them to a basic written language meant to carry human experience, as described in a Sumerian poem from around 1800 BC: “Because the messenger’s mouth was heavy and he couldn’t repeat (the message), the Lord of Kulaba pats some clay and puts words on it, like a tablet. Until then, there had been no putting words on clay”. The cuneiform writing system was in use for more than three millennia, through several stages of development, from the 31st century BC down to the second century AD. Ultimately, it was completely replaced by alphabetic writing (in the general sense) in the course of the Roman era, when Latin was the language of the Republic and later the Empire.

In the Middle Ages, people understood the importance of information: this knowledge, handwritten in Latin manuscripts, had to be copied one by one by monks in monasteries, with art and dedication, like a treasure. Thanks to those copied books, some of which have survived intact to this day, we know that those humans were the same as us in thought, feeling and curiosity.

Around 1450, Gutenberg’s printing press arrived, and with it the use of common languages in books. Knowledge began to spread among the general population, and more scientists, engineers and thinkers improved our lives, using words written thousands of years earlier as a common base of knowledge.

The volume of written information increased. In every country books were printed in the common language, people learned to read and write, some of them became enlightened, and they rebelled against the powerful. Information changed the face of the world, and the world itself. By the eighteenth century, any book could be written and printed, and many people could read it.

Not very long ago, in the nineties, the internet arrived in Western homes. Knowledge was no longer protected by the heavy walls of a medieval monastery library, nor even kept in a bookstore waiting to be bought: you can read, learn and be trained, all “on-line”. Because we are hyperconnected, we can find the answer to any kind of question, immediately, from our mobile devices, while trillions of bytes of information are generated by humans and connected devices every hour and stored somewhere in a “cloud”.

As we have seen, the volume of information grew noticeably at first every hundred thousand years, then every tens of thousands, then thousands, then centuries, then decades, and now exponentially every single year or even month. There is no physical way that any human being could assimilate that much information. We don’t have enough time even to read the information related to our own profession. It is close to impossible for a doctor to take into account all of a patient’s past information in order to treat him or her correctly. The specialists have access to that information, yes, but there is… too much of it.

Since the arrival of computers around 1945, the capacity to process information has also been increasing exponentially. Roughly every two years, computers double their processing power and memory at the same price, following what is called Moore’s Law: “the number of transistors in a dense integrated circuit doubles about every two years”.
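To get a feeling for what “doubles about every two years” means over time, here is a minimal sketch in Python. It assumes an idealized, perfectly regular doubling period of two years, which real hardware only approximates; the function name `growth_factor` and the chosen time spans are illustrative and not taken from any source.

```python
def growth_factor(years, doubling_period_years=2.0):
    """Return how many times larger a quantity becomes after `years`,
    assuming it doubles once every `doubling_period_years` (idealized)."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Illustrative time spans only; the two-year period is the assumption
    # quoted in the text, not a measured value.
    for years in (2, 10, 20, 40):
        print(f"{years:>2} years -> about {growth_factor(years):,.0f} times more")
```

Under this assumption, ten years give a factor of 32 and twenty years a factor of 1,024, which is why a steady doubling feels slow at first and overwhelming later.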

Not long ago, the owners of the biggest computer farms realized that the unused capacity of those computers could be used to process these increasingly large amounts of information. Big data arrived, and huge amounts of very fast-changing data started to be processed in parallel by cheaper computers, crunching numbers and structured information from databases.

But knowledge is not only numbers and structured information. Knowledge still lies in books and documents, in blogs and news web sites, even in social networks. This kind of knowledge could, until now, only be processed by humans, and humans, as we stated earlier, are limited by how much they can read in the time they have. This is the too-much-information paradox: we have all that knowledge lying there, within reach, and any one person can only take advantage of a minimal part of it.

Imagine a doctor who can make the best possible decision for a patient’s wellbeing based not only on a medical degree and clinical experience, but also on all the literature available for all of the patient’s diseases, the hundred articles published daily about them, the opinions of hundreds of thousands of doctors with similar patients, and the history and evolution of millions of patients with the same disease around the globe. Maybe that takes too much imagination, but the information is there, within reach. So what do we need to do to put it to its best use?

Today, data scientists like me work hard to make computers actually “read”, and so scale the human capacity to make decisions far beyond word of mouth, the treasured manuscript, the specialized book, the case file. We work to use all the available information to make a better decision, maybe a better treatment. We work to break the too-much-information paradox, using Natural Language Processing technologies to get insights from huge amounts of written information, and Deep Neural Networks to identify patterns in it and avoid being misled by fake or irrelevant data.

Ever since the prehistoric cave walls, we have managed to pass information on to the next generations so that they could make better use of it. Now it is time to evolve and synthesize it, using the means we have, to improve our lives again through a man-machine-man knowledge transfer.