The rise of Big Data as the core lubricant of the digital world

Original article was published by Alessandro Prosperi on Deep Learning on Medium


(Image Credit: geospatialworld.net)

“Information is the oil of the 21st century, and analytics is the combustion engine”.

— Peter Sondergaard, Senior Vice President, Gartner Research.

Big Data from Space and the “overview effect”

“I used to tell people I was from Cleveland, Ohio, because that was where I was born. Today, I simply say I am from Earth.”

That’s the world view of former NASA astronaut Don Thomas, who has orbited Earth 692 times. Many astronauts come back from space with a fundamentally different view of our planet, a shift known as the “Overview Effect”. A similar change of perspective is now possible in the space sector thanks to analysis tools that visualize data, helping us understand planet Earth better and unravel the mysteries of the universe.

Big data technology is a product of information technology that aims to meet the challenges posed by the increasing amount of information in various fields. Think of all the times we’ve used our phone or computer: how many apps have we logged in to? Have we checked Facebook, Twitter, Instagram, Reddit, or LinkedIn? Do we regularly use Amazon, YouTube, Tinder, Buzzfeed, or Pinterest? Every one of those apps and websites that many of us use daily collects user data to improve the user experience and help companies make educated business decisions. But that’s not all data can help us do.

Right now, satellites are performing 2 billion instructions per second and delivering data that could help us prevent natural disasters and use natural resources wisely. Several data-driven initiatives aim to improve decision-making and operational efficiency in sectors including agriculture, forestry, mapping, shipping, and energy.

Using Data to improve life on Earth

More and more companies are opening up to the space sector as the number of affordable satellite services keeps growing. Consider one industry, agriculture, where the implications are enormous. Farmers can use image data to better understand the factors that affect crop growth. Many of these factors can be detected from space, such as weather patterns, exposure to sunlight, air quality or pest activity, so optimum growing conditions can be determined.

The world’s population is on pace to grow 50 percent by 2100. Now more than ever, farmers need access to tools that support the decisions they make every day to maximize their return on every acre. The Climate Corporation processes satellite data to help farmers find more sustainable ways to grow more food, work that can deliver long-term benefits to humanity.

Another company, Planet, provides geospatial insights that equip users with the data needed to make informed, timely decisions. It offers a diverse selection of imagery and analytic solutions, all available online through its platform and web-based tools. From agriculture and emergency response to natural resource protection and security, global imagery and foundational analytics will empower informed, deliberate, and meaningful stewardship of our planet.

Earth observation satellites provide important data that allows the rapid detection of changes to the environment and climate, or measurements of the movement or shrinking of glaciers. Up-to-date maps can be provided to the emergency services in the event of disasters such as flooding or earthquakes. This, however, requires the accumulation of very large quantities of data. The European Union (EU) Copernicus Program satellites are among the biggest producers of data in the world. Their high-resolution instruments currently generate approximately 20 terabytes of data every day. This is equivalent to an HD film that would run for about one-and-a-half years. In addition to this, data is also provided by German missions such as TerraSAR-X and TanDEM-X, as well as an increasing number of other sources, such as the internet and measurement stations. The processing and analysis of these very large and heterogeneous data sets are among the Big Data challenges facing an increasingly digital society.
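The film comparison can be sanity-checked with a little arithmetic. The sketch below works out which video bitrate would make 20 terabytes last one and a half years. It assumes decimal terabytes and a 365-day year; neither assumption comes from the article, and the variable names are only illustrative.

```python
# Back-of-the-envelope check: what video bitrate makes 20 TB last 1.5 years?
# Assumptions (not from the article): 1 TB = 10**12 bytes, 1 year = 365 days.
DAILY_VOLUME_BYTES = 20 * 10**12   # about 20 TB per day (from the text)
FILM_LENGTH_YEARS = 1.5            # "about one-and-a-half years" (from the text)

# Convert the film length to seconds.
film_seconds = FILM_LENGTH_YEARS * 365 * 24 * 3600

# Bits available per second of film, expressed in megabits per second.
implied_bitrate_mbps = DAILY_VOLUME_BYTES * 8 / film_seconds / 10**6

print(f"Film length: {film_seconds / 3600:,.0f} hours")
print(f"Implied bitrate: {implied_bitrate_mbps:.2f} Mbit/s")
```

Under these assumptions the implied bitrate lands in the low single digits of megabits per second, which is plausible for compressed HD video, so the comparison holds up as an order-of-magnitude statement.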

Sulphur dioxide map: volcanic eruption on Bali (Credit: Copernicus-Sentinel (2017), DLR/ESA)

New ideas and concepts are needed in order to be able to process data and turn it into information. Artificial intelligence plays a major role in this, as such processes are extremely powerful, especially where large amounts of data are involved. DLR scientist Xiaoxiang Zhu, based at the Technical University of Munich, is conducting research into the use of such methods. Together with her team, Zhu is developing exploratory algorithms from signal processing and artificial intelligence (AI), particularly machine learning, to significantly improve the acquisition of global geoinformation from satellite data and achieve breakthroughs in geosciences and environmental sciences. Novel data science algorithms allow scientists to go one step further with the merging of petabytes of data from complementary geo-relevant sources, ranging from Earth observation satellites to social media networks. Their findings have the potential to address previously insoluble challenges, such as recording and mapping global urbanization — one of the most important megatrends in global change.

Yet the field of satellite remote sensing is not alone in grappling with this challenge. Investigating phenomena the other way round, looking from Earth into space, also generates enormous amounts of data. Telescopes such as the Square Kilometre Array (SKA) in South Africa and Australia provide large quantities of data, as do ESA’s space-based telescopes such as Gaia and Euclid. The systematic analysis of archive data by self-learning AI programs is thus becoming increasingly important in astronomical research.

The Square Kilometre Array: The world’s biggest telescope (Photo Credit: visual.ly)

“We’ve been talking about Big Data for a long time, and this takes us on the journey to start understanding space data and space analytics. Not too many people in the commercial sector have got their hands around it yet, they don’t fully understand the implications of all of this data” said Sparks & Honey CEO Terry Young. “The idea was to look at the innovations that are going to be created over the next 15 years on our journey to Mars and beyond, and to find from those innovations — which are very science or engineering-focused — what the implications are for organizations and consumers, back here on Earth”.

In the past, space data applications were mainly carried out by governments because of the sky-high cost of launching satellites and keeping them in space, where they generate data with cameras, sensors and scanners and are used to monitor conflicts, track the flow of refugees and gather terrestrial and space data for research purposes. Thanks to the likes of SpaceX, founded by Tesla entrepreneur Elon Musk, as well as hundreds of startups, billions will be spent in the coming decade on creating infrastructure. The exciting part for the industry is that much of this data will become available to organizations whose business is not primarily space-based.

“Something which is hovering above the Earth and providing a perspective from above is really creating a unique dataset. Roughly 35% of the satellites in orbit right now are there for commercial purposes, and those satellites have been driven by venture capital money. A lot of startups are providing low-orbit satellites for a wide range of different uses”. “We covered ideas like being able to observe things like water shortage, as it relates to manufacturing processes, traffic patterns in large cities as we are looking towards building cities of the future and their infrastructure. We can even translate it to big retail, where all of a sudden, we can capture real-time data on hundreds of stores simultaneously and use it to look at foot traffic patterns,” Young said.

Why Space Data is the new Big Data

Data analytics can be used to improve sports performance, to help us better understand and build cures for disease, to aid in the development of artificial intelligence, to improve infrastructure in your city, and to expand the reach of what science can do. NASA has recently used data gathered over years of exploration to launch an amazing interactive map of Mars. Called “Mars Trek”, the map is an educational tool that NASA makes available to the public as part of its Mars Exploration Program. Here’s the link: https://trek.nasa.gov/mars/

According to NASA’s official Mars Trek site, “This portal showcases data collected by NASA at various landing sites. It features an easy-to-use browsing tool which provides layering and viewing of high-resolution Mars data products in 2D and Globe view allowing users to fly over the surface of Mars. It also provides a set of tools including 3D printing, elevation profiles, sun angle calculations, Sun and Earth position, as well as bookmarks for the exploration area by NASA missions”. The missions that have supplied the majority of the data for the map to date are the MSL (Mars Science Laboratory) mission, which involved the Curiosity rover; the MER (Mars Exploration Rovers) mission, which included Spirit and Opportunity; the Phoenix mission; and the Pathfinder mission. NASA plans to continue to update the map as new data becomes available.

A Sample Analysis at Mars (SAM) team member at NASA’s Goddard Space Flight Center. (Image Courtesy: NASA/JPL-Caltech)

This is especially exciting as the Mars 2020 rover should bring a whole new supply of data to add to the map by 2021. Modeled after Curiosity, which has been a breakthrough unmanned system for NASA, the 2020 rover, which launched on 30 July 2020 at 11:50 UTC, will explore the habitability of Mars, hopefully paving the way for NASA’s manned missions tentatively planned for 2030.

Data, more data, and petabytes of data

Data also features prominently in the healthcare sector. Pathologists have been diagnosing disease the same way for the past 100 years, by manually reviewing images under a microscope. Now, computers are helping doctors improve accuracy and are significantly changing the way cancer and other diseases are diagnosed.

A research team from Harvard Medical School and Beth Israel Deaconess Medical Center has developed artificial intelligence (AI) methods aimed at training computers to interpret pathology images, with the long-term goal of building AI-powered systems that make pathologic diagnoses more accurate.

“Our AI method is based on deep learning, a machine-learning algorithm used for a range of applications including speech recognition and image recognition,” explained pathologist Andrew Beck, HMS associate professor of pathology and director of bioinformatics at the Cancer Research Institute at Beth Israel Deaconess. “This approach teaches machines to interpret the complex patterns and structure observed in real-life data by building multi-layer artificial neural networks, in a process which is thought to show similarities with the learning process that occurs in layers of neurons in the brain’s neocortex, the region where thinking occurs”.

“Identifying the presence or absence of metastatic cancer in a patient’s lymph nodes is a routine and critically important task for pathologists,” Beck explained. “Peering into the microscope to sift through millions of normal cells to identify just a few malignant cells can prove extremely laborious using conventional methods. We thought this was a task that the computer could be quite good at, and that proved to be the case”. In an objective evaluation in which researchers were given slides of lymph node cells and asked to determine whether they contained cancer, the team’s automated diagnostic method proved accurate approximately 92 percent of the time, said study co-author Khosla, adding, “This nearly matched the success rate of a human pathologist, whose results were 96 percent accurate.”

“But the truly exciting thing was when we combined the pathologist’s analysis with our automated computational diagnostic method, the result improved to 99.5 percent accuracy,” said Beck. “Combining these two methods yielded a major reduction in errors”.
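To see why Beck calls this a major reduction in errors, it helps to look at error rates rather than accuracy. The short sketch below uses only the three accuracy figures quoted above; the function names are illustrative and not taken from the study.

```python
# Accuracy figures quoted in the article, in percent.
ai_alone = 92.0
pathologist_alone = 96.0
combined = 99.5


def error_rate(accuracy_pct):
    """Error rate in percent for a given accuracy in percent."""
    return 100.0 - accuracy_pct


def relative_error_reduction(before_pct, after_pct):
    """Fraction of the 'before' errors that are eliminated."""
    before, after = error_rate(before_pct), error_rate(after_pct)
    return (before - after) / before


reduction_vs_pathologist = relative_error_reduction(pathologist_alone, combined)
reduction_vs_ai = relative_error_reduction(ai_alone, combined)

print(f"Errors removed vs. pathologist alone: {reduction_vs_pathologist:.1%}")
print(f"Errors removed vs. AI alone: {reduction_vs_ai:.1%}")
```

Going from 96 to 99.5 percent accuracy means the error rate falls from 4 percent to 0.5 percent, so the combination removes 87.5 percent of the errors a pathologist working alone would make.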

The team trained the computer to distinguish between cancerous tumor regions and normal regions based on a deep, multilayer convolutional network. To accomplish this, researchers had to amass huge amounts of data from which they could train their machine learning models.
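For readers unfamiliar with convolutional networks, here is a toy sketch of their basic building block: a small filter slid across an image, followed by a ReLU activation. It is not the team's model. The 4x4 "image" and the edge-detecting kernel are made-up values chosen for illustration, and a real network stacks many such layers and learns its filters from data.

```python
# Toy illustration only: one convolution filter followed by ReLU.
# The kernel and the 4x4 "image" are made-up values, not from the study.

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    This is the cross-correlation form used by most deep learning libraries.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Zero out negative responses, as is common between convolutional layers."""
    return [[max(0, v) for v in row] for row in feature_map]


# Hypothetical image patch: dark left half, bright right half.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical filter that responds to a dark-to-bright vertical edge.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

features = relu(conv2d_valid(patch, edge_kernel))
print(features)
```

The filter responds only in the middle column, where dark meets bright. A deep network learns thousands of such filters, which is why it needs so many labeled examples.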

Fig. 1 The framework of cancer metastases detection
Fig. 2 Evaluation of various deep models (Figs. 1 and 2 credit: Wang D, Khosla A, Gargeya R, Irshad H, Beck AH. Deep Learning for Identifying Metastatic Breast Cancer. arXiv, 2016)

And it isn’t just pathology. The emerging field of genomic medicine maps diseases to specific genetic mutations. This means that newly diagnosed cancer patients now routinely have their genes sequenced so oncologists can prescribe the most effective treatment.

The key to both of these life-saving advances? Petabytes and petabytes of data.

What the future holds and the global effort for open access to data

Back in 2016, Piero Scaruffi, cognitive scientist and author of “History of Silicon Valley” said: “The difference between oil and data is that the product of oil does not generate more oil (unfortunately), whereas the product of data (self-driving cars, drones, wearables) will generate more data (where do you normally drive, how fast/well you drive, who is with you)”.

Google trends for “data is the new oil” until 2020

Open data, big data and the technology revolution are a stimulus for businesses, governments, and citizens.

Today, the industry is witnessing a wide variety of trends: the miniaturization of sensors and satellites, a high number of private entrepreneurial missions, and the adoption of new technologies such as AR/VR, artificial intelligence, machine learning and the cloud. How do we make all this data accessible to everyone? By making it open: adopting better sharing policies for environmental satellite data and making practical recommendations for increasing global data sharing.

Open.NASA, for example, is an open innovation program in NASA’s Innovation Division that runs many open data programs for both space professionals and enthusiasts. The NASA Space Apps Challenge hackathon, NASA Datanauts, and the Data Bootcamp are projects that give citizens the opportunity to access and innovate with NASA’s open data, code, and APIs. All of this and much more is becoming possible with an increase in space investments. More private sector companies, large, medium and small, are entering the Earth observation fray, redefining what the future holds.

Autonomous vehicles (AVs) are coming too. The benefits are widely known: safer roads, a boost to the economy and less rush-hour crowding. But perhaps the biggest benefit is a reduction in greenhouse gases (GHG) coming from automobiles. Research conducted by Poznan University professors estimates that autonomous vehicles could eventually reduce GHG by 40% to 60%. Getting there requires hundreds of petabytes of data, forming the data lake from which the machine learning solutions for self-driving will come. It doesn’t stop there. Each of these modern “computing platforms that happen to be mobile” will generate terabytes of data per week. Even assuming a 75% reduction in the number of vehicles on the roads, that’s many exabytes of data per year. If an accident occurs, the images recorded by the vehicles involved can be called up to determine what caused it and which AV algorithms need improvement.
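The jump from terabytes per week for each vehicle to exabytes per year is easy to check. The sketch below is a rough estimate only. The fleet size and the one-terabyte-per-week figure are illustrative assumptions, not numbers from the article; only the 75% reduction comes from the text.

```python
# Rough estimate of yearly AV data volume.
# All inputs are illustrative assumptions, not figures from the article,
# except the 75% fleet reduction, which the text itself uses.
TB_PER_VEHICLE_PER_WEEK = 1        # assumption: low end of "terabytes per week"
VEHICLES_TODAY = 1_000_000_000     # assumption: order-of-magnitude global fleet
FLEET_REDUCTION = 0.75             # from the text
WEEKS_PER_YEAR = 52
TB_PER_EB = 10**6                  # decimal units: 1 EB = 1,000,000 TB

# Vehicles left on the road after the assumed reduction.
remaining_vehicles = VEHICLES_TODAY * (1 - FLEET_REDUCTION)

# Total yearly volume, first in terabytes, then in exabytes.
tb_per_year = remaining_vehicles * TB_PER_VEHICLE_PER_WEEK * WEEKS_PER_YEAR
eb_per_year = tb_per_year / TB_PER_EB

print(f"Vehicles remaining: {remaining_vehicles:,.0f}")
print(f"Data per year: {eb_per_year:,.0f} EB")
```

Even with these conservative inputs the total comes to thousands of exabytes per year, so “many exabytes” is, if anything, an understatement. Changing any of the assumed constants scales the result proportionally.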

We are on the cusp of exploring an unprecedented abundance of innovation, research, resources and technological connection. All with Earth-bound resonance. Space isn’t just a moonshot. It’s transforming life, not just in orbit, but here on Earth.

And data isn’t just shaping the way our businesses run; it is shaping our lives.