Original article was published on Artificial Intelligence on Medium
Data science may well prove to be the most exciting profession of the 21st century.
My latest book, Machine Learning in Business: An Introduction to the World of Data Science, explains the most popular algorithms used by data scientists. The objective is to enable readers to interact productively with data scientists and understand how data science can be used in a variety of business situations.
In this excerpt from the book, I will present some of the key issues posed to society by AI, which should be on the radar of leaders everywhere. But first, a brief history of our long-standing relationship with machines.
Human vs. Machine: A Brief History
Human progress has been marked by four industrial revolutions:
1. Steam and water power (1760–1840)
2. Electricity and mass production (1840–1920)
3. Computers and digital technology (1950–2000)
4. Artificial intelligence (2000–present)
There can be no doubt that the first three revolutions have brought huge benefits to society. The benefits were not always realized immediately, but they have eventually produced big improvements in our quality of life. At various times there were concerns that jobs traditionally carried out by humans would be moved to machines and that unemployment would result. This did not happen. Some jobs were lost during the first three industrial revolutions, but others were created.
For example, the first industrial revolution led to people leaving rural lifestyles to work in factories; the second changed the nature of the work done in factories with the introduction of assembly lines; and the third has led to more jobs involving the use of computers. The impact of the fourth industrial revolution remains to be seen.
It is worth noting that the third industrial revolution did not require all employees to become computer programmers. But it did require people in many jobs to learn how to use computers and work with software such as Word and Excel. We can expect the fourth industrial revolution to be similar in that many individuals will have to learn new skills related to the use of artificial intelligence.
We are now reaching the stage where machine learning algorithms can make many routine decisions as well as, if not better than, human beings. But the key word here is ‘routine’, because the nature of the decision and the environment must be similar to that in the past. If the decision is non-standard or the environment has changed so that past data is no longer relevant, we cannot expect a machine learning algorithm to make good decisions.
Driverless cars provide an example here. If we changed the rules of the road — perhaps regarding how cars can make right or left turns — it would be very dangerous to rely on a driverless car that had been trained using the old rules.
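The same point can be made with a toy simulation. The sketch below is not how a driverless car works; it is a deliberately simple, hypothetical "model" (a single cut-off learned from past data) whose environment changes after training. All names and numbers are assumptions chosen for illustration.

```python
import random

def accuracy(samples, cut):
    """Share of samples where the rule 'x >= cut' matches the true label."""
    correct = sum((x >= cut) == label for x, label in samples)
    return correct / len(samples)

def learn_threshold(samples):
    """Pick the cut-off that best separates the labels in the training data."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(x for x, _ in samples):
        acc = accuracy(samples, cut)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut

def make_data(rng, n, true_cut):
    """Generate n points; the label is True when x reaches the true cut-off."""
    xs = [rng.uniform(0, 100) for _ in range(n)]
    return [(x, x >= true_cut) for x in xs]

rng = random.Random(0)
old_world = make_data(rng, 500, true_cut=40)  # the rules the model was trained on
new_world = make_data(rng, 500, true_cut=70)  # the rules after the change

cut = learn_threshold(old_world)
acc_old = accuracy(old_world, cut)
acc_new = accuracy(new_world, cut)

print(f"learned cut-off: {cut:.1f}")
print(f"accuracy under the old rules: {acc_old:.2f}")
print(f"accuracy under the new rules: {acc_new:.2f}")
```

The learned rule is perfect on the data it was trained on and noticeably worse once the true cut-off moves, even though nothing about the rule itself has changed.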
Going forward, a key task for human beings is likely to be managing large data sets and monitoring machine learning algorithms to ensure that decisions are not made on the basis of inappropriate data. Just as the third industrial revolution did not require everyone to become a computer programmer, the fourth will not require everyone to become a data scientist. However, for many jobs it will be important to understand the language of data science and what data scientists do. Today, many jobs involve using programs developed by others for carrying out various tasks. In the future, they may involve monitoring the operation of machine learning algorithms that have been developed by others.
The fact is, for some time to come, a human plus a trained machine is likely to be more effective than a human or a machine on its own. I will now look at some of the key issues this raises for society — and for organizational leaders.
Issues for Society
Computers have been used to automate business tasks such as record keeping and sending out invoices for many years, and for the most part, society has benefited from this. But it is important to recognize that AI innovations involve more than just the automation of tasks: They actually allow machines to learn. Their aim is to allow machines to make decisions and interact with the environment similarly to the way humans do. Indeed, in many cases, the goal is to train machines so that they improve on the way human beings carry out certain tasks.
Most readers are familiar with the success of Google’s AlphaGo in beating the world champion Go player, Ke Jie. For those who aren’t familiar with it, Go is a very complex game. It has too many moves for a computer to calculate all the possibilities, so AlphaGo uses a deep learning strategy to approximate the way the best human players think about their moves, and then improve on it. The key point is that AlphaGo’s programmers did not teach AlphaGo ‘how to play Go’: They taught it ‘to learn how to play Go’.
Teaching machines to use data to learn and behave intelligently raises a number of difficult issues for society. Following are five particular issues that leaders should familiarize themselves with.
DATA PRIVACY. Issues associated with data privacy received a great deal of publicity as a result of the Cambridge Analytica saga. This company worked both for Donald Trump’s 2016 presidential campaign and for an organization campaigning for the UK to leave the European Union. It managed to acquire and use personal data on millions of Facebook users without obtaining their permission. The data was detailed enough for the company to create profiles and determine what kind of advertisements or other actions would be most effective in promoting the interests of the organizations that had hired it.
Many governments are concerned about data privacy. The European Union has been particularly proactive and passed the General Data Protection Regulation (GDPR), which came into force in May 2018. It recognizes that data is valuable and includes in its requirements the following:
• A person must provide consent to a company before the company can use the person’s data for any purpose other than the one for which it was collected.
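• If there is a data breach, the company must notify the relevant supervisory authority within 72 hours of becoming aware of it, and must inform the people affected without undue delay when the breach poses a high risk to them.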
• Data must be safely handled across borders.
• Certain companies, such as those whose core activities involve processing personal data on a large scale, must appoint a data protection officer.
Fines for non-compliance with GDPR can be as high as 20 million euros or four per cent of a company’s annual global revenue, whichever is higher. It is likely that other governments will pass similar legislation in the future. Interestingly, it is not just governments that are voicing concerns about the need to regulate the way data is used by companies. Mark Zuckerberg, Facebook’s CEO, agrees that rules are needed to govern the Internet and has expressed support for GDPR.
BIASES. By now, we all know that human beings exhibit biases. Some lead to risk-averse behaviour; others to risk-seeking behaviour; some make us care about people; others lead us to be insensitive. It might be thought that one advantage of machines is that they take logical decisions and are not subject to biases at all. Unfortunately, this is not the case. Machine learning algorithms exhibit many biases. One of the main ones to pay attention to concerns the data that has been collected: It might not be representative.
A classic example here is an attempt by the Literary Digest to predict the result of the U.S. presidential election in 1936. The magazine polled 10 million people (a huge sample) and received 2.4 million responses. It predicted that Landon (a Republican) would beat Roosevelt (a Democrat) by 57.1 to 42.9 per cent. In fact, Roosevelt won. What went wrong? The answer is that the magazine used a biased sample consisting of Digest readers, telephone users and those with car registrations. It turned out that, taken together, these were predominantly Republican supporters. More recently, we can point to examples where facial recognition software was trained largely on images of white people and therefore did not reliably recognize people of other races, resulting in misidentifications by police forces using the software.
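A small simulation shows why a huge sample does not rescue a biased one. The sketch below does not use the 1936 data; the population size, the share of "listed" voters (those reachable through phone or car records) and the support rates are all made-up assumptions.

```python
import random

rng = random.Random(1936)

def make_population(n, share_listed, p_listed, p_unlisted):
    """Each person is (listed, backs_challenger). All rates are hypothetical."""
    people = []
    for _ in range(n):
        listed = rng.random() < share_listed
        p = p_listed if listed else p_unlisted
        people.append((listed, rng.random() < p))
    return people

def challenger_share(people):
    """Fraction of the given people who back the challenger."""
    return sum(backs for _, backs in people) / len(people)

# Assumed: 30% of voters are listed; 60% of them back the challenger,
# against 30% of everyone else. The true share is therefore about 0.39.
population = make_population(100_000, 0.30, 0.60, 0.30)
true_share = challenger_share(population)

listed_only = [person for person in population if person[0]]
big_biased_poll = rng.sample(listed_only, 20_000)   # large, but drawn only from listed voters
small_random_poll = rng.sample(population, 1_000)   # small, but drawn from everyone

print(f"true share:        {true_share:.3f}")
print(f"big biased poll:   {challenger_share(big_biased_poll):.3f}")
print(f"small random poll: {challenger_share(small_random_poll):.3f}")
```

Under these assumptions the 20,000-person poll points to the wrong winner, while the 1,000-person random poll lands close to the true figure. Sample size reduces noise, but it does nothing about who was left out.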
There is a natural tendency in machine learning to use readily available data and to be biased in favour of existing practices. The data available for making lending decisions in the future is likely to be the data on loans that were actually made in the past. It would be nice to know how the loans that were not made in the past would have worked out, but this data, by its nature, is not available. Amazon experienced a similar bias when developing recruiting software. Its existing recruits were predominantly male and this led to the software being biased against women.
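The lending example can be sketched in a simulation, where (unlike in real life) we get to see how the rejected applicants would have performed. The score scale, the approval cut-off of 0.6 and the repayment formula are all hypothetical choices for illustration.

```python
import random

rng = random.Random(7)

def repay_probability(score):
    """Assumed relationship: a higher score means repayment is more likely."""
    return 0.5 + 0.4 * score  # score lies in [0, 1]

# Every applicant gets a score and a (normally unobservable) repayment outcome.
applicants = []
for _ in range(50_000):
    score = rng.random()
    repaid = rng.random() < repay_probability(score)
    applicants.append((score, repaid))

# Past policy: only applicants scoring at least 0.6 were given a loan.
approved = [(score, repaid) for score, repaid in applicants if score >= 0.6]

def default_rate(loans):
    """Fraction of the given loans that were not repaid."""
    return 1 - sum(repaid for _, repaid in loans) / len(loans)

observed = default_rate(approved)      # what the historical loan book shows
everyone = default_rate(applicants)    # what only the simulation can reveal

print(f"default rate among past loans:     {observed:.3f}")
print(f"default rate among all applicants: {everyone:.3f}")
```

A model trained only on the approved loans sees a default rate well below that of the full applicant pool, and it never sees a single applicant who scored under the old cut-off. Its predictions for such applicants are extrapolation, not evidence.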
As a result, choosing the features that will be considered in a machine learning exercise is a critical task. In most cases, it is clearly unacceptable to use features such as race, gender or religious affiliation. But data scientists also have to be careful not to include other features that are highly correlated with these sensitive features. For example, if a particular neighbourhood has a high proportion of black residents, using ‘neighbourhood of residence’ as a feature when developing an algorithm for loan decisions may lead to racial biases.
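To see how a proxy feature can carry a sensitive one, consider the following sketch. The two neighbourhoods, the two groups "X" and "Y", and the 80/20 split are hypothetical, and the approval rule is an assumption made purely for illustration.

```python
import random

rng = random.Random(42)

def make_applicant():
    """Assumed: neighbourhood A is 80% group X, neighbourhood B is 20% group X."""
    neighbourhood = rng.choice(["A", "B"])
    p_group_x = 0.8 if neighbourhood == "A" else 0.2
    group = "X" if rng.random() < p_group_x else "Y"
    return {"neighbourhood": neighbourhood, "group": group}

applicants = [make_applicant() for _ in range(10_000)]

def approve(applicant):
    """A rule that never looks at 'group', only at the neighbourhood."""
    return applicant["neighbourhood"] == "B"

def approval_rate(group):
    """Share of applicants in the given group that the rule approves."""
    members = [a for a in applicants if a["group"] == group]
    return sum(approve(a) for a in members) / len(members)

rate_x = approval_rate("X")
rate_y = approval_rate("Y")

print(f"approval rate, group X: {rate_x:.2f}")
print(f"approval rate, group Y: {rate_y:.2f}")
```

The rule is blind to group membership, yet the two groups end up with very different approval rates because neighbourhood stands in for group. Comparing outcomes across sensitive groups, as `approval_rate` does here, is one simple check a reviewer can ask for even when the sensitive feature is excluded from the model.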